Slurm Workload Manager¶
Nodes and partitions¶
Before submitting any job to Pollux, you should first learn what resources are available. To do that, we use sinfo to view information about compute nodes and partitions. Once run, the command prints information like the output below.
[user@pollux]$ sinfo
HOSTNAMES PARTITION AVAIL CPUS(A/I/O/T) CPU_LOAD ALLOCMEM FREE_MEM GRES STATE TIMELIMIT
pollux1 chalawan_gpu up 0/24/0/24 3.68 0 54028 gpu:4 idle infinite
pollux2 chalawan_gpu up 0/28/0/28 3.71 0 246330 gpu:4 idle infinite
pollux3 chalawan_gpu up 0/28/0/28 3.60 0 246343 gpu:4 idle infinite
castor1 chalawan_cpu* up 0/16/0/16 0.01 0 55444 (null) idle infinite
castor2 chalawan_cpu* up 0/16/0/16 0.01 0 55434 (null) idle infinite
castor3 chalawan_cpu* up 0/16/0/16 0.01 0 55455 (null) idle infinite
castor4 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor5 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor6 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor7 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor8 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor9 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor10 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor11 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor12 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor13 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor14 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor15 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor16 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor17 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
castor18 chalawan_cpu* up 0/28/0/28 0.01 0 92160 (null) idle infinite
Here we introduce a new field, PARTITION. A partition is a named group of compute nodes. Note that the suffix “*” identifies the default partition. AVAIL shows a partition’s state, up or down, while CPUS(A/I/O/T) shows each node’s CPU counts by state in the form “allocated/idle/other/total”.
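To focus on a single partition, sinfo can be restricted with -p. The field names used here are standard sinfo --Format fields; the exact columns chosen are just one possible selection:

```
[user@pollux]$ sinfo -p chalawan_gpu -O nodehost,partition,statelong,cpusstate
```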
Basic job submission¶
Slurm environment variables¶
Upon startup, sbatch will read and handle the options set in its input environment variables. Note that these environment variables will override any options set in a batch script, and command-line options will override any environment variables. The full details are in the sbatch manual (man sbatch), section “INPUT ENVIRONMENT VARIABLES”. For example, suppose we have a script named task1.slurm:
#!/bin/bash
#SBATCH -J task1 # Job name
#SBATCH -t 00:01:00 # Run time (hh:mm:ss)
echo "Hello World!"
The default partition is chalawan_cpu, but if we want to submit the job to chalawan_gpu instead, we can do either
[user@pollux]$ sbatch -p chalawan_gpu ./task1.slurm
or
[user@pollux]$ export SBATCH_PARTITION="chalawan_gpu"
[user@pollux]$ sbatch ./task1.slurm
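Because command-line options take precedence over environment variables, the exported SBATCH_PARTITION can still be overridden for a single submission:

```
[user@pollux]$ export SBATCH_PARTITION="chalawan_gpu"
[user@pollux]$ sbatch -p chalawan_cpu ./task1.slurm    # -p overrides SBATCH_PARTITION
```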
There are also output environment variables, set by the Slurm controller inside the batch script, e.g., SLURM_JOB_ID and SLURM_CPUS_ON_NODE. For the full details, see “OUTPUT ENVIRONMENT VARIABLES” in the sbatch manual (man sbatch). You may use them in your script for convenience. The example below prints some of these values.
[user@pollux]$ cat ./echo.slurm
#!/bin/bash
#SBATCH -J echo # Job name
#SBATCH -o %x-%j.out # Name of stdout output file
echo "Job name: $SLURM_JOB_NAME"
echo "Job ID: $SLURM_JOB_ID"
[user@pollux]$ sbatch ./echo.slurm
[user@pollux]$ cat ./echo-130.out
Job name: echo
Job ID: 130
Batch vs Interactive jobs¶
We use the command sbatch followed by a batch script to submit a job to Slurm. sbatch exits immediately after the script is successfully transferred to the Slurm controller and assigned a Slurm job ID. The batch script is not necessarily granted resources immediately; it may sit in the queue of pending jobs for some time before its required resources become available.
[user@pollux]$ sbatch [OPTIONS...] executable [args...]
The batch script may contain options preceded with #SBATCH before any executable commands. For example, we create a simple batch script called task1.slurm that prints the string “Hello World!”. Inside, the file looks like this:
#!/bin/bash
#SBATCH -J task1 # Job name
#SBATCH -t 00:01:00 # Run time (hh:mm:ss)
echo "Hello World!"
After submission with the command sbatch task1.slurm, if there is an empty slot, your task will run and exit almost instantly. You will find the output file slurm-%j.out in the current working directory, where %j is replaced with the job allocation number. The words “Hello World!” appear inside that output file. By default, both standard output and standard error are directed to the same file.
[user@pollux]$ sbatch ./task1.slurm
Submitted batch job 128
[user@pollux]$ cat ./slurm-128.out
Hello World!
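While a submitted job is pending or running, its state can be checked with squeue, another standard Slurm command:

```
[user@pollux]$ squeue -u $USER
```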
Frequently used sbatch options¶
There are many options you can add to a script file. The frequently used options are listed below. Each option must be preceded with #SBATCH. For other available options, see the Slurm website or use the command sbatch -h or man sbatch.
Option | Description
---|---
`-J, --job-name` | name of job
`-N, --nodes` | number of nodes on which to run (N = min[-max])
`-n, --ntasks` | number of tasks to run
`-c, --cpus-per-task` | number of cpus required per task
`-e, --error` | file for batch script's standard error
`-o, --output` | file for batch script's standard output
`-p, --partition` | partition requested
`-t, --time` | time limit
`--mem` | minimum amount of real memory
`--gres` | required generic resources
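As a sketch, several of these options can be combined in a single batch script; the resource numbers below are illustrative only, not site recommendations. Because #SBATCH lines are ordinary comments to the shell, the script also runs as plain bash, in which case the Slurm output variables are unset and the fallback values are used:

```shell
#!/bin/bash
#SBATCH -J demo              # Job name
#SBATCH -p chalawan_cpu      # Partition requested
#SBATCH -N 1                 # Number of nodes
#SBATCH -n 4                 # Number of tasks
#SBATCH -t 00:10:00          # Time limit (hh:mm:ss)
#SBATCH --mem=4G             # Minimum amount of real memory
#SBATCH -o %x-%j.out         # File for standard output
#SBATCH -e %x-%j.err         # File for standard error

# SLURM_NTASKS and SLURMD_NODENAME are set by Slurm at run time;
# outside Slurm they are unset, so the defaults after :- apply.
msg="Running ${SLURM_NTASKS:-0} task(s) on ${SLURMD_NODENAME:-localhost}"
echo "$msg"
```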