SLURM usage

Batch jobs

These are run by writing a script and submitting it to the queue with the sbatch command like this:

sbatch myscript

Scripts for batch jobs must start with the interpreter to be used to excute them (different from PBS/Torque). You can give arguments to sbatch as comments in the script. Example:

#!/bin/bash
# Name of the job
#SBATCH -J testjob
# Partition to use - this example is commented out
##SBATCH -p NONIB
# Time limit. Often not needed as there are defaults, 
# but you might have to specify it to get the maximum allowed.
# time: 10 hours
##SBATCH --time=10:0:0
# Pick nodes with feature 'foo'. Different clusters have 
# different features available
# but most of the time you don't need this
##SBATCH -C foo
# Restrict the job to run on the node(s) named
##SBATCH -w compute-0-13
# Number of processes
#SBATCH -n1
# Start the program
srun myprogram

A more complicated example which uses three tasks, note the use of --exact to make sure each job step only asks for the resources specifically allocated to it:

#!/bin/bash
#SBATCH -n3 # 3 tasks
echo Starting job $SLURM_JOB_ID
echo SLURM assigned me these nodes
srun -l hostname
srun --exact -n2 program1 & # start 2 copies of program 1
srun --exact -n1 program2 & # start 1 copy of program 2
wait # wait for all to finish

You should find a detailed example script in /info/slurm on the cluster you are using.

These can be run in two ways, via salloc and srun. If you just want a single interactive session on a compute node then using srun to allocate resources for a single task and launch a shell as that one task is probably the way to go. But if you want to run things in parallel or more than one task at once in your interactive job, use salloc to allocate resources and then srun or mpirun to start the tasks, as starting multiple copies of an interactive shell at once probably isn't what you want.

# One interactive task. Quit the shell to finish
srun --pty -u bash -i
# One task with one GPU assigned (GPU clusters only, obviously)
srun --pty --gpus=1 -u bash -i 
# One task with one maxwell GPU
srun --pty --gpus=1 -C maxwell -u bash -i 
# one task with one Titan Black GPU 
srun --pty --gpus=1 -C titanblack -u bash -i 
# One task with one GPU on the node called 'happy'
srun --pty --gpus=1 -w happy -u bash -i 
# Allocate three tasks, followed by running three instances of 'myprog' within the allocation.
# Then start one copy of longprog and two copies of myprog2, then release the allocation
salloc -n3
srun myprog
srun -n1 --exact longprog &
srun -n2 --exact myprog2 
exit

If you are using interactive srun to experiment with setting up jobs you should add -W 0 to the command line. Otherwise SLURM helpfully kills your interactive job when something fails or you pause too long between launching tasks.

Useful commands

squeue # view the queue

scancel <jobid> # cancel a job

sinfo # See the state of the system

sacct -l -j <jobid> # List accounting info about a job

Asking for resources

salloc/srun/sbatch support a huge array of options which let you ask for nodes, cpus, tasks, sockets, threads, memory etc. If you combine them SLURM will try to work out a sensible allocation, so for example if you ask for 13 tasks and 5 nodes SLURM will cope. Here are the ones that are most likely to be useful:

Option	Meaning
-n	Number of tasks (roughly, processes)
-N	Number of nodes to assign. If you're using this, you might also be interested in --tasks-per-node
--tasks-per-node	Maximum tasks to assign per node if using -N
--cpus-per-task	Assign tasks containing more than one CPU. Useful for jobs with shared memory parallelization
-C	Features the nodes assigned must have
-w	Names of nodes that must be included - for selecting a particular node or nodes
--gpus=X	Ask for X GPUs in the job. It's expected to use all X GPUs in a single job step.
--gpus=per-task=1 -nX	Ask for X tasks, each with one GPU. This one is useful for jobs which want to assign several GPUs and then launch individual job steps on those GPUs. The steps can be launched with srun -n1 -N1 --exact .

MPI jobs

Inside a batch script you should just be able to call mpirun, which will communicate with SLURM and launch the job over the appropriate set of nodes for you:

#!/bin/bash
# 13 tasks over 5 nodes
#SBATCH -n13 -N5
echo Hosts are
srun -l hostname
mpirun /home/cen1001/src/mpi_hello

To run MPI jobs interactively you can assign some nodes using salloc, and then call mpirun from inside that allocation. Unlike PBS/Torque, the shell you launch with salloc runs on the same machine you ran salloc on, not on the first node of the allocation. But mpirun will do the right thing.

salloc -n12 bash
mpirun /home/cen1001/src/mpi_hello

You can even use srun to launch MPI jobs interactively without mpirun's intervention. The --mpi option here is to tell srun which method the MPI library uses for launching tasks. This is the correct one for use with our OpenMPI installations. However using srun like this isn't normally necessary unless you want to bind MPI processes to particular CPUs or sockets. Although mpirun has options for binding, they can interact badly with SLURM.

srun --mpi=pmi2 -n13 ./mpi_hello

Non-MPI Parallel jobs

In a parallel job which doesn't use MPI you can find out which hosts you have and how many by running "srun -l hostname" inside the job script. The -l option will print the slurm task number next to the assigned hostname for the task, skip it if you want just the list of hostnames.

You can then use "srun --exclusive" inside the job to start individual tasks, spreading them out over the available resources.

File copying in parallel jobs

If you want to run commands to set up files on every node in your job then this idiom is helpful

srun -n $SLURM_NNODES --tasks-per-node=1 mkdir /scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}
srun -n $SLURM_NNODES --tasks-per-node=1 cp file1 file2 /scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}

There is also an sbcast command that copies files to every node in the job, but that's all it does. It can't set up directories or delete files.

Jobs with multiple GPUs on GPU clusters

To get GPUs you can use --gpus=X, but in that case it's expected you'll use all the GPUs in a single job step (ie you launch one program that uses them all at once). To assign several GPUs and then allocate these to different tasks use --gpus-per-task and then srun to launch tasks within the job:

#!/bin/bash
#SBATCH -J testmultigpu
#SBATCH --gpus-per-task=1
#SBATCH -n4
# show us what resources we have: these run over everything
srun -l hostname
srun -l echo $CUDA_VISIBLE_DEVICES

# assign one instance of show.sh to each GPU over all four GPUs.
# Have to set -N here or it will default to the number of nodes in the whole job and print a warning
srun --exact -l -n1 -N1 /home/cen1001/show.sh &
srun --exact -l -n1 -N1 /home/cen1001/show.sh &
srun --exact -l -n1 -N1 /home/cen1001/show.sh &
srun --exact -l -n1 -N1 /home/cen1001/show.sh &
wait

Batch jobs

Interactive jobs

Useful commands

Asking for resources

MPI jobs

Non-MPI Parallel jobs

File copying in parallel jobs

Jobs with multiple GPUs on GPU clusters

System status

Can't find what you're looking for?

Quick Links

About the Department

Departmental Services

Contact IT Support at the Department of Chemistry, University of Cambridge

Study at Cambridge

About the University

Research at Cambridge