Getting information

squeue # view the queue
sprio # see what priority each queued job has
sshare -a # see the fairshare numbers
sinfo # see the system state
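
These commands take the usual SLURM filtering options if you want to narrow the output down, for example:

squeue -u $USER # only your own jobs
sinfo -N # node state listed per node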

Running a job

SLURM is slightly different to Torque in that it has both jobs and job steps. A 'job' is an allocation of resources to an account for a time. A 'job step' is a task that runs inside the allocation. You can launch multiple job steps within a job if you want, although probably most of the time you'll just want one. 
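
As a minimal sketch of the distinction (the ./step_one and ./step_two programs are placeholders): the sbatch script below is the job, and each srun line inside it launches a job step within that job's allocation.

#!/bin/bash
#SBATCH --gres=gpu:1
srun ./step_one # first job step, runs inside the job's allocation
srun ./step_two # second job step, starts when the first finishes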

As the pat cluster is a GPU cluster, it makes sense to schedule by GPUs rather than CPUs, so the examples concentrate on GPUs.

Interactive jobs

All of these will run for the maximum allowed time, three days. Within the job the variable CUDA_VISIBLE_DEVICES will be set to the appropriate value for the assigned GPU. 

srun --pty --gres=gpu:1 -u bash -i # One single GPU
srun --pty --gres=gpu:1 -C maxwell -u bash -i # One Maxwell GPU
srun --pty --gres=gpu:1 -C titanblack -u bash -i # One Titan Black GPU
srun --pty --gres=gpu:1  -w compute-titanblack-0-6 -u bash -i # One GPU on the node called compute-titanblack-0-6
srun --pty --gres=gpu:1 -C happy -u bash -i # Any one GPU on the node that used to be called 'happy' - old node names are now set as 'features'

To change the GPU mode issue the appropriate nvidia-smi command within the job:

sudo nvidia-smi -c 3 -i $CUDA_VISIBLE_DEVICES # set to exclusive
sudo nvidia-smi -c 0 -i $CUDA_VISIBLE_DEVICES # set to default

Cancelling

scancel <jobid>

Batch jobs

Run a batch job with sbatch <scriptname>.

Example batch script

#!/bin/bash
#SBATCH --job-name=cudamemtest
#SBATCH --gres=gpu:1
#SBATCH --constraint=maxwell
#SBATCH --mail-type=ALL
hostname
source /etc/profile.d/modules.sh
module add cuda/6.5

sudo nvidia-smi -i $CUDA_VISIBLE_DEVICES -c 3
/home/cen1001/cuda_memtest-1.2.3/cuda_memtest --stress --num_passes 1 --num_iterations 100 

Output will appear in slurm-<jobid>.out as each job step finishes. You can change that with sbatch options.
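
For example, adding something like this to the batch script sends output elsewhere (%j is expanded to the job id; the path is just an illustration):

#SBATCH --output=/home/<user>/logs/cudamemtest-%j.out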

Because a batch job can launch multiple job steps, each taking a part of the job's allocation, you can use the srun command within the batch job to tell SLURM to allocate particular resources to each job step, which in the case of GPUs means setting CUDA_VISIBLE_DEVICES. Here is an example which assigns two GPUs and runs a different job on each.

#!/bin/bash
#SBATCH --job-name=cudamemtest
#SBATCH --gres=gpu:2
#SBATCH --constraint=maxwell
source /etc/profile.d/modules.sh
module add cuda/6.5

srun --gres=gpu:1 /home/cen1001/cuda_memtest-1.2.3/cuda_memtest --disable_all --enable_test 1 --num_passes 100 --num_iterations 100 &
srun --gres=gpu:1 /home/cen1001/cuda_memtest-1.2.3/cuda_memtest --disable_all --enable_test 2 --num_passes 100 --num_iterations 100 &
wait

Attaching to a batch job while it runs

sattach <job>.<step> will let you peek at the output from a running job step.
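
For example, to watch step 0 of job 12345 (an illustrative job id):

sattach 12345.0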

Queues and constraints

Unlike the local Torque systems, there are not lots of different queues providing shorthand for requesting various combinations of CPUs and job time. SLURM's 'partitions', which are the equivalent of Torque's 'queues', do not support setting the default task geometry in the same way, so there's no point having lots of partitions. On the upside, the task geometry options available for jobs are far more flexible and powerful than Torque's - look at the srun and sbatch manpages if you want to know more. If you don't care, just ask for GPUs as in the examples above and let the CPU allocation take care of itself.
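
If you do want to control the geometry, the usual srun/sbatch options such as --ntasks and --cpus-per-task can be combined with a GPU request; the numbers below are purely illustrative:

srun --pty --gres=gpu:1 --ntasks=1 --cpus-per-task=4 -u bash -i # one GPU plus four CPU cores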

You can set the time limit for a job with the -t/--time flag to srun/sbatch. The value is in minutes; the maximum is three days, and that is also what you get if you don't give a time.
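
For example, to ask for two hours rather than the three-day default:

srun --pty --gres=gpu:1 -t 120 -u bash -i # 120 minutes
#SBATCH --time=120 # the equivalent line in a batch script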

Having said that, the cluster does have two partitions: 'MAIN' and 'DEBUG'. 'DEBUG' is a special-purpose partition only available to people nominated by the Wales group computer reps. It allows unlimited time and jobs in this partition may pre-empt jobs in the 'MAIN' partition. It is there purely for debugging. By default all jobs use MAIN. There are also sometimes other partitions present for special purposes.
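
You normally never need to name a partition, but if you do, the -p/--partition flag does it (myscript is a placeholder):

sbatch -p MAIN myscript # explicitly request the default partition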

Types of compute node

The cluster's nodes are not identical, unlike many local cluster systems. There are a range of GPUs available, and also sometimes more than one OS version. Different OS versions have different software available as not all compilers/CUDA versions are supported on every OS. You select the features you want with SLURM constraints. 

Currently available features:

Name         Description
teslak20     Nvidia Tesla K20m GPUs
titanblack   Nvidia GeForce 700 Titan Black GPUs
3gpu         Node has 3 GPUs
4gpu         Node has 4 GPUs
happy, hitaki, joe, ronn, mongrol, morrigun, blackblood, rojaws, hammerstein, quartz
             The names the nodes had before they were properly clustered

To see what combination of features each node has, run 'scontrol show nodes' on pat.

To select particular features use the '-C' or '--constraint' option to srun or sbatch. You can combine multiple features with & for a boolean AND, or | for a boolean OR.
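
For example (the quotes stop the shell interpreting & and |; whether a node with a given combination actually exists is worth checking with scontrol first, and myscript is a placeholder):

srun --pty --gres=gpu:1 -C "titanblack|teslak20" -u bash -i # either GPU type will do
sbatch -C "maxwell&4gpu" myscript # a Maxwell node with four GPUs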

Memory limits

SLURM on pat is configured to set a default memory limit of 3.5GB per core, but jobs can raise that with the --mem-per-cpu setting. However, no matter how high we set the limits in SLURM, Linux itself still imposes some virtual memory limits under extreme conditions, despite our having turned off all the tunable ones we could find.
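
For example, to ask for 8GB per core instead (the figure is purely illustrative):

#SBATCH --mem-per-cpu=8G
srun --pty --gres=gpu:1 --mem-per-cpu=8G -u bash -i # the interactive equivalent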

SLURM Power saving

The nodes in the pat cluster sometimes have power saving enabled. When power saving is on, SLURM will shut them down after ten minutes of inactivity and then boot them up automatically when they are assigned to a compute job. Booting a GPU node from cold takes about two and a half minutes, so there will be a wait when starting a job if the cluster has been idle for a while.

If a node is power saving then 'sinfo' will show its state as idle~, and 'scontrol show nodes' will show it as IDLE+POWER.

Using the DEBUG partition

The intention of this partition is to allow people to allocate a GPU for debugging for an indefinite time. It's not for running production work. Only people nominated by group computer reps can have access to this partition. To allocate a GPU, do something like this:

salloc -n1 --gres=gpu:1 -p DEBUG --no-shell

using whatever parameters you need to get the GPU you want; salloc understands all the same ones as sbatch and srun. SLURM will bump running jobs off the GPUs if it needs to in order to satisfy the allocation request. The salloc command will return a job id. You'll be able to see this job in the queue, running with unlimited walltime.

Then to access the allocated GPU do something like

srun --jobid=id mycommand

where 'id' is the job id that the salloc command gave you. To get rid of the allocation and allow others to use the GPU, cancel it with

scancel id

System status 

System monitoring page
