
This covers things specific to the pat cluster. For general SLURM use see SLURM usage.

Partitions

pat has the following partitions:

Name   Nodes                         Time limit  Notes
GPU    All the nodes with GPUs       30 days     Default partition
CPU    All the nodes with only CPUs  30 days
DEBUG  All the nodes                 None        Pre-emptor, restricted access

 

Types of compute node

 

Unlike many local cluster systems, pat's nodes are not identical. There is a range of GPUs available, and sometimes more than one OS version; different OS versions have different software available, as not all compiler/CUDA versions are supported on every OS. You select the features you want with SLURM constraints. There are also CPU-only nodes.

Currently available features:

Name        Description
teslak20    Nvidia Tesla K20m GPUs
titanblack  Nvidia GeForce 700 Titan Black GPUs
3gpu        Node has 3 GPUs
4gpu        Node has 4 GPUs
cpu         Node has dual 16-core CPUs

To see what combination of features each node has run 'scontrol show nodes' on pat.
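As a convenience, you can filter that output down to just the node names and their feature lists. This is a sketch only: the exact field name varies between SLURM versions (it may appear as 'AvailableFeatures', 'ActiveFeatures', or simply 'Features').

```shell
# List each node together with its feature line.
# Field name depends on SLURM version, so match loosely on "Features".
scontrol show nodes | grep -E 'NodeName|Features'
```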

To select particular features use the '-C' or '--constraint' option to srun or sbatch. You can combine multiple features with & for a boolean AND, or | for a boolean OR.
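For example (the job script and program names below are illustrative), constraints can be combined like this. Quote the expression so the shell does not interpret the & or | characters itself:

```shell
# Request a node that has Tesla K20m cards AND 4 GPUs (boolean AND)
sbatch -C "teslak20&4gpu" myjob.sh

# Request a node with either Titan Black OR Tesla K20m GPUs (boolean OR)
srun -C "titanblack|teslak20" --gres=gpu:1 ./mycommand
```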

Using the DEBUG partition

This partition exists to let people allocate a GPU for debugging for an indefinite time; it is not for running production work. Only people nominated by group computer reps can have access to this partition. To allocate a GPU, do something like this:

salloc -n1 --gres=gpu:1 -p DEBUG --no-shell

using whatever parameters you need to get the GPU you want; salloc understands all the same ones as sbatch and srun. SLURM will bump running jobs off the GPUs if it needs to in order to satisfy the allocation request. The salloc command will return a job ID, and you'll be able to see the job in the queue, running with unlimited walltime.

Then to access the allocated GPU do something like

srun --jobid=id mycommand

where 'id' is the job ID that the salloc command gave you. To release the allocation and allow others to use the GPU, cancel it with

scancel id
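Putting the steps above together, a typical DEBUG session might look like this. The job ID 12345 and the commands run under srun are illustrative; yours will differ.

```shell
# Allocate a GPU on the DEBUG partition (no shell is spawned)
salloc -n1 --gres=gpu:1 -p DEBUG --no-shell
# salloc prints the granted job ID, e.g. 12345

# Run commands against the allocated GPU, as many times as needed
srun --jobid=12345 nvidia-smi
srun --jobid=12345 ./debug_my_code

# Release the GPU for others when done
scancel 12345
```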

System status 

System monitoring page
