See the generic SLURM documentation for how to use SLURM. This page describes the setup on the Nest cluster.

Nest has sixteen nodes which were funded by the Theory RIG and four nodes which were funded by the Wales group. Hence there is a restricted-access partition which contains the Wales-funded nodes.

Nest has five 'partitions':

Partition | Who can use it | Nodes | Default time limit | Maximum time limit | Other settings | Preemption priority
TEST | Anyone | node-0-0 | 30 minutes | 30 minutes | None | 100
MAIN | Anyone | node-0-1 to node-0-15 | 48 hours | 48 hours | This is the default partition | 50
WALES | Members of the nest-wales-users group | node-0-16 to node-0-19 | 7 days | 28 days | None | 50
LONG | Anyone | node-0-1 to node-0-15 | 7 days | 28 days | Limited to a maximum of 200 cores at a time over all jobs in this partition | 50
CLUSTER | Anyone | All nodes | 7 days | 28 days | None | 0
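You can confirm the partition list and limits yourself from a login node with standard SLURM commands; a minimal sketch (the format string is generic SLURM, not Nest-specific):

    # List each partition with its node count, node list, and maximum time limit
    sinfo -o "%10P %5D %22N %10l"

    # Show the full configuration of one partition, e.g. MAIN
    scontrol show partition MAIN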

When submitting a compute job you can give a comma-separated list of possible partitions with the -p flag, and SLURM will place the job in whichever listed partition lets it start soonest. If you don't choose a partition at all, SLURM uses MAIN. MAIN is safe from preemption, but it does not include every node, so a job there may take longer to start.

If you want a job to start as soon as possible and don't mind it risking preemption, use -p MAIN,CLUSTER. The job will run in MAIN if there are enough free nodes there; if there are not, but there are enough nodes across the whole machine, it will run in CLUSTER. However, CLUSTER has the lowest preemption priority: if someone else then submits a job to TEST, MAIN, LONG, or WALES that can be run by cancelling the CLUSTER job, the CLUSTER job gets cancelled. If you want a cancelled job to be put back on the queue and restarted later, submit it with the --requeue flag.
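As a concrete illustration, a submission script combining these flags might look like the following; the task count, wall time, and program name are placeholders, not Nest recommendations:

    #!/bin/bash
    #SBATCH -p MAIN,CLUSTER   # run in MAIN if possible, otherwise fall back to CLUSTER
    #SBATCH --requeue         # if the job is preempted, return it to the queue
    #SBATCH -n 32             # hypothetical task count
    #SBATCH -t 24:00:00       # hypothetical wall time, within MAIN's 48-hour limit

    srun ./my_program         # my_program is a placeholder executable

Submit the script with sbatch, then check squeue -u $USER to see whether the job landed in MAIN or CLUSTER.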

System status 

System monitoring page
