
The queueing system on most of the Theory sector clusters is Torque with the Maui scheduler.

You need three main commands to use the system: showq, qsub, and qstat.

showq

This displays the main job queue. It shows three types of job: those running, those waiting to run, and those blocked for various reasons. It orders the waiting jobs by priority. This is the command to use when you want to know where your job is in the queue, or to check if it has been blocked.

qstat

qstat is another tool for displaying information about jobs and job classes.

'qstat -q' shows you the available job classes (which are often called queues). Note that 'qstat -q' doesn't give you all of the interesting information about any given class. 'qstat -Q' gives a different display for each class. If you want to know everything then 'qstat -Qf classname' will give you all of the user-readable information about the class called classname.

'qstat -f jobid' gives a full listing of the details for the job whose number is jobid. This is useful for debugging when a job won't run or is running in the wrong place. If the job isn't running then there will usually be a comment in the -f output explaining why. Another useful command for this is 'checkjob jobid'.

'qstat -n jobid' tells you which node(s) the job with number jobid is running on. This is useful to know if you are writing to non-shared filesystems. However, you can make the job write this information to your output file too; see later.

'qstat' with no options shows you the Torque server's view of the current state of the job queue. This contains less useful information than the 'showq' command (which is the Maui scheduler's view). It can be misleading because Torque does not know which jobs Maui has blocked or how it has prioritized the rest.
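When a job misbehaves it can help to collect the views described above in one go. The following is a minimal sketch, assuming a POSIX shell on a machine where qstat and checkjob are on your PATH; the function name job_report is made up for illustration and is not part of Torque or Maui.

```shell
# job_report JOBID
# Hypothetical helper: print Torque's full job listing, the node list,
# and Maui's view of the same job, one after the other.
job_report() {
    jobid=$1
    echo "== Torque details: qstat -f $jobid =="
    qstat -f "$jobid"
    echo "== Nodes: qstat -n $jobid =="
    qstat -n "$jobid"
    echo "== Maui view: checkjob $jobid =="
    checkjob "$jobid"
}
```

You would call it as 'job_report jobid' and read the output from top to bottom, looking for the comment in the -f listing first.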

qsub

qsub submits a job. In its simplest form you would run 'qsub myscript', where myscript is a shell script that runs your job. It must be a script and not a binary. Your shell script will be interpreted by your login shell; any #! line at the top is ignored. You can force Torque to use a different shell with the -S qsub option.

qsub has lots of command-line options, but you can make things easier by setting most of the available options within your job script. Here's a very short example job script to look at:

# Set some Torque options: class name and max time for the job. Torque developed from a program called
# OpenPBS, hence all the PBS references in this file
#PBS -q serial
#PBS -l walltime=2:00:00

# Change to the directory you submitted the job from (quoted in case the path contains spaces)
cd "$PBS_O_WORKDIR"

# Run program
/home/cen1001/myprog input.file

All of the clusters have detailed example submit scripts in their /info/pbs directories. You should take an example script from the cluster you want to use and edit it instead of using this one; some things vary from machine to machine.
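If you want the job itself to record where it ran, so that you don't need 'qstat -n' afterwards, something like the following could go near the top of a job script. This is only a sketch: the function name log_job_info is invented here, the class name and walltime are copied from the short example above, and the fallbacks for unset variables exist only so the fragment can be tried outside a job. The cluster's own example scripts should take precedence.

```shell
#PBS -q serial
#PBS -l walltime=2:00:00

# Hypothetical helper: write the job id, the first node, the start time,
# and the list of assigned nodes to the job's standard output.
log_job_info() {
    echo "Job ${PBS_JOBID:-unknown} started on $(uname -n) at $(date)"
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        echo "Assigned nodes:"
        # The nodefile can list a node more than once; show each name once
        sort -u "$PBS_NODEFILE"
    fi
}

# Change to the submission directory (falls back to "." outside a job)
cd "${PBS_O_WORKDIR:-.}" || exit 1
log_job_info

# ... then run your program here, as in the example above
```

The node list then appears at the top of the job's output file, which is handy when the job writes to a non-shared filesystem and you need to find the files later.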

If you don't want to write a script, you can run an interactive job with qsub's -I switch.

    qsub -I 
    

This opens a session on the node assigned to the job. It looks like a remote login session, except that the initialisation isn't exactly the same. You can then run commands interactively on your assigned node until your walltime runs out. If you submit to a class which assigns you multiple nodes you can examine the contents of the job nodefile ($PBS_NODEFILE) to see which other nodes you may use.
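Inside an interactive session, the nodefile can be boiled down to the list of other nodes available to you. The sketch below assumes that the names in $PBS_NODEFILE match the output of 'uname -n' exactly, which may not hold if one uses short names and the other fully qualified ones; other_nodes is a made-up name.

```shell
# Hypothetical helper: print each node assigned to the job once,
# leaving out the node this shell is running on.
other_nodes() {
    sort -u "$PBS_NODEFILE" | grep -vxF "$(uname -n)"
}
```

If the job was given only one node, the function prints nothing.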

It is generally better to script jobs than to run interactively: scripted jobs can be restarted automatically if the node fails, and it doesn't matter if there isn't a node free exactly when you need one.

Other useful commands

qdel, qhold, and qrls will delete, hold, and release a job respectively. These all have man pages. qalter will let you change certain parameters of a queued or running job. To find other Torque commands, do 'man -k pbs'. Other useful Maui commands are 'showbf', 'showres', and 'mdiag'. Maui does not come with man pages, but the commands are explained at http://docs.adaptivecomputing.com/maui/a.gcommandoverview.php.

Parallel jobs

There are example submit scripts in the /info/pbs directory on most machines for the different parallel libraries they support. The only generalization I can make here is that starting your job with mpirun -n X inside your job script probably won't do what you want! Please read the documentation for the appropriate machine, which should cover it.

Scheduling policy

The scheduling policy on the machines is roughly FIFO but with some extra Maui rules to make it fairer. Three features are used: the ability to make reservations for queued jobs, the throttling rules, and the flexible priority system.

The queue of jobs is first sorted on priority. The priority is made up of several weighted components: time on the queue (subject to throttling; see below), fairshare value, and job expansion factor. The scheduler then works down from the top of the queue, starting jobs until it reaches one that cannot run yet because there are not enough free nodes. The top one or two remaining queued jobs then have reservations made for them; this means that the scheduler works out the earliest those jobs can definitely start and will not schedule anything that could possibly delay them. Finally the scheduler looks further down the queue and tries to backfill lower priority jobs around the reservations.
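The pass described above can be illustrated with a toy model. This is not how Maui is implemented; it is a deliberately simplified sketch under several assumptions: only one reservation is made, the time until the reserved job can start is supplied by hand rather than worked out from the running jobs, and a job is backfilled only if it fits in the free nodes and would finish before the reservation begins. The function name and the input format (name, priority, nodes, walltime in hours) are invented for the example.

```shell
# schedule_pass FREE_NODES HOURS_UNTIL_RESERVATION
# Reads lines of "name priority nodes walltime_hours" on standard input
# and prints what a single simplified scheduling pass would do with each job.
schedule_pass() {
    sort -k2,2nr | awk -v free="$1" -v gap="$2" '
        # Before any reservation exists: start jobs in priority order while they fit
        !reserved && $3 <= free { free -= $3; print $1, "start"; next }
        # The first job that does not fit gets the reservation
        !reserved               { reserved = 1; print $1, "reserved"; next }
        # Below the reservation: backfill only jobs that fit now and end in time
        $3 <= free && $4 <= gap { free -= $3; print $1, "backfill"; next }
                                { print $1, "wait" }
    '
}

# Example: 6 free nodes, reserved job can start in 2 hours
# printf "a 100 4 10\nb 90 8 10\nc 80 2 1\nd 70 2 5\n" | schedule_pass 6 2
```

In the example, job a starts, job b is too big and gets the reservation, job c is short enough to be backfilled, and job d has to wait because c took the last free nodes.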

Throttling solves the problem of a large batch of jobs from one user crowding everyone else out. It limits the number of jobs that any user may have queued at any one time to four. Excess jobs are placed in the Blocked state and won't be released until one of the user's four queued jobs has started. While a job is blocked its time on the queue is counted as zero, so it gets no priority gain from waiting in a blocked state. There are also per-user running job limits on some job types; see later.

Fairshare value contributes to priority. It is a measure of how much CPU time a user has had lately. If they are over the configured value then their priority decreases relative to other users, and if they are under then it increases.

The job expansion factor also counts towards priority, and helps short jobs when the machine is full. It is calculated as (1 + time on the queue / wall clock limit for job). This factor increases much more rapidly for short jobs than long jobs. Note that if a job gets blocked by the throttling rule then time spent blocked doesn't count towards the total time on the queue.
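To see how strongly the formula favours short jobs, it can be evaluated for a few cases. The helper below is a small sketch that simply applies the expression given above; the name expansion_factor and the choice of seconds as the unit are assumptions made for the example, and any consistent unit would do.

```shell
# expansion_factor QUEUED_SECONDS WALLTIME_LIMIT_SECONDS
# Prints 1 + (time on the queue / wall clock limit), to two decimal places.
expansion_factor() {
    awk -v queued="$1" -v limit="$2" 'BEGIN { printf "%.2f\n", 1 + queued / limit }'
}

# After one hour on the queue:
# expansion_factor 3600 3600     # job with a 1 hour limit   -> 2.00
# expansion_factor 3600 86400    # job with a 24 hour limit  -> 1.04
```

After the same hour of waiting, the one-hour job's factor has doubled while the 24-hour job's has barely moved.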

Finally, some queues have per-user job number limits on them. If you find your jobs go into the Blocked state for no obvious reason, use 'mdiag -b' to see what is going on. It is probably a per-user job limit on the job class.
