
This document is for anyone who has to manage dexter. There's not much in here for end users.

Day-to-day stuff

Adding users

Dexter picks up its user accounts from Active Directory. If the user doesn't already have an AD account, create one, then add them to the 'dexter-users' group, which is in the Wales container. The wales-users and frenkel-users groups are members of this group, so their users are added automatically, though adding someone a second time does no harm. A cron job runs once an hour and copies any new user accounts in 'dexter-users' into dexter's local OpenLDAP directory. It does not yet create any Unix groups apart from each user's personal group. Passwords are checked directly against AD and are not stored locally.
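To check whether the hourly sync has picked up a new user yet, you can ask the name service on the head node. This is a minimal sketch that assumes the head node resolves LDAP accounts through NSS (so getent can see them) and that the personal group is the user's primary group; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: has the hourly AD -> OpenLDAP sync created a user yet?
# Assumes LDAP accounts are visible to getent via NSS, and that the
# personal group is the user's primary group. Names are hypothetical.

# Succeed if the account is visible to the name service.
user_is_synced() {
    getent passwd "$1" >/dev/null
}

# Print the name of the user's primary (personal) group.
personal_group_of() {
    local gid
    gid=$(getent passwd "$1" | cut -d: -f4)
    [ -n "$gid" ] || return 1
    getent group "$gid" | cut -d: -f1
}
```

For a hypothetical user jbloggs, `user_is_synced jbloggs && personal_group_of jbloggs` prints the personal group once the cron job has run; if it fails, wait for the next hourly run before digging further.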

Give the user a copy of the dexter user notes, or point them to the notes. If they haven't used any of the local clusters before, also give them the Theory sector Maui/Torque introduction to get them started.


See the local Maui admin guide. The queue setup is very basic.

Node access control for jobs is completely open: any user can log into any node. The Torque prologue script creates the local /scratch directories on the nodes as needed.
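For anyone who needs to change the prologue, the sketch below shows the general shape of a Torque prologue that creates per-user scratch space. It is hypothetical and is not the script installed on dexter: the /scratch/<user> layout and the 700 permissions are assumptions. Torque does pass the job id, user name and group name to the prologue as its first three arguments.

```shell
#!/bin/sh
# Hypothetical sketch of a Torque prologue that creates per-user scratch
# space; this is NOT the script installed on dexter. Torque passes the
# job id, user name and group name as $1, $2 and $3. The /scratch/<user>
# layout and mode 700 are assumptions.
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}

# Create $SCRATCH_ROOT/<user> if needed and hand it to the job's owner.
make_scratch_dir() {
    # $1: user name, $2: group name
    local dir="$SCRATCH_ROOT/$1"
    mkdir -p "$dir" || return 1
    chown "$1:$2" "$dir" || return 1
    chmod 700 "$dir"
}
```

In an actual prologue the function would be called as `make_scratch_dir "$2" "$3" || exit 1` followed by `exit 0`, since Torque treats a non-zero exit status from the prologue as a failure.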

Parallel tools

pexec still exists. You can do a great deal from cmgui, the cluster management GUI, or its command-line equivalent, cmsh. However, cmgui has a bad habit of rewriting system files that you don't want it to touch (e.g. the queueing system config), so it's best to use it only for things like powering nodes on and off.

Startup and shutdown

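The nodes are on IPMI and can be powered up and down from the head node with cmsh. For example, to power on all nodes in the default category, or to power off a single node: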

# cmsh -c "device; power on -c default"
# cmsh -c "device; power off -n node002"
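When several individual nodes need powering on or off, a small loop around the same cmsh command saves typing. This is a minimal sketch: the function name and the DRY_RUN variable are hypothetical, and it only reuses the `device; power <on|off> -n <node>` form shown above.

```shell
#!/bin/sh
# Sketch: power a list of nodes on or off one at a time via cmsh,
# reusing the "device; power <on|off> -n <node>" form shown above.
# Set DRY_RUN=1 to print the cmsh commands instead of running them.
power_nodes() {
    local action="$1" node
    shift
    case "$action" in
        on|off) ;;
        *) echo "usage: power_nodes on|off node..." >&2; return 2 ;;
    esac
    for node in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "cmsh -c \"device; power $action -n $node\""
        else
            cmsh -c "device; power $action -n $node" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 power_nodes off node002 node003` shows what would be run; drop `DRY_RUN=1` to do it for real. Looping one node at a time avoids relying on any cmsh node-range syntax.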


There is user documentation on dexter's web pages, and more in the filesystem under /cm/shared/docs/cm.

Updating software

# yum update                          # head node; even kernel updates are safe now
# chroot /cm/images/default-image     # then update the node image
## yum update
## exit

Don't forget to reboot if you need to activate new kernels.
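The same two updates can be run non-interactively, since chroot accepts a command to run inside the image. This is a sketch rather than an established script here: the function names, the IMAGE and DRY_RUN variables, and the use of `yum -y` to skip the confirmation prompts are all assumptions.

```shell
#!/bin/sh
# Sketch of the update routine above: update the head node, then the
# node image via chroot. IMAGE defaults to the image path used on this
# page. DRY_RUN=1 prints the commands instead of running them.
IMAGE=${IMAGE:-/cm/images/default-image}

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

update_all() {
    run yum -y update || return 1                  # head node
    run chroot "$IMAGE" yum -y update || return 1  # node image
}
```

Run `update_all` as root; `DRY_RUN=1 update_all` prints the two commands first. Nodes pick up the updated image when they next boot, and new kernels still need that reboot.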

Adding software

First use yum to check whether the package is available as part of the OS.

It is generally best to put third-party applications under /usr/local, which is NFS-shared to the nodes, and any module files in /usr/local/modulefiles.

If you need to add something to every compute node that can't go in /usr/local, install it on the running nodes with the parallel tools and then add it to the node install image as well, so that reinstalled nodes still get it. The image lives under /cm/images/default-image/.

chroot /cm/images/default-image
yum install foobar

After doing this, reboot a node and check that everything still works; nodes resync with the image on every boot. Alternatively, you can get fancy with the Cluster Manager GUI (cmgui) by making a test image and putting one node on it.
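One way to do the "check it still works" step is to confirm the package is present both in the image and on the rebooted test node. A minimal sketch, assuming root can ssh from the head node to the nodes; the function name is hypothetical and "foobar" is the placeholder package from above.

```shell
#!/bin/sh
# Sketch: after installing a package into the node image and rebooting a
# test node, confirm the package is present in both places. Assumes root
# can ssh to the nodes. DRY_RUN=1 prints the commands instead.
IMAGE=${IMAGE:-/cm/images/default-image}

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

check_package() {
    # $1: package name (e.g. the placeholder foobar), $2: test node
    run chroot "$IMAGE" rpm -q "$1" || return 1  # in the image
    run ssh "$2" rpm -q "$1" || return 1         # on the rebooted node
}
```

For example, `check_package foobar node002` fails if the package is missing from either the image or the node, since `rpm -q` exits non-zero for packages that are not installed.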

Dealing with problems

Reinstalling the nodes

The nodes sync with the node image on boot, so if you suspect a software problem, just reboot the misbehaving node.

Hardware problems

Dexter's nodes are not in pairs but in quads, and the node numbering bears no relation to the physical arrangement. Each motherboard in a quad can be removed without affecting the other three nodes in the chassis, so if you ever need to remove RAM etc. there's no need to drain anything other than the node you want to work on.
If you need to remove a node, tell PBS first:

pbsnodes -o nodeXXX   # yes, you need the full node name
checknode nodeXXX     # wait until there are no reservations on the node

Once the node is back, clear its offline state:

pbsnodes -c nodeXXX
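The drain step can be scripted so you don't have to keep re-running commands by hand. This sketch swaps in a different check from the one above: it watches for running jobs in `pbsnodes <node>` output, assuming busy nodes show a `jobs = ...` line and idle ones don't, rather than parsing checknode's reservation list. Maui reservations are therefore not checked, so still run checknode before pulling the node. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: mark a node offline in Torque and wait for its running jobs to
# finish. Idleness is judged from the "jobs = ..." line of `pbsnodes <node>`
# output (an assumption about the output format); Maui reservations are
# not checked, so still confirm with `checknode` before pulling the node.

# Read pbsnodes output on stdin; succeed if no jobs are listed.
node_is_idle() {
    awk '$1 == "jobs" && $2 == "=" && NF > 2 { busy = 1 }
         END { if (busy) exit 1; exit 0 }'
}

drain_node() {
    # $1: full node name, as pbsnodes requires
    pbsnodes -o "$1" || return 1
    until pbsnodes "$1" | node_is_idle; do
        sleep 60  # poll once a minute
    done
    echo "$1 has no running jobs; check reservations with: checknode $1"
}
```

Typical use is `drain_node node002`, do the hardware work, then `pbsnodes -c node002` once the node is back.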


On pip.

Tech support: quote ID 120439. For hardware problems you will probably need to use the support portal; the username and password are in the box.

Other useful information


The disks on the head node are arranged as a RAID1 array and a RAID6 array, which Hobbit monitors.


Parts of /usr/local/shared are synced from the network every day to pick up new compiler versions.

You can use the web interface to the IPMI cards on the nodes by starting firefox on the head node, pointing it at node0XX.ipmi.cluster and logging in with the right password. If you're logged in as root, you can retrieve the password with:

cmsh -c "partition use base;get bmcpassword"

