The ABC cluster is a set of GPU workstations running the Rocks cluster distribution, linked by a private Gigabit Ethernet network.
To access it, ssh to the head node, pat.ch.private.cam.ac.uk, using your Admitto credentials.
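For example, assuming your Admitto username is jb007 (a placeholder, not a real account), a login session looks like:

```shell
# Log in to the ABC head node (replace jb007 with your own Admitto username)
ssh jb007@pat.ch.private.cam.ac.uk
```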
Home space is on a mirrored pair of disks attached to the head node. The /home filesystem is 150GB in size and has quotas (currently a 15GB soft limit and a 20GB hard limit). It is backed up regularly; the latest backup is always available under /rsnapshots on the head node. If you need older backups, please contact the helpdesk.
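To see how close you are to the quota, run the standard quota tool on the head node; a minimal check might look like:

```shell
# Show current usage against the 15GB soft / 20GB hard /home quota
quota -s

# Browse the most recent read-only backup of your home directory
ls /rsnapshots
```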
/home is shared to all nodes in the cluster, so your job sees the same home directory wherever it runs on the machine. From a compute job's point of view, however, access to this directory is extremely slow, especially when many nodes read or write at once. Compute jobs should write data to a local disk wherever possible and copy it back to /home at the end of the run.
There is also a shared scratch filesystem, /sharedscratch, in which you will have a directory. It is not backed up. It has a quota of 250GB soft limit and 300GB hard limit, but it is expected that most people will stay well within that amount. It has the same speed issue as /home.
Each node also has a local /scratch filesystem on which the queueing system will create a directory for you when you use the node. These filesystems are about 1.8TB in size with no quota restriction and are the most appropriate place for your jobs to write temporary files during a run. They are local to each node and so considerably faster than the NFS-mounted /home and /sharedscratch. Please clean up files on /scratch when you are done with them.
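Inside a job script, the write-locally-then-copy-back pattern described above might be sketched as follows. The exact per-job directory layout under /scratch, and the program and file names, are assumptions for illustration; check the ABC SLURM documentation for the real paths:

```shell
# Sketch of the recommended I/O pattern (paths and names are illustrative).
# The queueing system creates a per-job directory under the node's local /scratch.
WORKDIR=/scratch/$USER/$SLURM_JOB_ID    # assumed layout, not guaranteed
cd "$WORKDIR"

cp ~/project/input.dat .                # stage input from slow NFS /home once
./my_simulation input.dat > output.dat  # write temporary files to fast local disk

cp output.dat ~/project/results/        # copy results back to /home at the end
cd / && rm -rf "$WORKDIR"               # clean up /scratch when done
```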
A variety of compilers and libraries are installed. Like most local Linux machines, the cluster has the modules environment to allow you to switch between different compilers and libraries. By default, no modules are loaded.
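A typical modules session looks like the following; the module name used here is hypothetical, and what is actually available will depend on what is installed:

```shell
module avail         # list all compilers and libraries available on the cluster
module load cuda     # load one into your environment (name is illustrative)
module list          # show which modules are currently loaded
module unload cuda   # or: module purge, to return to the default empty set
```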
All compute jobs should be run through the queueing system, which is SLURM. It will run each job on a free GPU and copy the output back to a user-specified file at the end of the job. Please read the ABC SLURM documentation for more details.
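As a sketch, a minimal batch script for such a job might look like this. The GPU request syntax and job names are assumptions based on common SLURM setups, so check the ABC SLURM documentation for the values used on this cluster:

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --gres=gpu:1            # request one GPU (exact syntax may vary per site)
#SBATCH --output=myjob-%j.out   # %j expands to the SLURM job ID

./my_gpu_program                # illustrative program name
```

Submit the script with `sbatch myjob.sh` and monitor it with `squeue -u $USER`.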