A set of software packages for running, tuning, and debugging parallel programs. It contains the ifort and icc compilers, the Math Kernel Library, the Trace Analyzer and Collector, and parallel debugging tools.
The package can be downloaded from Intel.
This is proprietary software and requires a licence to run. We have a licence for zero and deathstar but not the other clusters or workstations.
Instructions for users
To get access to the software you need to have loaded the appropriate modules. A complete set would include the modules icc (Intel C/C++), ifort (Intel Fortran), idb (Intel Debugger), itac (Intel Trace Analyzer and Collector), and mpi/impi (Intel MPI library). Note that the MPI library comes in several versions, as it can be configured for use with either the GNU or Intel compilers, and there is a special module for debugging; don't use the debug module for normal work.
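A typical setup might look like the following sketch. The module names match those listed above, but exact names and versions vary between clusters, so check module avail first.

```shell
# Load the Intel tool chain; check `module avail` for the exact
# module names and versions on your cluster.
module load icc ifort idb itac
module load mpi/impi        # the ordinary (non-debug) Intel MPI module
```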
This provides the compiler wrappers mpicc, mpif90, mpicxx, and mpif77. Use these to compile and link your MPI code. They all understand the -show option to display what they actually do.
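For example, assuming a source file hello.c (a stand-in for your own code), compilation and inspection of the wrapper might look like:

```shell
# Compile and link an MPI program with the Intel wrapper.
mpicc -o hello hello.c
# Ask the wrapper what it actually does: -show prints the underlying
# compiler command line without running it.
mpicc -show -o hello hello.c
```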
To run your code you need to use the launcher mpiexec. This will only work inside a Torque job. If you want to do some small tests, get an interactive session in the test queue with something like:
qsub -I -q test -l nodes=1:ppn=4
Then just launch your job with mpiexec:
$ mpiexec ./testc
Hello world: rank 0 of 4 running on gpu-0-2.local
Hello world: rank 1 of 4 running on gpu-0-2.local
Hello world: rank 2 of 4 running on gpu-0-2.local
Hello world: rank 3 of 4 running on gpu-0-2.local
mpiexec talks to the queueing system to find out how many processes to launch and where to put them, so you don't need to give it any options.
Launching jobs in batch sessions is exactly the same.
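A batch job might look like the following sketch. The queue name, resource request, and program name are illustrative; adjust them for your cluster.

```shell
#!/bin/bash
# Illustrative Torque batch script; queue and resources are examples only.
#PBS -q s8
#PBS -l nodes=2:ppn=4
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR
# As in an interactive session, mpiexec reads the node list from Torque,
# so no -n or hostfile options are needed.
mpiexec ./testc
```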
Intel Trace Analyzer and Collector
This tool lets you trace what your program is up to and analyze the results. It can also detect MPI deadlocks and perform other correctness checks. It can be used to analyze non-MPI programs, and even plain binaries. This section only covers MPI; see the manual for full details.
To use it you first need to make sure the itac module is loaded, and then compile and link your MPI program with the trace option:
$ mpicc -trace -o hello hello.c
Then run the program in the normal way. A collection of files will appear in the program's working directory. One of these will be called hello.stf (or whatever your executable was called). From the head node, start the graphical analyser interface by typing traceanalyzer hello.stf.
It's not actually quite that simple. The tracing libraries use an immense amount of memory; so much so that you can easily cause the compute nodes in your job to swap and crash if your code does a lot of MPI calls. You therefore probably want to cut your test case down to be as small as possible, or use a config file to tell the tracing libraries what to collect. See the full ITC documentation (type module help itac to find out where it is) for details of what can go into a configuration file. To get your batch job to see the config file, set the environment variable VT_CONFIG in your job to the config file's name.
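As a sketch, the two steps might look like this. The ACTIVITY directives are written from memory of the ITC manual and should be verified against the documentation that module help itac points to before relying on them.

```shell
# Write a minimal ITC config file restricting what gets collected.
# Directive names are examples; confirm them in the ITC reference.
cat > itc.conf <<'EOF'
# Skip application-internal events, keep MPI activity
ACTIVITY Application OFF
ACTIVITY MPI ON
EOF
# Point the tracing libraries at the file; put this line in your
# batch job script so the job can see it.
export VT_CONFIG=$PWD/itc.conf
```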
Debugging Intel MPI code
You will need the idb and debug version of Intel MPI modules loaded. Be sure to log in with X forwarding enabled. Compile your code with the debug flags in the usual way:
mpicc -g -o hello hello.c
The only real difference between the debug IMPI module and the regular one is that you get a different version of mpiexec in your PATH: the one that comes with Intel MPI. This one doesn't interface nicely with the queueing system, but it does let you launch under a debugger. Consequently launching your job is a bit more complicated than usual.
$ qsub -X -I -V -q s8    # interactive qsub with X forwarding and environment
qsub: waiting for job 7409.zero.ch.cam.ac.uk to start
qsub: job 7409.zero.ch.cam.ac.uk ready
[cen1001@gpu-0-3 ~]$ cd test_intel_mpi
[cen1001@gpu-0-3 test_intel_mpi]$ mpdboot
[cen1001@gpu-0-3 test_intel_mpi]$ mpdtrace
gpu-0-3
gpu-0-4
[cen1001@gpu-0-3 test_intel_mpi]$ mpiexec -idb -n 8 ./testc
Note that you have to tell mpiexec how many cores to run over. If all goes well an xterm should appear with the debugger running inside it.
At the moment this all only works within a single node so you can only debug over up to eight cores on zero and 16 on deathstar. This is almost certainly due to the environment not getting passed to the other nodes in the job. If you need multinode debugging please ask the Computer Officers to set it up.
The Intel tools come with lots of documentation, mainly PDF. The 'module help' command will tell you where it is for each module.
I had to install xterm and libstdc++ on all the compute nodes to get these tools going.
In order to make the Intel MPI 3.1 library work with OSC mpiexec on zero I had to do a patch on the library. Details can be found on the OSC mailing list in this thread. Essentially you search and replace the string "NULL string" with "NULL_string" in the binaries. The patched version is in /usr/local and the untouched original in /share/apps on zero. We have a feature request in with Intel for mpiexec support. None of this was needed on deathstar with mpiexec 0.84 and Intel MPI 4.0.0.028.
Curiously the 0.82 mpiexec that shipped with zero doesn't work with Intel MPI at all, and we had to compile our own 0.83. This may be due to the way the 0.82 was configured on zero.
OSC mpiexec does not support the -idb flag, which is why there is a 'debug' version of the IMPI module: it doesn't put OSC mpiexec on the PATH, so you end up with Intel's own mpiexec instead. That mpiexec is harder to use in a batch job; see above.
A record of local experiments with Intel MPI may be found on the Computer Officers' wiki (Computer Officers and CSC members only) at http://wikis.ch.cam.ac.uk/cosdocco/wiki/index.php/Zero_setup#Intel_MPI. Some notes on itac are on the Theory sector wiki at http://wikis.ch.cam.ac.uk/cuc3/wiki/index.php/Intel_Trace_Analyzer_and_C... .