Xeon Phi User Guide

Hartree Centre User Guide Chapter E2 - Xeon Phi System

Last modified: 3/3/2015

Accessing the Xeon Phis

The Xeon Phis are attached to 42 of the nodes of the Phase-2 Wonder iDataPlex system, in chassis "1A" and "1C". Host nodes take alternate numbers: idb1a01 to idb1a41 and idb1c02 to idb1c42. To access the system you must first be a member of a valid project and then SSH into one of the Phase-2 login nodes.

IP Address    DNS Name
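The host numbering described above can be sketched as a short shell function. This is only an illustration: it assumes that "alternate numbers" means the odd numbers 01 to 41 in chassis 1A and the even numbers 02 to 42 in chassis 1C, which is consistent with the total of 42 nodes. The function name phi_hosts is made up for this example.

```shell
# Print the Phi host node names implied by the numbering above.
# Assumption: chassis 1A uses odd numbers, chassis 1C uses even numbers.
phi_hosts() {
    i=1
    while [ "$i" -le 41 ]; do
        printf 'idb1a%02d\n' "$i"
        i=$((i + 2))
    done
    i=2
    while [ "$i" -le 42 ]; do
        printf 'idb1c%02d\n' "$i"
        i=$((i + 2))
    done
}

# Count the names; under the assumption above this reports 42.
phi_hosts | wc -l
```

Which of these nodes is free at any moment is still decided by the batch system, so the list is useful mainly for recognising host names in bjobs output.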

There are three steps to reach a Xeon Phi on the system:

  1. SSH onto a login node
  2. Get interactive access to a Phi host node
  3. SSH to the MIC co-processor

An interactive shell on an available host node can be started as follows:

bsub -q phiq -Is bash
source /etc/profile.d/modules.sh

After a few seconds this should return a command prompt. If it does not, the node may already be in use (access is exclusive). To check, run a command such as "bjobs -uall | grep idb1", which lists any users of chassis 1A or 1C hosts through the phiq queue. Note that the second line above sources the module environment, which will be needed later.

To SSH into a MIC co-processor you need a key pair. If you do not already have one, create it as follows.

cd ~/.ssh
ssh-keygen -f mic.key -t rsa -b 1024
cat mic.key.pub >> authorized_keys

# add the IdentityFile line to ~/.ssh/config
echo "IdentityFile ~/.ssh/mic.key" >> config

# ensure that only you can read it
chmod 600 config

# now try logging onto the MIC
ssh idb1a07-mic0

The rest of this document assumes you are working on a host node unless stated otherwise.
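The SSH example above uses idb1a07-mic0, but the interactive shell may land on any Phi host. As a minimal sketch, the co-processor name can be derived from the host name. It assumes the <host>-mic0 pattern seen in idb1a07-mic0 and idb1a09-mic0 holds on every Phi host node; mic_name is a hypothetical helper, not an installed command.

```shell
# Print the name of the first co-processor attached to a host.
# Assumption: co-processors are always named <host>-mic0.
mic_name() {
    host=${1:-$(hostname -s)}
    echo "${host}-mic0"
}

# Example with an explicit host name.
mic_name idb1a07

# On a Phi host node, log in to its own co-processor with:
#   ssh "$(mic_name)"
```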

Compiling Applications

How is the software organised and used? The following image is taken from the Intel site: https://software.intel.com/en-us/articles/intelr-cluster-studio-xe-works-on-xeonr-phi-coprocessor-openmp-tbb-mpi .

Figure: Phi software configuration

Typically you will be working on the host node, either to compile an application to run natively on the MIC or to run it on the host with certain data and functions off-loaded. Applications can be launched on the MIC with SSH or mpirun, or directly from the binary if it is an off-load code.

Compiling and linking threaded or MPI applications for MIC

Here is a very simple MPI application. It checks the number of MPI ranks allocated and then sets the number of threads for each process to 240/size. That way it executes 240 threads in total whenever the number of ranks divides 240. If you launch 60 MPI processes on the MIC you should get 4 threads per core.
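The thread arithmetic can be checked without MPI. The following is a sketch only, assuming the program uses integer division as C would; threads_per_rank is a name invented for this example.

```shell
# Threads per MPI rank when 240 hardware threads are shared out evenly.
# Assumption: integer division, so any remainder is dropped.
TOTAL_THREADS=240

threads_per_rank() {
    echo $(( TOTAL_THREADS / $1 ))
}

for ranks in 1 4 60 240; do
    echo "$ranks ranks x $(threads_per_rank "$ranks") threads each"
done
```

With 60 ranks this gives 4 threads each. A rank count that does not divide 240 leaves some threads unused: 7 ranks would get 34 threads each, 238 in total.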

To compile it to run natively on the MIC do the following.

Compiling threaded off-load applications for host and MIC

Here is a variant of the above code which has offload pragmas.

This can be compiled as follows. Note that the "-mmic" option is no longer used.

Running Interactive Jobs

We will illustrate different modes of operation as follows.

Running code native on MIC with threads or MPI

The first method uses SSH with the keys that were generated above.

The last line uses mpiexec.hydra natively on the MIC.

The next method uses the micnativeloadex utility as follows.

The simple MPI program we compiled above can also be run on the MIC using mpirun on the host as follows.

Note that we have done nothing to control the order of printing here.

Running code on host with off-load of data and functions to attached MIC

We can run the mpi_offload program compiled above simply by invoking the binary on the host processor. The parts to run on the MIC will be off-loaded onto the attached co-processor automatically when it runs.

Symmetric or Hybrid Mode

In this case we want to run one executable on the host and another on the Phi in a single MPI context, e.g. with 2 ranks on the host and 30 ranks on the Phi.
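A launch line for this case might look like the one printed by the sketch below. It only prints the command and does not run it. It assumes Intel MPI's colon-separated syntax for combining several executables in one mpirun call. The binary name mpi_host is hypothetical and stands for a host build of the program, while mpi_mic is the native build from earlier in this chapter. Intel MPI generally also expects I_MPI_MIC=enable to be set for runs that span host and co-processor, so check this against the installed version.

```shell
# Print (do not run) a symmetric-mode launch line.
# Assumptions: Intel MPI colon-separated MPMD syntax; ./mpi_host is a
# hypothetical host binary; the co-processor is named <host>-mic0.
symmetric_cmd() {
    host=$1
    host_ranks=$2
    mic_ranks=$3
    echo "mpirun -host $host -n $host_ranks ./mpi_host : -host ${host}-mic0 -n $mic_ranks ./mpi_mic"
}

# 2 ranks on the host and 30 on its co-processor, as in the text above.
symmetric_cmd idb1a07 2 30
```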

Using Intel VTune Amplifier

We will use the sample code provided with VTune, which can be found here: /gpfs/stfc/local/apps/intel/intel_cs/vtune_amplifier_xe_2015/samples/en/C++/matrix_vtune_amp_xe.tgz .

Sample instructions are provided on the Intel Web site: https://software.intel.com/en-us/articles/how-to-analyze-xeon-phi-coprocessor-applications-using-intel-vtune-amplifier-xe-2015 .

We cannot run amplxe-gui directly on the host node, so we will do the profiling with the command line tool amplxe-cl. Briefly, we performed the following steps.

> tar xzvf /gpfs/stfc/local/apps/intel/intel_cs/vtune_amplifier_xe_2015/samples/en/C++/matrix_vtune_amp_xe.tgz
> cd matrix/linux
> module load intel/15.1.133_mic intel_mpi/5.0.2_mic
# remove the reference to mic-pushed in the makefile or simply do
> make matrix.mic

# do a test run
> mpirun -envall -n 1 -host idb1a09-mic0 ./matrix.mic
Addr of buf1 = 0x7f8866d9a010
Offs of buf1 = 0x7f8866d9a180
Addr of buf2 = 0x7f885fd19010
Offs of buf2 = 0x7f885fd191c0
Addr of buf3 = 0x7f8858c98010
Offs of buf3 = 0x7f8858c98100
Addr of buf4 = 0x7f8851c17010
Offs of buf4 = 0x7f8851c17140
Threads #: 240 OpenMP threads
Matrix size: 3840
Using multiply kernel: multiply1
Freq = 1.052630 GHz
Execution time = 25.494 seconds

# now try with VTune
> module load intel_vtune/2015
> amplxe-cl -c advanced-hotspots -r vtune -target-system=mic-native:0 \
  --target-install-dir=/gpfs/stfc/local/apps/intel/intel_cs/vtune_amplifier_xe_2015 \
  -- /<full path to>/matrix/linux/matrix.mic

For the native hybrid MPI-OpenMP program illustrated above we might run something like the following.

amplxe-cl -c general-exploration -cpu-mask=1-64 -r vtune -target-system=mic-native:0 \
  --search-dir all:rp=. \
  --target-install-dir=/gpfs/stfc/local/apps/intel/intel_cs/vtune_amplifier_xe_2015 \
  -- "/gpfs/stfc/local/apps/intel/intel_mpi/ -envall -n 60 \
  /<full path to>/mpi_mic"

Afterwards the analysis can be viewed using amplxe-gui back on the login node, which reads the data from the vtune directory created during the run.

Note that this is only a starting point. There are many options which can be passed to VTune, and several collection modes. For more information please consult the Intel documentation.

If you get an error message of the form "amplxe: Error: Cannot start data collection. nmi_watchdog interrupt capability is enabled on your system, which prevents collecting accurate event-based sampling data. Please disable nmi_watchdog interrupt or see the Troubleshooting section of the product documentation for details.", or any other error message when using VTune, please notify us via hartree@stfc.ac.uk.

Running Batch Jobs

This section assumes we are on one of the system login nodes. An application must first be compiled on a Phi host node as described above. Once that has been done it can be submitted for execution through the LSF batch system. This gives users a fairer share of the resources.

Here is an example job script which will run 8 offload jobs using MPI across 8 host nodes.

#BSUB -J MIC_offload
#BSUB -o stdout.%J.txt
#BSUB -e stderr.%J.txt
# request 8 hosts
#BSUB -n 8
# run one process per host
#BSUB -R "span[ptile=1]"
#BSUB -W 0:19
# use Phi queue
#BSUB -q phiq


#Load modules
source /etc/profile.d/modules.sh
module load intel/15.1.133_mic
module load intel_mpi/5.0.2

# currently no OFA on Phi rack, so processes need to communicate over TCP
export I_MPI_FABRICS=shm:tcp

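# count how many processors are actually allocated
# (assumption: LSF lists one host name per allocated slot in $LSB_HOSTS)
NP=$(echo $LSB_HOSTS | wc -w)
echo $NP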

# initialise devices
export OFFLOAD_INIT=on_start

# use SINK_LD_LIBRARY_PATH if doing micnativeloadex
export SINK_LD_LIBRARY_PATH=/gpfs/stfc/local/apps/intel/intel_mpi/$MIC_LD_LIBRARY_PATH

# set LD_LIBRARY_PATH to match what would be seen if inside an interactive session
export LD_LIBRARY_PATH=/gpfs/stfc/local/apps/intel/intel_cs/2015.1.133/composer_xe_2015.1.133/compiler/lib/mic:\

# several options are now possible

# this worked
mpiexec.hydra -np $NP -genvall ./mpi_offload

# this worked

# this worked
#micnativeloadex $PWD/mpi_mic

# this worked
#ssh mic0 "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH; $HOME/phi-testing/mpi_mic"
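LSF reads the #BSUB directives only when the script is passed to bsub on standard input. A minimal sketch of the submission step follows. The file name mic_offload.lsf and the helper submit_job are hypothetical. The helper prints the command instead of running it when bsub is not on the PATH, so it can be tried away from the cluster.

```shell
# Submit a job script through LSF, or show the command if bsub is absent.
# Assumption: the script above has been saved under the name given as $1.
submit_job() {
    script=$1
    if command -v bsub >/dev/null 2>&1; then
        bsub < "$script"
    else
        echo "bsub < $script"
    fi
}

# Usage on a login node:
#   submit_job mic_offload.lsf
#   bjobs        # check the state of the submitted job
```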

We will put other sample job scripts on a separate page.

Installed Software

  • Intel Composer compiler suite v2015.1.133, C, C++ and Fortran
  • Intel MKL-11.2
  • Intel MPI-5.0.2
  • Intel tool suite v2015.1: VTune Amplifier, Inspector, Explorer
  • MPSS-3.4.2

Some FAQs on other installed software can be found on a separate page.

Further Information

General information from Wikipedia: http://en.wikipedia.org/wiki/Xeon_Phi .

Intel Xeon Phi Web pages: http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html .

Assorted links to material for software developers: https://software.intel.com/mic-developer . Includes a MIC Developer Blog for some questions and answers.

A relatively recent tutorial from LRZ: https://www.lrz.de/services/compute/courses/x_lecturenotes/MIC_GPU_Workshop/micworkshop-micprogramming.pdf

Other information from ICHEC: https://www.ichec.ie/infrastructure/xeonphi

PRACE Best Practice Guide: http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-html

Intel Parallel Computing Centre work site (restricted access): http://community.hartree.stfc.ac.uk/portal/site/IPCC_Europe .

For more information about the VTune command line interface see: https://software.intel.com/en-us/node/529436 .

For information about cross-compiling using GNU AutoTools, see: https://software.intel.com/en-us/articles/autotools-and-intel-xeon-phi-coprocessor .