Running jobs on Iden and Napier (Blue Wonder)

Last modified: 22/11/2016

The job scheduler on Blue Wonder (Iden and Napier) is IBM Platform LSF v9.2. Having first compiled your executable, you can submit it to the job queue with a suitable submission script. We provide some examples below.

Here is a link to the full LSF v9.1 Reference Manual: http://www.slac.stanford.edu/comp/unix/package/lsf/currdoc/lsf_command_ref/index.htm

Job submission filter

There is a job submission filter on Blue Wonder which will set the following defaults if you do not override them:

  • Number of processors (slots) requested => 24 (i.e. 1 node)
  • Wallclock time => 1hr
  • No resource requirements

Jobs will by default go into the q1h32 queue (1 hour wall clock, 32 nodes maximum, 512 cores maximum).

You may request a walltime longer than 1 hour, in which case your job will go into q12h32 (12 hour wall clock, 32 nodes maximum, 512 cores maximum). Note that jobs with walltimes > 1 hour are currently considered to be "at risk" and may be terminated without warning. A longer walltime is requested with the -W directive, as sketched below and shown in the sample submission scripts.

As of 15/1/2013, new queues for 1 hour and 12 hour jobs on up to 200 nodes have been implemented.
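
For example, a minimal sketch of the relevant directives (the slot count is illustrative; adjust to your needs):

#BSUB -W 12:00      # request 12 hours of wallclock; anything over 1 hour leaves q1h32
#BSUB -n 48         # 48 slots, i.e. 2 nodes at 24 cores each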

Using OpenMPI

Specifying appropriate environment variables directly:

#BSUB -o imb.4.16.out
#BSUB -e imb.4.16.err
#BSUB -R "span[ptile=24]"
#BSUB -n 96            
#BSUB -J imb
#BSUB -W 180

cd ~/idplx/imb
export MYHOME=`pwd`
export OPENMPI_ROOT=/gpfs/packages/gcc/openmpi/1.6
export PATH=$OPENMPI_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$OPENMPI_ROOT/lib:$OPENMPI_ROOT/lib/openmpi:$LD_LIBRARY_PATH
export MYJOB="${MYHOME}/IMB-MPI1"

mpirun -np 96 ${MYJOB}

Using environment modules:

#BSUB -o imb.4.16.out
#BSUB -e imb.4.16.err
#BSUB -R "span[ptile=24]"
#BSUB -n 96            
#BSUB -J imb
#BSUB -W 180

cd ~/idplx/imb
export MYHOME=`pwd`

# setup modules
. /etc/profile.d/modules.sh
module load openmpi-gcc > /dev/null 2>&1

export MYJOB="${MYHOME}/IMB-MPI1"

mpirun -np 96 ${MYJOB}

An explanation of the parameters used:

#BSUB -o imb.4.16.out   <-- Specify an output filename
#BSUB -e imb.4.16.err   <-- Specify an error filename
#BSUB -R "span[ptile=24]"   <-- Request 24 processes per node, matching the number of cores per node. You can set ptile to less than 24 if required.
#BSUB -R "rusage[mem=15000]"   <-- Request memory, in this case 15000MB (~15GB)
#BSUB -n 96   <-- Request the number of MPI tasks. In this example we are asking for 4 nodes (4x24) in total.
#BSUB -J imb   <-- Give the job a name
#BSUB -W 180   <-- Request 180 minutes (3 hours) of wallclock time

mpirun -np 96 ${MYJOB}   <-- Tell mpirun to start 96 processes (this should match the -n value above)

Using Intel MPI

Please do NOT use mpdboot, mpiexec or mpirun. They will place all MPI tasks on the same node. Instead, please use mpiexec.hydra as in this example script:

#BSUB -o imb.4.16.out
#BSUB -e imb.4.16.err
#BSUB -R "span[ptile=24]"
#BSUB -n 96            
#BSUB -J imb
#BSUB -W 3:00

cd /gpfs/home/HCP999/xyz01/abc01-xyz01/idplx/imb

# setup modules
. /etc/profile.d/modules.sh
module load intel_mpi > /dev/null 2>&1

export MYJOB="/gpfs/home/HCP999/xyz01/abc01-xyz01/idplx/imb/IMB-MPI1-intel"

mpiexec.hydra -np 96 ${MYJOB}

Note that in this case the wallclock time is formatted as hours:minutes.
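
For reference, the two formats are interchangeable:

#BSUB -W 180    # 180 minutes
#BSUB -W 3:00   # 3 hours and 0 minutes, i.e. the same 180 minutes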

Submitting jobs

Submit your job like this:

bsub < myjob.sh
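
If the submission is accepted, LSF echoes the job ID and the queue the job was routed to, along these lines (the ID and queue shown are illustrative):

Job <345998> is submitted to queue <q1h32>.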

Requesting large memory nodes

We have four nodes in the cluster that each have 256GB of RAM (large memory nodes). You can request them with this syntax in your submission script:

#BSUB -R "rusage[mem=250000]"

This will request 250,000MB, or ~250GB, of memory on each node. Please note that because only 4 such nodes are available, you cannot request more than 96 MPI tasks (4x24).
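
Putting this together, a minimal sketch of a large memory submission script (the output filenames and executable are placeholders):

#BSUB -o bigmem.out
#BSUB -e bigmem.err
#BSUB -R "span[ptile=24]"
#BSUB -R "rusage[mem=250000]"
#BSUB -n 96
#BSUB -J bigmem
#BSUB -W 180

mpirun -np 96 ./my_executable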

Submitting to Phase-2 system queues

If you are logged onto the Phase-2 systems, you have several types of queue to choose from. These are referenced as follows.

queue name           target system
prod (the default)   general queue for most jobs
nxq                  NextScale nodes
idbq                 iDataPlex nodes
phiq                 Xeon Phi nodes
mnxinter             NextScale interactive (e.g. for debugging and profiling)
midinter             iDataPlex interactive
ibmq                 reservation for IBM benchmarking only

The duration of a job can currently range from 20 minutes or less through to 24 hours, and the number of nodes from 32 to 200. The above queues will automatically select from these ranges depending on the job requirements you specify. Unless you need anything specific, please just use the "prod" queue.

To see which queues are actually enabled at any time, type "bclusters".
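
To target a specific queue explicitly, add a -q option, either as a directive in the job script:

#BSUB -q nxq

or on the bsub command line:

bsub -q nxq < myjob.sh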

Job Arrays

A "job array" is a set of jobs submitted using a single script. Typically this is used in cases where the same job has to be run many times with different input data. This is sometimes referred to as task farming, parameter sweep or design of experiments and is typical of an optimisation procedure.

Using job array syntax will permit the LSF scheduler to make fair use of the resources available. The job script has syntax as follows.

#!/bin/bash

# define an array job
#BSUB -J "My_Jobname[1-20]";
#BSUB -o stdout.%J.%I.txt
#BSUB -e stderr.%J.%I.txt
# … other BSUB options

# look up the index for this job instance
array_index=$LSB_JOBINDEX

# open a different input file for each job instance
mpiexec.hydra -n 8 my_executable input.$array_index

When the job runs, you will see output from bjobs as follows.

345998  myid-h RUN   q12h32     login1      ida7c16     My_Jobname[16]   Aug 18 19:18
345998  myid-h RUN   q12h32     login1      ida3c36     My_Jobname[15]   Aug 18 19:18
345998  myid-h RUN   q12h32     login1      ida7a18     My_Jobname[17]   Aug 18 19:18
345998  myid-h RUN   q12h32     login1      ida2a41     My_Jobname[18]   Aug 18 19:18
345998  myid-h RUN   q12h32     login1      ida7a27     My_Jobname[19]   Aug 18 19:18
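
Individual elements of an array can be addressed using the jobid[index] syntax (quoted so the shell does not expand the brackets), for example with the job ID from the sample above:

bjobs "345998[16]"      # query one element of the array
bkill "345998[16]"      # kill just that element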

Monitoring your job

You can use:

bjobs -W

to see your running jobs, or:

bjobs -W -u all

to see all user jobs.

More information is available with:

bjobs -W -l

And you can check scheduling information (perhaps if your job is showing with status "SSUSP") with:

bjobs -W -s <jobid>

"bjobs" has many options - please check the man page for further details.

To see the status of the compute nodes in the system, you can use:

bhosts
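
bhosts prints one line per node showing its status and slot usage; the values below are purely illustrative:

HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
ida1a01            ok              -     24      8      8      0      0      0
ida1a02            closed          -     24     24     24      0      0      0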

E-Mail Notification

As an alternative method of monitoring, email notification has been enabled on Wonder Phase-2. Use the -B flag to send an email when the job is dispatched and begins execution, the -N flag to send an email report when the job completes, and the -u flag to set the email address. Here is an example command:

bsub -u user@example.co.uk -B -N < myjob.sh

The options can also be included in the job script itself, as sketched below. Samples of the messages sent are given on a separate page.
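
The same options as directives (the address is a placeholder):

#BSUB -u user@example.co.uk   # where to send notifications
#BSUB -B                      # mail when the job begins execution
#BSUB -N                      # mail a report when the job completes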

Please note that this does not apply to Phase-1.

Killing a job

Use:

bkill <jobid>

As you might expect, you can only kill your own jobs.
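
To kill all of your own jobs at once, LSF accepts job ID 0 as a wildcard:

bkill 0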

Interactive jobs

You can run interactive jobs using the "-I" flag, for example:

bsub -I -n 96 -R "span[ptile=24]" "<command>"

You can also use this method to compile or edit on a compute node, thus freeing up resources on the login node, e.g.

bsub -q interactive -Is emacs <my_file>

Note the use of "-Is" to get a pseudo-terminal with stdin.


Back to Contents Page