Jobs

Back to Contents Page

Running jobs on the Bifort and Bantam Blue Gene/Q systems

Last modified: 20/3/2016

Default Behaviour for MPI Task Placement inside and across Nodes

When you submit a job to be run on the Blue Gene/Q system, you use LoadLeveler and the "runjob" command, see runjob. We provide sample jobs below for a variety of cases.

When Joule was installed, the smallest allocation of nodes which could be made was of 128 nodes which equals 2,048 cores. This minimum size was determined by the hardware configuration of our Blue Gene/Q because each active partition requires the inclusion of at least one I/O node, and our configuration has only eight I/O nodes per rack of 1,024 compute nodes, or 1 I/O node per 128 compute nodes.

With such an allocation you will, by default, have 128 MPI tasks launched, one on each node. Since December 2014 this restriction has been removed, as we can now run sub-block jobs. Generally speaking, there is now no lower limit on the number of compute nodes you can request with bg_size; the value will automatically be rounded up to an even number, but the request should otherwise work as expected. Here is an example LoadLeveler job script requesting an allocation of 32 compute nodes to run a job of 512 MPI tasks with 4 threads each.

#@bg_size=32
#@job_type=bluegene
##@input=bgtest_in.txt
#@output=stdout.$(jobid).txt
#@error=stderr.$(jobid).txt
#@wall_clock_limit=00:10:00
#@executable=/bgsys/drivers/ppcfloor/hlcs/bin/runjob
#@arguments= --exe hello_mpi_openmp -p 16 -n 512 --envs OMP_NUM_THREADS=4
#@class=prod
#@notification=complete
#@queue

If you don't ask for anything explicitly, you will get a single MPI task on each Blue Gene/Q 16-core node, and each MPI task will then have access to the 16GB of memory on each node. This might be what you want, but probably isn't.

To change this default behaviour, make sure you use either the "-p" or "--ranks-per-node" parameter as an argument passed to runjob by LoadLeveler to specify the number of MPI processes you want to run on each node:

#@arguments = … -p 2 …
#@arguments = … --ranks-per-node 2 …

The number of processes per node must be a power of two, and can take any of the following values: 1, 2, 4, 8, 16, 32 or 64. Typically 16 will be chosen as this is equal to the number of physical cores.

Each A2 processor core supports up to 4 active hardware threads. All of these threads are available to each process when up to 16 processes per node are used; if you request 32 or 64 processes per node, be aware that each core's resources, such as the floating-point unit, are then shared between two or four processes.
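For example, to place two MPI ranks on each physical core (32 ranks per node), you could use something like the following; the thread count shown is only an illustration:

#@arguments = … -p 32 --envs OMP_NUM_THREADS=2 …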

By default, even if you have compiled your code with OpenMP ("-qsmp=omp" with the IBM compilers) you will get one OpenMP thread per MPI task. To change this behaviour, you need to set the OMP_NUM_THREADS environment variable for the Blue Gene/Q job to the number of threads per MPI task you want to have run, and one way of accomplishing this is to use the "--envs" parameter:

#@arguments … --envs OMP_NUM_THREADS=2 …
#@arguments … --envs OMP_NUM_THREADS=4 …

WARNING: It's "--envs", not "--env".

You probably want to ensure that the number of MPI tasks per node ("-p") multiplied by the number of threads per task does not exceed 64.
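For example, 16 ranks per node with 4 OpenMP threads per rank uses exactly the 64 hardware threads available on each node:

#@arguments = … -p 16 --envs OMP_NUM_THREADS=4 …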

You can define the precise number of MPI tasks using the "-n" or "--np" parameter:

#@arguments = … -n 40 …
#@arguments = … --np 40 …

If you request fewer MPI tasks than the allocated nodes could provide, the job will run using only as many nodes as the requested tasks need, placed according to the defined scheme for mapping tasks inside and across nodes; this might result in some nodes having no MPI tasks.
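For example, with a 128-node allocation, one rank per node and only 40 tasks requested, 88 of the allocated nodes will carry no MPI tasks (the executable name here is just a placeholder):

#@bg_size=128
#@arguments = … --exe my_app -p 1 -n 40 …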

Node Allocation and Process Mapping

Whether or not you use all the allocated nodes, MPI tasks are placed in rank order using a mapping process. The network topology of the Blue Gene/Q is a five-dimensional torus or mesh, with direct links between nearest neighbours in all five dimensions. The default mapping places MPI ranks on the system in ABCDET order, in which the rightmost letter increments first; <A,B,C,D,E> are the torus coordinates and T is the processor ID within each node (T ranges from 0 to N-1, where N is the number of processes per node in use).

By default, using the default mapping and the default of one process per node, rank 0 is placed at torus coordinates <0,0,0,0,0>, rank 1 at <0,0,0,0,1>, rank 2 at <0,0,0,1,0>, and so on until all of the processes are mapped.

Note that the E dimension is always 2 on the Blue Gene/Q system - the fifth dimension of the torus is in fact a single link between a pair of nodes.

This default mapping also means that when you ask for more than one process per node, the processes are allocated across each node in rank order first. So if you ask for 4 MPI processes per node, ranks 0 to 3 (T = 0 to 3) are allocated to the first node, ranks 4 to 7 to the second node, and so on.

The default mapping order can be overridden by using the "--mapping" parameter.
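For example, to place consecutive ranks on different nodes in round-robin fashion, rather than filling each node first, you could pass a different permutation; the choice of TABCDE here is only an illustration:

#@arguments = … --mapping TABCDE …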

Batch job submission with LoadLeveler

Note that for the Bantam BGAS system you should use class=bgas instead of class=prod. The examples below refer to the Bifort BG/Q system.

You can see what is available as follows.

-bash-4.1$ llq -X all
===== Cluster prod =====

Id                       Owner      Submitted   ST PRI Class        Running On 
------------------------ ---------- ----------- -- --- ------------ -----------
bglogin1.44552.0         ssr25-fxn0 11/25 07:44 R  50  qres01       bgqsn1     
bglogin1.44553.0         rja87-jpf0 11/25 08:46 R  50  q4h1024      bgqsn1     
bglogin1.40467.0         mxm86-smp0 10/3  23:03 I  50  q4h4096                 
bglogin1.40468.0         mxm86-smp0 10/3  23:03 I  50  
…

40 job step(s) in queue, 8 waiting, 0 pending, 2 running, 30 held, 0 preempted

This shows the 6 rack production system. You will probably be using the production system in class BGQ.

To see what classes are available do the following:

[bglogin1]$ llclass
Name                 MaxJobCPU     MaxProcCPU  Free   Max Description          
                    d+hh:mm:ss     d+hh:mm:ss Slots Slots                      
--------------- -------------- -------------- ----- ----- ---------------------
prod                 undefined      undefined     5     5                      
interactive          undefined      undefined     1     1                      
qres02               undefined      undefined    32    32                      
qsmall               undefined      undefined    10    10                      
q48hall              undefined      undefined     0     2                      
qres01               undefined      undefined     6     6                      
q1h2048t             undefined      undefined     1     1                      
q12h6144             undefined      undefined     0     1                      
q4h6144              undefined      undefined     1     1                      
q1h6144              undefined      undefined     2     2                      
q12h4096             undefined      undefined     0     1                      
q4h4096              undefined      undefined     2     2                      
q1h4096              undefined      undefined     2     2                      
q12h3072             undefined      undefined     0     2                      
q4h3072              undefined      undefined     2     2                      
q1h3072              undefined      undefined     2     2                      
q12h2048             undefined      undefined     0     3                      
q4h2048              undefined      undefined     3     3                      
q1h2048              undefined      undefined     3     3                      
q12h1024             undefined      undefined     0     5                      
q4h1024              undefined      undefined     4     5                      
q1h1024              undefined      undefined     5     5                      
q20m1024             undefined      undefined     1     1                      
q4h512               undefined      undefined     0     1                      
q1h512               undefined      undefined     1     1                      
q20m512              undefined      undefined     2     2                      
q4h256               undefined      undefined     1     3                      
q1h256               undefined      undefined     3     3                      
q20m256              undefined      undefined     4     4                      
q4h128               undefined      undefined     2     2                      
q1h128               undefined      undefined     8     8                      
q20m128              undefined      undefined    16    16                      
BGQ                  undefined      undefined     4     4

LoadLeveler will use a filter script to allocate your job to a specific class depending on the walltime set in your submission script.

Small and Long Jobs

The queue policy can be seen here: http://community.hartree.stfc.ac.uk/wiki/site/admin/queues.html

Warning: some users have been found to be writing job submission scripts on the BG/Q that re-submit themselves when the job ends. This is not an acceptable mode of usage, because it (a) "queue jumps" other users' jobs, and (b) consumes valuable resources on the BG service nodes, owing to the way the Blue Gene architecture works.

Therefore we ask all users to ensure that their submission scripts do not "chain" themselves together. We reserve the right to kill user jobs that we see exhibiting this behaviour. Acceptable types of jobs are shown below.

Single Step Jobs

A simple job submission script might look as follows. This script submits an executable called "hello_mpi_openmp" to run in a partition of 64 nodes for up to 10 minutes. Note that the executable is always "runjob"; your compute node executable is supplied as an argument to runjob.

#@bg_size=64
#@job_type=bluegene
##@input=bgtest_in.txt
#@output=stdout.$(jobid).txt
#@error=stderr.$(jobid).txt
#@wall_clock_limit=00:10:00
#@executable=/bgsys/drivers/ppcfloor/hlcs/bin/runjob
#@arguments= --exe hello_mpi_openmp -p 16 -n 1024 --envs OMP_NUM_THREADS=4
#@class=prod
#@notification=complete
#@queue

Note that "@bg_size" specifies the number of nodes. Each node has 16 cores, and each core supports up to 4 hardware threads. This example uses OpenMP to control the number of threads per MPI task. The "@arguments" keyword allows you to specify the number of MPI tasks per node (--ranks-per-node), the total number of MPI tasks (--np) and the number of OpenMP threads (--envs OMP_NUM_THREADS=4). The formula to remember is bg_size x ranks-per-node = np; you can then adjust the number of OpenMP threads per task independently, keeping ranks-per-node x OMP_NUM_THREADS at or below 64.
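As an illustrative variation of the script above (not an additional recommended configuration), keeping the same 64-node allocation but halving the number of ranks per node and doubling the OpenMP thread count still satisfies the formula (64 x 8 = 512) and still uses all 64 hardware threads per node:

#@bg_size=64
#@arguments= --exe hello_mpi_openmp --ranks-per-node 8 --np 512 --envs OMP_NUM_THREADS=8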

Key words: there is a useful definition of the keywords on the JUGENE web site: http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUGENE/UserInfo/LoadLeveler.html. Note, however, that this is for a Blue Gene/P, so there may be some differences in detail. For a definitive list, go to http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.loadl.v5r1.load100.doc%2Fc2367923_xtoc.html and navigate to the section entitled "Job command file keyword descriptions".

To find out more about the options available you can do:

-bash-4.1$ /bgsys/drivers/ppcfloor/hlcs/bin/runjob -h
/bgsys/drivers/ppcfloor/hlcs/bin/runjob [options] : exe arg1 arg2 … argn

Job Options:
  --exe arg                 executable to run
  --args arg                arguments
  --envs arg                environment variables in key=value form
  --exp-env arg             export a specific environment variable
  --env-all                 export all environment variables
  --cwd arg (=current wdir) current working directory
  --timeout arg             positive number of seconds to wait after runjob 
                            starts before a SIGKILL will be delivered.

Resource Options:
  --block arg                      block ID, must be initialized and requires 
                                   Execute authority.
  --corner arg                     sub-block compute node corner location: R00-
                                   M0-N04-J00
  --shape arg                      five dimensional sub-block shape, in terms 
                                   of compute nodes: 1x2x2x1x2
  -p [ --ranks-per-node ] arg (=1) number of ranks per node: 1, 2, 4, 8, 16, 
                                   32, or 64
  -n [ --np ] arg                  positive number of ranks in the entire job
  --mapping arg (=ABCDET)          ABCDET permutation or path to mapping file

Debug Options:
  --label [=arg(=long)] (=none) prefix job output with stdout, stderr, and rank
  --strace arg (=none)          specify  none, or n where n is a rank to enable
                                system call tracing
  --start-tool arg              path to tool to start with the job
  --tool-args arg               arguments for the tool
  --tool-subset arg (=0-$max)   rank subset to use when launching the tool 
                                daemon

Miscellaneous Options:
  --stdinrank arg (=0)  rank to send stdin to
  --raise               if the job dies with a signal, raise it

  -h [ --help ]         this help text
  -v [ --version ]      display version information
  --properties arg      Blue Gene configuration file
  --verbose arg         Logging configuration


for more information, consult the man page

A copy of the man page is here: runjob.

There is a great deal more about runjob options in the Blue Gene/Q system administration Redbook, which can be found in our document repository at http://community.hartree.stfc.ac.uk/access/content/group/admin/Documentation/IBM/BG_Q/. The relevant details are on pages 77-80.

Note: if you are using --args, you will need one --args for each argument to be passed, e.g. "runjob --exe <myexe> --args <arg1> --args <arg2>". In this case it is usually simpler to use the colon notation: "runjob : <myexe> <arg1> <arg2>".
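For example (myexe, arg1 and arg2 are placeholders), the two equivalent forms are:

runjob --exe myexe --args arg1 --args arg2
runjob : myexe arg1 arg2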

In order to allocate your job to the correct LoadLeveler queue, it is first run through a filter script. You can run your job submission through the filter script manually in order to check that it hasn't mangled anything. Do:

cat mpi_primes_xlf.sub | /gpfs/packages/LoadLeveler/BlueGene-filter.pl

In particular, you should check the class and cluster assignment that the filter applies, for example:

fen1:/gpfs/home/package_build/build/djc87-build/bgq $ cat mpi_primes_xlf.sub
#@bg_size=128
#@executable=/bgsys/drivers/ppcfloor/hlcs/bin/runjob
#@job_type=bluegene
#@arguments= --exe /gpfs/home/package_build/build/djc87-build/bgq/mpi_primes_xlf
#@class=prod
#@input=/dev/null
#@output=/gpfs/home/package_build/build/djc87-build/bgq/mpi_primes_xlf_$(jobid).out
#@error=/gpfs/home/package_build/build/djc87-build/bgq/mpi_primes_xlf_$(jobid).err
#@wall_clock_limit=00:20:00
#@notification=complete
#@queue

fen1:/gpfs/home/package_build/build/djc87-build/bgq $ cat mpi_primes_xlf.sub | /gpfs/packages/LoadLeveler/BlueGene-filter.pl
#@ cluster_list = prod
#@ bg_size = 128
#@ executable = /bgsys/drivers/ppcfloor/hlcs/bin/runjob
#@ job_type = bluegene
#@ arguments = --exe /gpfs/home/package_build/build/djc87-build/bgq/mpi_primes_xlf
#@class=prod
#@ input = /dev/null
#@ output = /gpfs/home/package_build/build/djc87-build/bgq/mpi_primes_xlf_$(jobid).out
#@ error = /gpfs/home/package_build/build/djc87-build/bgq/mpi_primes_xlf_$(jobid).err
#@ wall_clock_limit = 00:20:00
#@ notification = complete
#@ queue

To see the currently available classes type "llclass". If you submit a job and the class is not available it will not run.

To submit the job and check its status:

-bash-4.1$ llsubmit bgtest
llsubmit: The job "bglogin1.71" has been submitted.

To see the current status of the job queues, use:

fen1:/gpfs/home/package_build/build/djc87-build $ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
bglogin1.29.0            bgqadmin    9/10 17:05 I  50  BGQ
bglogin1.30.0            bgqadmin    9/10 17:05 I  50  BGQ
bglogin1.77.0            bgqadmin    9/14 14:58 I  50  BGQ
bglogin1.84.0            bgqadmin    9/14 15:12 I  50  BGQ
bglogin1.83.0            bgqadmin    9/14 15:12 I  50  BGQ

5 job step(s) in queue, 5 waiting, 0 pending, 0 running, 0 held, 0 preempted

Or, for a longer format, use:

fen1:/gpfs/home/package_build/build/djc87-build/bgq/modules-3.2.9 $ llq -x
Id                Owner     Submitted   ST PRI Class    Running On Job CPU
----------------- --------- ----------- -- --- -------- ---------- -------------
bglogin1.84.0     bgqadmin   9/14 15:12 I  50  BGQ                 none recorded
bglogin1.83.0     bgqadmin   9/14 15:12 I  50  BGQ                 none recorded
bglogin1.77.0     bgqadmin   9/14 14:58 I  50  BGQ                 none recorded
bglogin1.30.0     bgqadmin   9/10 17:05 I  50  BGQ                 none recorded
bglogin1.29.0     bgqadmin   9/10 17:05 I  50  BGQ                 none recorded

5 job step(s) in queue, 5 waiting, 0 pending, 0 running, 0 held, 0 preempted

To check the scheduling status of your job, use something like this:

-bash-4.1$ llq -s 71

===== EVALUATIONS FOR JOB STEP bglogin1.71.0 =====

Step state                       : Idle
Considered for scheduling at     : 

Waiting for block LL12091414534415 to INITIALIZE.

This job could be cancelled by typing "llcancel 71". Note that if the job was submitted on a different login node from the one you're currently logged into, you may need to specify the full jobid, for example "bglogin2.491.0".
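For example, to cancel the job submitted earlier:

-bash-4.1$ llcancel 71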

USER_HOLD

If, for any reason, your job cannot read its input file(s) or create/update its output file(s), perhaps due to file or directory permission errors, the job will go into USER_HOLD status. Once you have resolved the problem, you can release the job using "llhold -r <jobid>".
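For example, using the job id from the llsubmit example above (purely illustrative):

-bash-4.1$ llhold -r bglogin1.71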

Scripted Jobs

An alternative is to submit a job as a bash script. Here is a simple example. Note that the "#@executable" argument must be the same as the filename of the script itself. In this example, the script is called "bgscript".

#!/bin/bash

## job requirements
######################################################################
#@bg_size=128
#@job_type=bluegene
#@executable=bgscript
#@class=prod
#@environment=COPY_ALL
#@output=$(jobid).output.txt
#@error=$(jobid).error.txt
#@wall_clock_limit=00:20:00
#@notification=complete
#@queue

## commands to be executed
## this loads the dl_meso/2.5 module and runs the code
######################################################################
source /etc/profile.d/modules.sh
module load dlmeso/2.5
printenv
EXE=`which dpd.exe`
runjob --env-all --exe $EXE -n 32 -p 4 --args ""

Multi-Step Jobs

We have found two ways to run multi-step jobs.

Simple Method

This uses LoadLeveler syntax.

#!/bin/bash

#@bg_size=128
#@executable=/bgsys/drivers/ppcfloor/hlcs/bin/runjob
#@job_type=bluegene
#@arguments= --exe hello_mpi_openmp --ranks-per-node 4 --np 512 --envs OMP_NUM_THREADS=16
#@class=prod 
##@input=bgtest_in.txt
#@output=$(jobid).$(stepid).output.txt
#@error=$(jobid).$(stepid).error.txt
#@wall_clock_limit=00:10:00
#@notification=complete
#@queue

#@arguments= --exe hello_mpi_openmp --ranks-per-node 4 --np 512 --envs OMP_NUM_THREADS=16 
#@queue

#@arguments= --exe hello_mpi_openmp --ranks-per-node 4 --np 512 --envs OMP_NUM_THREADS=16 
#@queue

Scripted Method

This allows each job step to be described in a more sophisticated way using a bash script. Again, the "#@executable" argument must be the same as the filename of the script itself.

#!/bin/bash

## general requirements and allocation of partition
######################################################################
#@bg_size=128
#@job_type=bluegene
#@executable=bgtest
#@class=prod
#@environment=COPY_ALL

## requirements for job step 1
######################################################################
#@step_name=step_1
#@output=$(jobid).$(stepid).output.txt
#@error=$(jobid).$(stepid).error.txt
#@wall_clock_limit=00:20:00
#@notification=complete
#@queue

## requirements for job step 2
######################################################################
#@step_name=step_2
#@dependency=(step_1==0)
#@output=$(jobid).$(stepid).output.txt
#@error=$(jobid).$(stepid).error.txt
#@wall_clock_limit=00:20:00
#@notification=complete
#@queue

## commands executed for this job step
######################################################################
case $LOADL_STEP_NAME in
# first step just prints the environment
  step_1) echo "Working on $LOADL_STEP_NAME"
    source /etc/profile.d/modules.sh
    module load dlmeso/2.5
    printenv
      ;;
# second step runs the code
  step_2) echo "Working on $LOADL_STEP_NAME"
    source /etc/profile.d/modules.sh
    module load dlmeso/2.5
    EXE=`which dpd.exe`
    runjob --env-all --exe $EXE -n 32 -p 4 --args ""
      ;;
  *) echo "Nothing to do for $LOADL_STEP_NAME"
      ;;
esac

Sub-jobs

Sub-jobs are referred to in some of the IBM Redbooks, e.g. http://www.redbooks.ibm.com/abstracts/sg247948.html. The documentation describes the capability to run multiple jobs from a single user in a single partition at the same time. When this is done, each job shares the I/O node(s) allocated to the partition. So in theory 4 x 32-node jobs can be run in a single 128-node partition, 128 nodes being the smallest partition size possible on Blue Joule, as described above. Indeed, for "throughput" computing it would even be possible to run 128 x 1-node jobs in a single partition.

The current implementation requires the "corner" of each sub-job to be specified as a precise hardware location. Since LoadLeveler allocates a random, dynamic block for the jobs it runs, there is no way of knowing in advance which block will be used, and therefore no way of knowing the precise hardware location of the corner of each sub-job.

We have implemented the method used at CINECA; see http://www.hpc.cineca.it/content/batch-scheduler-loadleveler-0. Here is a sample job script as used on our system. It runs 8 instances of the IMB MPI benchmark, each across 256 cores.

#!/bin/bash
#@ job_name = SUB-JOB
#@executable=this_script_name
#@ output = stdout.bg.txt
#@ error = stderr.bg.txt
#@ environment = COPY_ALL
#@ job_type = bluegene
#@ class=prod
#@ bg_size = 128
#@ initialdir = .
#@ input = /dev/null
#@ wall_clock_limit = 00:19:00
#@ notification = never
#@ queue

# User Section - please modify only the following variables
######################################################################

# Dimension of bg_size, must be same as in the LoadLeveler keyword
# NOTE: this is only implemented for bg_size=128 on Blue Joule
export N_BGSIZE=128

# No. of required sub-jobs. For N_BGSIZE=128 you can
# choose between 2, 4, 8, 16, 32, 64, 128.
export N_SUBJOB=8

# No. of MPI tasks in each node
export RANK_PER_NODE=16

# No. of MPI tasks in each sub-job
export NPROC=$(( ( $RANK_PER_NODE * $N_BGSIZE ) / $N_SUBJOB ))
echo "$NPROC processes per sub-job"

# module load <your application>
source /etc/profile.d/modules.sh

export WDR=$PWD
export EXE_1="../IMB-MPI1 Sendrecv"
export EXE_2="../IMB-MPI1 Sendrecv"
export EXE_3="../IMB-MPI1 Sendrecv"
export EXE_4="../IMB-MPI1 Sendrecv"
export EXE_5="../IMB-MPI1 Sendrecv"
export EXE_6="../IMB-MPI1 Sendrecv"
export EXE_7="../IMB-MPI1 Sendrecv"
export EXE_8="../IMB-MPI1 Sendrecv"
# etc.
######################################################################

export EXECUTABLES="$EXE_1,$EXE_2,$EXE_3,$EXE_4,$EXE_5,$EXE_6,$EXE_7,$EXE_8"

# array of executable names
n_exe () { echo $EXECUTABLES | awk -F',' "{print \$$1}"; }

echo "work dir: " $WDR
echo "executable: " $EXECUTABLES

module load subjob
source ${SUBBLOCK_HOME}/bgsize_${N_BGSIZE}/npart_${N_SUBJOB}.txt

for i in `seq 1 $N_SUBJOB`;
do
  if [ ! -d $WDR/dir_$i ]; then
      mkdir dir_$i
      cd dir_$i
  else
      cd dir_$i
  fi
  echo $(n_exe $i)
  echo "runjob --verbose 4 --env-all --corner $(n_cor $i) --shape $SHAPE_SB -n $NPROC -p $RANK_PER_NODE : $(n_exe $i)" > my_echo.txt
  runjob --verbose 4 --env-all --corner $(n_cor $i) --shape $SHAPE_SB -n $NPROC -p $RANK_PER_NODE : $(n_exe $i) > stdout_$i.txt &
  cd ..
done

# wait for all runjobs to finish
wait

For further information please consult the runjob man page here: runjob.

A Note on Sub-block Jobs

For most purposes this section can be ignored. It gives additional details for those who might be interested and illustrates the complexity of the Blue Gene system.

For Blue Gene/Q, a sub-block job is a job that occupies less than an entire compute block. An example was given right at the start of this chapter. When sub-block resources are requested (i.e. bg_size is less than 128), LoadLeveler will dynamically calculate the starting compute node (corner) of a compute block and pass that information to the runjob command. The user does not need to do anything special for this to happen.

Sub-block jobs on Blue Gene/Q can co-exist with each other on a single compute block. These jobs share the I/O nodes associated with the compute block but run on different compute node cores. Sub-block jobs can provide better utilisation of compute resources by reducing the number of idle compute nodes and packing jobs more tightly onto the system. For example, if sub-block job support is not used (as was originally the case on our system) and you have a job that only needs 10 compute nodes, a 128-node block will leave 118 of the compute nodes idle. Using the sub-block job software feature now installed, the same 128-compute-node block can be used to run the 10-compute-node job together with other sub-block jobs.

This is possible because a sub-block job can run on a set of compute nodes that have no direct connections to an I/O node; in this situation the Blue Gene/Q software routes I/O traffic through bridging compute nodes that do have connections to the available I/O nodes. This is one of the primary purposes of sub-block jobs: to enable more simultaneous jobs than there are I/O connections. Using sub-block jobs therefore removes the restriction on the number of simultaneous jobs that would otherwise be imposed by the number of I/O nodes, although sharing I/O nodes in this way may be a problem if the jobs are very I/O intensive.

Because LoadLeveler will share sub-block-capable blocks between multiple jobs, a failure of one job running on a block can cause the block to fail and therefore affect the other jobs; ordinarily, a failure on a non-shared block would not impact any other currently running jobs. The system is configured to permit 32 failed compute nodes in a block. When this number is exceeded, LoadLeveler will allow any remaining sub-block jobs to complete normally but will not start any additional sub-block jobs on the block. Once all jobs running in the block have completed, LoadLeveler will free the block. This behaviour should normally be transparent to the user.

More Information about LoadLeveler

Note: there is no support for LoadLeveler jobs that request job steps to be executed on both HPC systems at the Hartree Centre, i.e. one job step performed on the BG/Q and another step performed on the iDataPlex cluster.

Back to Contents Page