
LSF Documentation - Panther and Paragon

Last updated: 3/5/2018

Note: you must now specify the wall time requirement as in the examples below.

Description of Panther and Paragon Architectures

Job submission to Panther and Paragon will be from the "login cluster" known as Fairthorpe. The login and compute nodes are all PPC64LE (Little Endian) architecture, enabling software to be developed and tested on the login nodes. Fairthorpe is connected to CDS, the Common Data Store, which is used for long-term storage of users' applications and data.

The compute nodes on Panther and Paragon will be referred to below as the "execution cluster". They have their own high speed local file system which is used while running the job.

Architecture Schematic v2

Architecture Details

LSF Job Submission

The LSF batch scheduler on Panther closely resembles that on the Phase-2 clusters Napier and Iden, so please refer to the jobs page for basic details. Users can copy their Phase-2 submission scripts; only the queue definition (if any) will need to be modified. New details for transferring data to and from Panther are given below.

Data mover jobs

This is the most significant enhancement to our HPC environment. In addition to the independent file systems on the submission host and execution host, there is a data cache or "staging area" on the execution cluster. You must use the new LSF "-data" syntax and the associated scripts to specify which data is moved between these areas and how. In particular, at job submission you need to specify which files need to be copied from Fairthorpe to the Panther or Paragon execution cluster.

When a job is submitted, LSF will split it into compute and data-transfer jobs. For each file moved into or out of the staging area there will be a new "transfer" job, so for each job you submit you may see a number of other jobs spawned from it. There is a separate data-transfer queue, so these jobs can be managed concurrently with other compute jobs.

The workflow will resemble the following:

  1. Select which cluster to use
  2. Data in:
    • CDS -> staging area -> working directory
  3. Execution
  4. Data out:
    • Working directory -> staging area with tag creation
    • Working directory -> staging area -> CDS

The following examples copy a file or directory from the login cluster to the execution cluster.

Example 1 – Explicit
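
A minimal sketch of an explicit staging script is shown below. The queue name, wall time, paths and application are placeholders, and the exact bstage option syntax should be checked against "man bstage".

#!/bin/bash
#BSUB -q panther
#BSUB -W 00:30
#BSUB -o %J.out
# request the input file from the submission host (Fairthorpe) at submission time
#BSUB -data "fairthorpe:/gpfs/cds/myproject/input.dat"

# explicitly stage the requested file from the cache into the job's working directory
bstage in -src fairthorpe:/gpfs/cds/myproject/input.dat -dst .

./my_app input.dat > output.dat

# explicitly stage the result back to the login cluster (regular files only, see Note 1)
bstage out -src output.dat -dst fairthorpe:/gpfs/cds/myproject/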

Note 1: you can only specify regular files in the "bstage out -src" option. Directories and symbolic links are not permitted.

Note 2: "bstage in -all" will take files from the specified directory and "flatten" them into the "-dst" directory. "bstage in -src" will re-produce the directory structure.

Example 2 – Implicit
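
A corresponding implicit sketch, again with a placeholder queue name, paths and application, relying on the default source and destination behaviour described in the note below:

#!/bin/bash
#BSUB -q panther
#BSUB -W 00:30
#BSUB -o %J.out
#BSUB -data "fairthorpe:/gpfs/cds/myproject/input.dat"

# stage everything requested with -data into the current working directory
bstage in -all

./my_app input.dat > output.dat

# with no -dst given, the result is returned to the submit directory on Fairthorpe
bstage out -src output.dat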

Note that in this example "bstage" by default executes in the current working directory on the compute node (e.g. on Panther), and "bstage out" returns files by default to the submit directory on Fairthorpe.

Note 3: the man page for "bstage in -all" says:

Copy all the files that are requested with the job submission to the job current working directory. The command finds the location of each requested stage in file in the cache. All files are copied to the folder in a flat directory structure. Input files with the same name overwrite one another. In many situations this will not be what you want, so the above example could be misleading.

Example 3 – Data specification file

It is possible to use a text file which lists all the data requirements for a job. This can be useful as it can be scripted as part of the job submission, e.g. to change directory or file names, and it is less clumsy when many files are named. Each line in the file specifies a file or directory to be transferred to the staging area before the job is submitted; these must be absolute paths. The file must start with the string "#@dataspec", as follows.
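
A hypothetical example of such a file (the paths are placeholders; note that the last line omits the host name):

#@dataspec
fairthorpe:/gpfs/cds/myproject/input1.dat
fairthorpe:/gpfs/cds/myproject/input2.dat
/gpfs/cds/myproject/refdata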

If you don't specify the data host, the submission host is assumed.

In the job submission script you can now use the "-data" option to reference this file and to parse its contents.
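
For example, assuming the file above was saved as /gpfs/cds/myproject/job.dataspec:

#BSUB -data "/gpfs/cds/myproject/job.dataspec"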

Note 4: We will show an alternative way to do this in worked example 3-step Keras workflow.

Access to Interactive Nodes

On Panther: "module load use.panther; bsub -q pantherI -W 00:59 -Is /bin/bash"

On Paragon: "module load use.paragon; bsub -q paragonI -W 00:59 -Is /bin/bash"

Using the GPUs

Since updating to LSF-10 it is necessary to specify that one or more GPUs are required for a job, e.g. when getting a node for interactive use:

E.g. "bsub -gpu "num=1:mode=exclusive_process" -q paragonI -Is /bin/bash"

But this is not sufficient when using Spectrum MPI. It is also necessary to put the "-gpu" flag on the mpirun command.

E.g. "mpirun -np N -gpu <your app>

Job Arrays (not yet tested)

Job arrays in LSF also utilise the LSF environment variables to make specifying data files more flexible.
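
For example, a hypothetical array submission using the %I index placeholder in the data requirement (the paths match the listing below):

#BSUB -J "Jobarray[1-2]"
#BSUB -data "/gpfs/cds/project/group/name/array/file-%I.in"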

This translates to jobs as follows:

  1. data_Jobarray[1] copying over file /gpfs/cds/project/group/name/array/file-1.in
  2. data_Jobarray[2] copying over file /gpfs/cds/project/group/name/array/file-2.in

Examples from the Man Page

The following job requests three data files for staging, listing them one by one in the -data option:

bsub -o %J.out -data "hostA:/proj/std/model.tar hostA:/proj/user1/case02341.dat hostB:/data/human/seq342138.dna" /share/bin/job_copy.sh

The following job requests the same data files for staging. Instead of listing them individually in the -data option, the required files are listed in a data specification file named /tmp/dataspec.user1:

bsub -o %J.out -data "/tmp/dataspec.user1" /share/bin/job_copy.sh

The data specification file /tmp/dataspec.user1 contains the paths to the required files:
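
Based on the files listed in the first example, its contents would be something like:

#@dataspec
hostA:/proj/std/model.tar
hostA:/proj/user1/case02341.dat
hostB:/data/human/seq342138.dna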

The following job requests all data files in the directory /proj/std/ on hostA:

bsub -o %J.out -data "hostA:/proj/std/*" /share/bin/job_copy.sh

The following command submits an array job that requests data files by array index. All data files must exist.

bsub -o %J.out -J "A[1-10]" -data "hostA:/proj/std/input_%I.dat" /share/bin/job_copy.sh

The following job requests data files by tag name.

bsub -o %J.out -J "A[1-10]" -data "tag:SEQ_DATA_READY" "tag:SEQ_DATA2" /share/bin/job_copy.sh

The following job requests the data file /proj/std/model.tar, which belongs to the user group design1:

bsub -o %J.out -data "hostA:/proj/std/model.tar" -datagrp "design1" my_job.sh

More Worked Examples

Please use the following links to see details of worked examples for some of the codes that we support, with thanks to the people mentioned in each example for their input.

Further Notes

Note 5: LSF manages stderr and stdout, so there is no need to stage these back from the cluster with bstage out.

Note 6: If specifying a directory, LSF will automatically copy all sub-directories recursively.

Note 7: To check the status of data mover jobs, use the following command,

"bjobs -data"

Summary of file syntax

Files and directories can be specified either explicitly or implicitly as illustrated above. Some more details are explained here.

One "#BSUB -data" line or line in a specification file is required for each file or directory required.

If the host name is missing from the path, it is assumed to be the job submission host (Fairthorpe in this case). It is possible to transfer between hosts using the syntax "<hostname>:<file path>".

The bstage command works as follows.

To copy all the files from an input data specification to a particular directory on the execution host you can use something like

"bstage in -all -dst <directory>"

For instance this could be a temporary directory used to run the job. The files are put in this directory in a flat structure (no directory hierarchy is preserved from the source).

It is also possible to use the "-link" option, which will create symbolic links to files in the cache instead of making copies in the final directory. This could be useful if running many times with the same data but different run-time parameters, for example as sketched below.
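
For example, assuming -link combines with -all in the same way as the copy form (check "man bstage"):

"bstage in -all -link -dst <directory>"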

Options for "bstage out" are similar, but only regular files can be specified, directories and links are not permitted. If you do not specify the "-dst" destination, it is assumed to be the host and directory from which the job was originally submitted (i.e. Fairthorpe).

Parallel Jobs using Spectrum MPI

The default parallel environment for the Power-8 system is now IBM's Spectrum MPI which is based on an optimised version of OpenMPI.

Applications should be compiled using the spectrum_mpi module, which provides the mpicc, mpic++, mpif77 and mpif90 wrappers. A couple of changes are required in the LSF script to run these jobs.
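
A sketch of such a script is given below, with a placeholder queue name, core count and application; the typical changes are loading the spectrum_mpi module and launching with mpirun (adding -gpu if the application uses GPUs, as described above).

#!/bin/bash
#BSUB -q panther
#BSUB -W 01:00
#BSUB -n 32
#BSUB -o %J.out

module load spectrum_mpi

# launch with Spectrum MPI; include -gpu here as well if the job uses GPUs
mpirun -np 32 ./my_app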

Interactive Jobs

See the Phase-2 job submission page.

If files are needed, a data mover job will be required beforehand to copy over the data. Likewise, if data is to be returned, a separate data mover job will be needed to fetch it.

Jobs can specify "-data" on the bsub command line and then run the bstage commands by hand.
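
For example (hypothetical path), an interactive session that requests a file and then stages it by hand:

module load use.panther; bsub -q pantherI -W 00:59 -data "fairthorpe:/gpfs/cds/myproject/input.dat" -Is /bin/bash

# then, once the interactive shell has started:
bstage in -all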

Managing cached data

Once a file has been copied to the execution cluster's staging area, it will remain in the cache, currently for 30 days. If you submit a job with a data requirement, LSF will check (by performing a checksum) whether the file already resides in the cache; if the file is the same, LSF will use the cached copy.

To check if a file is already in the cache, something like the following command can be used.
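
For example (the path is a placeholder; check "bdata -help" for the exact sub-command and options):

"bdata cache fairthorpe:/gpfs/cds/myproject/input.dat"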

Tagging

To reduce the amount of data copying between jobs, e.g. as part of a workflow, it is possible to use a feature called "tagging" in LSF. This allows you to preserve a copy of output files or directories in the data staging cache. This is particularly useful if you are running jobs against the same input data, as the overhead of copying data between the CDS and the execution cluster is eliminated.

An example of caching output data can be seen below. The flow is a three-step process: the first part copies the necessary data files to the remote cluster, the second part stages them into the cache, and the third submits a job using this cached data.

Part 1 - Copy the output data to the staging area and create a tag
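
A sketch of the relevant lines in the first job's script (the tag name and file are placeholders); as Note 8 below explains, with -tag the file is copied to the tag folder in the staging area rather than transferred back:

# at the end of the compute step, preserve the output in the cache under a tag
bstage out -src results.dat -tag MY_RESULTS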

Part 2 - Specifying the same tag in input
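
And a sketch of the follow-on job that requests the tagged data (whether "bstage in -all" or a dedicated -tag option is the right way to pull tagged data into the working directory should be checked against "man bstage"):

#BSUB -data "tag:MY_RESULTS"

# stage the tagged data into the current working directory
bstage in -all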

Note 8: Using tags with "bstage out", the man page says: if you specify -tag, the file is only copied to the tag folder in the staging area and no transfer job is submitted.

The worked example Panther 3-step Keras workflow shows how output tags can be used to create dependencies for a workflow.

To see what tags are currently present in the cache, you can use the following commands. This currently only works from Panther, because that is the staging area where the tags are cached. There may be tags on other hosts in the future.

bdata tags list -u <username>
#or 
bdata tags list -dmd panther

Data that has been tagged will currently persist in the cache for 30 days (since last access). All data that hasn't been used will be deleted from the cache after that time.

To delete a tag, use the following command.

bdata tags clean <tag_name>

This is also illustrated in worked example Panther 3-step Keras workflow.

Things to note:

  1. It is possible to add further data to a tag. New data will be combined with the existing data.
  2. Tag names cannot contain forward slashes or spaces.
  3. Tag names cannot start with a '-'.

Further Information

  1. for options to the bstage command type "man bstage" or "bstage -help"
  2. for options to the bdata command type "man bdata" or "bdata -help"
  3. for options to bsub, including the -data sub-options, type "man bsub"
  4. for network options to bsub see external link: https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_admin/pe_network_aware_sched.html
  5. For more information and detailed use cases, see: external link: https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/99245193-fced-40e5-90df-a0e9f50a0fb0/page/22e9aefe-a2e8-46e6-ad62-2ff5860f45aa/attachment/04278373-c422-45e8-be01-70de61a8aa09/media/lsf_data_manager_using.pdf
  6. There is also a YouTube video here: external link: https://www.youtube.com/watch?v=xJeJ3_KDDY4
  7. Linux on Power Community Wiki: external link: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550
  8. IBM Advance Toolchain for Power Linux: external link: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/IBM%20Advance%20Toolchain%20for%20PowerLinux%20Documentation

Back to Contents Page