
HPC workflow with sequential jobs

 Copyright (c) 2013-2019 UL HPC Team <[email protected]>

Prerequisites

Make sure you have followed the tutorial "Getting started".

Introduction

For many users, the typical usage of the HPC facilities is to run one program with many different parameters. On your local machine, you could simply start your program 100 times sequentially. However, you will obtain your results much faster by parallelizing the executions on an HPC cluster.

During this session, we will see 3 use cases:

  • Exercise 1: Use the serial launcher (1 node, in sequential and parallel mode);
  • Exercise 2: Use the generic launcher, and distribute your executions on several nodes (Python script);
  • Exercise 3: Advanced use case, using a Java program: "JCell";
  • Exercise 4: Advanced use case, distributing embarrassingly parallel tasks with GNU Parallel within a Slurm job.

We will use the following GitHub repositories:

  • ULHPC/launcher-scripts
  • ULHPC/tutorials

Pre-requisites

Connect to the cluster access node, and set-up the environment for this tutorial

You can choose one of the 3 production clusters hosted by the University of Luxembourg.

For the next sections, note that you will use Slurm on Iris.

(yourmachine)$> ssh iris-cluster

If your network connection is unstable, use screen:

(access)$> screen

We will work in the scratch directory ($SCRATCH).

You can check the usage of your storage directories using the command df-ulhpc:

(access)$> df-ulhpc
Directory                         Used  Soft quota  Hard quota  Grace period
---------                         ----  ----------  ----------  ------------
/home/users/hcartiaux             3.2G  100G        -           none

Note that the user directories are not yet all available on Iris, and that the quotas are not yet enabled.

Create a subdirectory $SCRATCH/PS2, and work inside it:

(access)$> mkdir $SCRATCH/PS2
(access)$> cd $SCRATCH/PS2

In the following parts, we will assume that you are working in this directory.

Clone the repositories ULHPC/tutorials and ULHPC/launcher-scripts.git

(access)$> git clone https://github.com/ULHPC/launcher-scripts.git
(access)$> git clone https://github.com/ULHPC/tutorials.git

In order to edit files in your terminal, use your preferred text editor (nano, vim, emacs, ...).

If you have never used any of them, nano is the most intuitive to get started with, while vim and emacs are more powerful.

With nano, you will only have to learn a few shortcuts to get started:

  • $ nano <path/filename>
  • quit and save: CTRL+x
  • save: CTRL+o
  • highlight text: Alt-a
  • Cut the highlighted text: CTRL+k
  • Paste: CTRL+u

Exercise 1: Object recognition with TensorFlow and the Python ImageAI library

In this exercise, we will process some images from the OpenImages V4 data set with an object recognition tool.

Create a file which contains the list of parameters (a random selection of images):

(access)$>  find /work/projects/bigdata_sets/OpenImages_V4/train/ -print | head -n 10000 | sort -R | head -n 50 | tail -n +2 > $SCRATCH/PS2/param_file
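You can quickly sanity-check the generated parameter file; the pipeline above keeps 49 lines (the 50 selected by head, minus the first one dropped by tail -n +2):

(access)$> wc -l $SCRATCH/PS2/param_file
(access)$> head -n 3 $SCRATCH/PS2/param_file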

Step 0: Prepare the environment

(access)$> srun -p interactive -N 1 --qos qos-interactive --pty bash -i

Load the default Python module

(node) module load lang/Python

(node) module list

Create a new python virtual env

(node) cd $SCRATCH/PS2
(node) virtualenv venv

Enable your newly created virtual env, and install the required modules inside

(node) source venv/bin/activate

(node) pip install tensorflow scipy opencv-python pillow matplotlib keras
(node) pip install https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.2/imageai-2.0.2-py3-none-any.whl
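Optionally, check that the packages are now visible inside the virtual environment:

(node) pip list | grep -iE "imageai|tensorflow|pillow"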

(node) exit

Step 1: Naive workflow

We will use the launcher NAIVE_AKA_BAD_launcher_serial.sh (full path: $SCRATCH/PS2/launcher-scripts/bash/serial/NAIVE_AKA_BAD_launcher_serial.sh).

Edit the following variables:

  • MODULE_TO_LOAD must contain the list of modules to load before executing $TASK,
  • TASK must contain the path of the executable,
  • ARG_TASK_FILE must contain the path of your parameter file.

(node)$> nano $SCRATCH/PS2/launcher-scripts/bash/serial/NAIVE_AKA_BAD_launcher_serial.sh

    MODULE_TO_LOAD=(lang/Python)
    TASK="$SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/run_object_recognition.sh"
    ARG_TASK_FILE=$SCRATCH/PS2/param_file
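Conceptually, this naive launcher loops over the parameter file and runs the tasks strictly one after the other on a single core. A simplified sketch of the idea (the actual script adds logging around it):

    # simplified sketch -- NOT the full NAIVE_AKA_BAD_launcher_serial.sh
    module load "${MODULE_TO_LOAD[@]}"
    while read -r ARGS; do
        ${TASK} ${ARGS}        # each task only starts once the previous one has finished
    done < "${ARG_TASK_FILE}"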
    
Using Slurm on Iris

Launch the job in interactive mode and execute the launcher:

(access)$> srun -p interactive -N 1 --qos qos-interactive --pty bash -i

(node)$> source venv/bin/activate
(node)$> $SCRATCH/PS2/launcher-scripts/bash/serial/NAIVE_AKA_BAD_launcher_serial.sh

Or submit it in passive mode (the output will be written to a file named BADSerial-<JOBID>.out):

(access)$> sbatch $SCRATCH/PS2/launcher-scripts/bash/serial/NAIVE_AKA_BAD_launcher_serial.sh

You can use the command scontrol show job <JOBID> to read all the details about your job:

(access)$> scontrol show job 207001
JobId=207001 JobName=BADSerial
   UserId=hcartiaux(5079) GroupId=clusterusers(666) MCS_label=N/A
   Priority=8791 Nice=0 Account=ulhpc QOS=qos-batch
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:23 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2018-11-23T10:01:04 EligibleTime=2018-11-23T10:01:04
   StartTime=2018-11-23T10:01:05 EndTime=2018-11-23T11:01:05 Deadline=N/A

And use the command sacct to see the start and end dates:

(access)$> sacct --format=start,end --j 207004
              Start                 End
------------------- -------------------
2018-11-23T10:01:20 2018-11-23T10:02:31
2018-11-23T10:01:20 2018-11-23T10:02:31

In all cases, you can connect to a reserved node using the command srun and check the status of the system using standard Linux commands (free, top, htop, etc.):

(access)$> srun -p interactive --qos qos-interactive --jobid <JOBID> --pty bash

During the execution, you can see the job in the queue with the command squeue:

(access)$> squeue
         JOBID PARTITION     NAME             USER ST       TIME  NODES NODELIST(REASON)
        207001     batch BADSeria        hcartiaux  R       2:27      1 iris-110

Using the system monitoring tool Ganglia, check the activity on your node.

Step 2: Optimal method using GNU Parallel

We will use the launcher launcher_serial.sh (full path: $SCRATCH/PS2/launcher-scripts/bash/serial/launcher_serial.sh).

Edit the following variables:

(access)$> nano $SCRATCH/PS2/launcher-scripts/bash/serial/launcher_serial.sh

MODULE_TO_LOAD=(lang/Python)
TASK="$SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/run_object_recognition.sh"
ARG_TASK_FILE=$SCRATCH/PS2/param_file
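The key difference with the naive launcher is that launcher_serial.sh hands the parameter file over to GNU Parallel, which keeps one task running per available core instead of executing them one by one. Schematically, the core of the approach looks as follows (a simplified sketch, not the exact script):

    # simplified sketch -- one task per core, arguments read line by line from the file
    module load "${MODULE_TO_LOAD[@]}"
    parallel -j ${SLURM_CPUS_ON_NODE:-$(nproc)} "${TASK} {}" :::: "${ARG_TASK_FILE}"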

Submit the (passive) job with sbatch

(access)$> sbatch $SCRATCH/PS2/launcher-scripts/bash/serial/launcher_serial.sh

Question: compare and explain the execution time and CPU usage obtained with both launchers:

  • Naive workflow: time = ?, CPU usage = ?

  • Parallel workflow: time = ?, CPU usage = ?


Exercise 2: Watermarking images in Python

We will use another program, watermark.py (full path: $SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/watermark.py), and we will distribute the computation on 2 nodes with the launcher parallel_launcher.sh (full path: $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh).

This Python script will apply a watermark to the images (using the Pillow imaging library).

The command works like this:

python watermark.py <path/to/watermark_image> <source_image>

We will work with 2 files:

  • ulhpc_logo.png: a transparent image, which can be applied as a watermark
  • images2.tgz: a compressed file, containing 30 JPG pictures (of the Gaia cluster :) ).

Step 0: Python image manipulation module installation

In an interactive job, install pillow in your home directory using this command:

(access IRIS)$> srun -p interactive -N 1 --qos qos-interactive --pty bash -i

(node)$> pip install --user pillow

Step 1: Prepare the input files

Copy and extract the source files in your $SCRATCH directory:

(access)$> tar xvf /mnt/isilon/projects/ulhpc-tutorials/sequential/images2.tgz -C $SCRATCH/PS2/
(access)$> cp /mnt/isilon/projects/ulhpc-tutorials/sequential/ulhpc_logo.png $SCRATCH/PS2

(access)$> cd $SCRATCH/PS2

Step 2: Create a list of parameters

We must create a file containing a list of parameters; each line will be passed to watermark.py.

ls -d -1 $SCRATCH/PS2/images/*.JPG | awk -v watermark=$SCRATCH/PS2/ulhpc_logo.png '{print watermark " " $1}' > $SCRATCH/PS2/generic_launcher_param

This command chains 3 steps:

  1. ls -d -1: list the images
  2. awk ...: prefix each line with the first parameter (the watermark file)
  3. >: redirect the output to the file $SCRATCH/PS2/generic_launcher_param
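Each line of the resulting file thus contains two paths: the watermark image, followed by one picture to process. You can verify it quickly:

(access)$> wc -l $SCRATCH/PS2/generic_launcher_param
(access)$> head -n 2 $SCRATCH/PS2/generic_launcher_param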

Step 3: Configure the launcher

We will use the launcher parallel_launcher.sh (full path: $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh).

Edit the following variables:

(access)$> nano $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh

TASK="$SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/watermark.py"
ARG_TASK_FILE="$SCRATCH/PS2/generic_launcher_param"
# number of cores needed for 1 task
NB_CORE_PER_TASK=2

Step 4: Submit the job

We will spawn 1 process per 2 cores on 2 nodes

On Iris, the Slurm job submission command is sbatch

(access IRIS)$> sbatch $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh

Step 5: Download the files

On your laptop, transfer the files to the current directory and look at them with your favorite image viewer:

(yourmachine)$> rsync -avz iris-cluster:/scratch/users/<LOGIN>/PS2/images .

Question: which nodes are you using? Identify your nodes with the command sacct or Slurmweb.


Exercise 3: Advanced use case, using a Java program: "JCell"

Let's use JCell, a framework for working with genetic algorithms, programmed in Java.

We will use 3 scripts:

  • jcell_config_gen.sh (full path: $SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/jcell_config_gen.sh)

We want to execute JCell and vary the parameters MutationProb and CrossoverProb. This script will install JCell, generate a tarball containing all the configuration files, and the list of parameters to be given to the launcher.

  • jcell_wrapper.sh (full path: $SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/jcell_wrapper.sh)

This script is a wrapper which starts one execution of JCell with the configuration file given as parameter. If a result already exists, the execution is skipped. Thanks to this simple test, our workflow is fault tolerant: if the job is interrupted and restarted, only the missing results will be computed (a sketch of this test is given after this list).

  • parallel_launcher.sh (full path: $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh)

This script will drive the experiment, starting and balancing the Java processes on all the reserved resources.
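The fault-tolerance test mentioned for jcell_wrapper.sh boils down to a standard shell pattern. A simplified sketch with illustrative names, not the script's exact code:

    # simplified sketch of the "skip if the result already exists" test (names are illustrative)
    CONFIG_FILE=$1
    RESULT_FILE="results/$(basename ${CONFIG_FILE}).out"
    if [ -f "${RESULT_FILE}" ]; then
        echo "Result ${RESULT_FILE} already exists, skipping this task"
        exit 0
    fi
    # ... otherwise run JCell with ${CONFIG_FILE} and save its output to ${RESULT_FILE}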

Step 1: Generate the configuration files:

Execute this script:

    (access)$> $SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/jcell_config_gen.sh

This script will generate the following files in $SCRATCH/PS2/jcell:

  • config.tgz
  • jcell_param
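You can check that both files were generated, and see how many JCell executions the parameter file defines:

(access)$> ls $SCRATCH/PS2/jcell/
(access)$> wc -l $SCRATCH/PS2/jcell/jcell_param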

Step 2: Edit the launcher configuration, in the file $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh.

This application is CPU-bound and not memory-bound, so we can set NB_CORE_PER_TASK to 1. With these parameters, the launcher will spawn one Java process per core on all the reserved nodes.

    (access)$> nano $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh

    TASK="$SCRATCH/PS2/tutorials/basic/sequential_jobs/scripts/jcell_wrapper.sh"
    ARG_TASK_FILE="$SCRATCH/PS2/jcell/jcell_param"
    # number of cores needed for 1 task
    NB_CORE_PER_TASK=1

Step 3: Submit the job

On Iris, the Slurm job submission command is sbatch

(access IRIS)$> sbatch $SCRATCH/PS2/launcher-scripts/bash/generic/parallel_launcher.sh

Step 4: Retrieve the results on your laptop

Use the following command:

    (yourmachine)$> rsync -avz iris-cluster:/scratch/users/<LOGIN>/PS2/jcell/results .

Question: check the system load and memory usage with Ganglia


Exercise 4: Advanced use case, distributing embarrassingly parallel tasks with GNU Parallel

GNU Parallel is a tool for executing tasks in parallel, typically on a single machine. When coupled with the Slurm command srun, parallel becomes a powerful way of distributing a set of tasks amongst a number of workers. This is particularly useful when the number of tasks is significantly larger than the number of available workers (i.e. $SLURM_NTASKS), and each task is independent of the others.
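Schematically, the pattern couples parallel (which limits how many tasks run at once) with srun (which places each task as a dedicated Slurm job step). A minimal illustration of the idea, where tasks.txt is a hypothetical file holding one command per line:

    # Inside a Slurm allocation: run at most $SLURM_NTASKS commands concurrently,
    # each one launched as its own 1-core job step
    parallel -j ${SLURM_NTASKS} "srun --exclusive -n1 -c1 {}" :::: tasks.txt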

To illustrate the advantages of this approach, a sample launcher script is proposed under scripts/launcher.parallel.sh.

It will invoke the command stress 8 times to impose a CPU load of 60s each (with an increasing hang time, from 1 to 8s). We thus have 8 tasks, and we will create a single job handling their execution.

More precisely, each task consists of the following command:

stress --cpu 1 --timeout 60s --vm-hang <n>
  • If run sequentially, this workflow would take at least 8x60 = 480s, i.e. 8 min
  • We will instead invoke the proposed launcher, which bundles the execution using NO MORE THAN $SLURM_NTASKS workers (4 in the below tests), i.e. 2 waves of 4 concurrent ~60s tasks, so approximately 2 minutes in total

# Go to the appropriate directory
$> cd $SCRATCH/PS2/tutorials/basic/sequential_jobs
$> ./scripts/launcher.parallel.sh -h
NAME
    launcher.parallel.sh [-n] [TASK]

    Using GNU parallel within a single node to run embarrasingly parallel
    problems, i.e. execute multiple times the command '${TASK}' within a
    'tunnel' set to run NO MORE THAN ${SLURM_NTASKS} tasks in parallel.

    State of the execution is stored in logs/state.parallel.log and is used to
    resume the execution later on, from where it stoppped (either due to the
    fact that the slurm job has been stopped by failure or by hitting a walltime
    limit) next time you invoke this script.
    In particular, if you need to rerun this GNU Parallel job, be sure to delete
    the logfile logs/state*.parallel.log or it will think it has already
    finished!

    By default, the 'stress --cpu 1 --timeout 60s --vm-hang <arg>' command is executed
    with the arguments {1..8}

OPTIONS
  -n --noop --dry-run:   dry run mode

EXAMPLES
  Within an interactive job (use --exclusive for some reason in that case)
      (access)$> si --exclusive --ntasks-per-node 4
      (node)$> ./scripts/launcher.parallel.sh -n    # dry-run
      (node)$> ./scripts/launcher.parallel.sh
  Within a passive job
      (access)$> sbatch --ntasks-per-node 4 ./scripts/launcher.parallel.sh
  Within a passive job, using several cores (6) per tasks
      (access)$> sbatch --ntasks-per-socket 2 --ntasks-per-node 4 -c 6 ./scripts/launcher.parallel.sh

  Get the most interesting usage statistics of your jobs <JOBID> (in particular
  for each job step) with:
     sacct -j <JOBID> --format User,JobID,Jobname,partition,state,time,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist,ConsumedEnergyRaw

Step 1: dry-run tests in an interactive job

If you intend to really execute this script within the interactive partition, you will need to use the --exclusive flag. In this step, we will only make a dry-run test.

# Get an interactive job -- use '--exclusive' (but not everybody can be served
# in this case) if you really intend to run the commands
(access)$> srun -p interactive --exclusive --ntasks-per-node 4 --pty bash
# OR, simpler: 'si --exclusive --ntasks-per-node 4'
(node)$> echo $SLURM_NTASKS
4
(node)$> ./scripts/launcher.parallel.sh -n
### Starting timestamp (s): 1576074072
parallel --delay .2 -j 4 --joblog logs/state.parallel.log --resume srun  --exclusive -n1 -c 1 --cpu-bind=cores stress --cpu 1 --timeout 60s --vm-hang {1} ::: 1 2 3 4 5 6 7 8
### Ending timestamp (s): 1576074072
# Elapsed time (s): 0

Beware that the GNU Parallel option --resume makes it read the log file set by
--joblog (i.e. logs/state*.parallel.log) to figure out the last unfinished task
(for instance when the Slurm job was stopped by a failure or by hitting a
walltime limit) and continue from there.
In particular, if you need to rerun this GNU Parallel job, be sure to delete the
logfile logs/state*.parallel.log, or it will think it has already finished!

# Release your job
(node)$> exit    # OR 'CTRL+D'

Step 2: real test in passive job

Submit this job using sbatch

(access)$> sbatch ./scripts/launcher.parallel.sh

Check that you have a single job in the queue, with as many nodes/cores assigned as were requested:

(access)$> sq    # OR 'squeue -u $(whoami)'  OR 'sqs'
             JOBID PARTITION                           NAME     USER ST       TIME  TIME_LEFT  NODES NODELIST(REASON)
           1359477     batch                    GnuParallel svarrett  R       0:00    1:00:00      1 iris-093

In this launcher script, GNU Parallel maintains a log of the work that has already been done (under logs/state.parallel.log), along with the exit value of each step (useful for determining any failed steps).

(access)$> tail logs/state.parallel.log
Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
1       :       1576074282.818      60.116      0       121     0       0       srun  --exclusive -n1 -c 1 --cpu-bind=cores stress --cpu 1 --timeout 60s --vm-hang 1
2       :       1576074283.041      60.127      0       121     0       0       srun  --exclusive -n1 -c 1 --cpu-bind=cores stress --cpu 1 --timeout 60s --vm-hang 2
3       :       1576074283.244      60.134      0       121     0       0       srun  --exclusive -n1 -c 1 --cpu-bind=cores stress --cpu 1 --timeout 60s --vm-hang 3
4       :       1576074283.457      60.128      0       121     0       0       srun  --exclusive -n1 -c 1 --cpu-bind=cores stress --cpu 1 --timeout 60s --vm-hang 4
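Since the Exitval column (7th field) stores the exit code of each step, a quick awk one-liner lets you spot any failed task:

(access)$> awk 'NR > 1 && $7 != 0 {print "Task", $1, "failed with exit code", $7}' logs/state.parallel.log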

As indicated in the help message, the state of the execution is stored in the joblog file logs/state.parallel.log, which --resume uses to continue the execution later on in case your job is stopped, either because the Slurm job failed or because it hit the walltime limit. To resume from a past execution, you simply need to re-run the script.

/!\ IMPORTANT In particular, if you need to rerun this GNU Parallel job, be sure to delete the joblog file logs/state.parallel.log or it will think it has already finished!

rm logs/state.parallel.log

Note that the Slurm logfile is named GnuParallel-<JOBID>.out:

(access)$> cat GnuParallel-*.out

Step 3: get the usage statistics of your job

You can extract from the Slurm database the usage statistics of this job, in particular the CPU and energy consumption of each job step corresponding to a GNU Parallel task.

# /!\ ADAPT <JOBID> with the appropriate Job ID. Ex: 1359477
$> sacct -j 1359477 --format User,JobID,Jobname,partition,state,time,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist,ConsumedEnergyRaw
     User        JobID    JobName  Partition      State  Timelimit    Elapsed     MaxRSS  MaxVMSize   NNodes      NCPUS        NodeList ConsumedEnergyRaw
--------- ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------- ---------- --------------- -----------------
svarrette 1359477      GnuParall+      batch  COMPLETED   01:00:00   00:02:01                              1          4        iris-093             26766
          1359477.bat+      batch             COMPLETED              00:02:01     23948K    178784K        1          4        iris-093             26747
          1359477.ext+     extern             COMPLETED              00:02:01          0    107956K        1          4        iris-093             26766
          1359477.0        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13380
          1359477.1        stress             COMPLETED              00:01:01       124K    248536K        1          1        iris-093             13391
          1359477.2        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13402
          1359477.3        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13408
          1359477.4        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13121
          1359477.5        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13118
          1359477.6        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13117
          1359477.7        stress             COMPLETED              00:01:00       124K    248536K        1          1        iris-093             13118

# Another nice slurm utility is 'seff <JOBID>'
$> seff 1359477
Job ID: 1359477
Cluster: iris
User/Group: svarrette/clusterusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:08:00
CPU Efficiency: 99.17% of 00:08:04 core-walltime
Job Wall-clock time: 00:02:01
Memory Utilized: 23.39 MB
Memory Efficiency: 0.14% of 16.00 GB

As can be seen, the job was using the allocated CPUs quite efficiently, but not the memory.


Conclusion

At the end, please clean up your home and scratch directories :)

Please do not store unnecessary files on the cluster's storage servers:

(access)$> rm -rf $SCRATCH/PS2
