Optimising for Parallel Processing

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How can I run several tasks from a single Slurm job?

Objectives
  • Understand how to use GNU Parallel to run multiple programs from one job

Optimising for Parallel Processing

Running a job on multiple cores

By default most programs will only use a single CPU core, but all SCW/HPCW nodes have multiple cores and are capable of running several processes at once without (much) loss of performance.

A crude way to achieve this is to have our job submission script launch several processes, placing each one in the background with the & operator.

#!/bin/bash --login
###
#job name
#SBATCH --job-name=test
#SBATCH --output=test.out.%J
#SBATCH --error=test.err.%J
#SBATCH --time=0-00:01
#SBATCH --ntasks=3
###

command1 &
command2 &
command3 &

# wait stops the script (and therefore the job) from exiting until
# all of the backgrounded commands have finished
wait

This runs command1, command2 and command3 simultaneously, and the final wait keeps the job alive until all three have finished. The --ntasks=3 option requests three cores to match.

This method has its limits if we want to run more tasks than we have cores, starting new ones as earlier ones complete. It's possible to do by hand, but it does not scale well.
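
For example, to run six tasks three at a time we could background them in batches and wait between each batch (a sketch using placeholder command names):

command1 &
command2 &
command3 &
wait        # the next batch cannot start until all three have finished
command4 &
command5 &
command6 &
wait

Each batch only finishes when its slowest task does, so cores sit idle, and the script grows with every extra task. GNU Parallel solves both problems.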

GNU Parallel

GNU Parallel is a utility specially designed to run multiple parallel jobs. It can execute a set number of tasks at a time and when they are complete run more tasks.

GNU Parallel can be loaded as a module called “parallel”. Its syntax is a bit complex, but it is very powerful.
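
As a quick illustration of the basic syntax (assuming the parallel module is available on the login node), the ::: separator supplies a list of arguments and parallel runs the given command once for each of them:

module load parallel
parallel echo task ::: 1 2 3
# runs "echo task 1", "echo task 2" and "echo task 3",
# using up to one task per available CPU core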

First, let's create a job submission script and call it parallel.sh.

#!/bin/bash --login
#SBATCH -n 12                     #Number of processors in our pool
#SBATCH -o output.%J              #Job output
#SBATCH -t 00:05:00               #Max wall time for the entire job (five minutes)

module load parallel

# Define srun arguments:
srun="srun -n1 -N1 --exclusive" 
# --exclusive     ensures srun uses distinct CPUs for each job step
# -N1 -n1         allocates a single core to each task

# Define parallel arguments:
parallel="parallel -N 1 --delay .2 -j $SLURM_NTASKS --joblog parallel_joblog --resume"
# -N 1              is number of arguments to pass to each job
# --delay .2        prevents overloading the controlling node on short jobs
# -j $SLURM_NTASKS  is the number of concurrent tasks parallel runs, so number of CPUs allocated
# --joblog name     parallel's log file of tasks it has run
# --resume          used together with the joblog to continue an interrupted run (e.g. if the job is resubmitted)

# Run the tasks:
$parallel "$srun /bin/bash ./runtask.sh arg1:{1}" ::: {1..32}
# in this case, we are running a script named runtask.sh, and passing it a single argument
# {1} is the first argument
# parallel uses ::: to separate options. Here {1..32} is a shell expansion defining the values for
#    the first argument, but could be any shell command
#
# so parallel will run the runtask script for the numbers 1 through 32, with a max of 12 running 
#    at any one time
#
# as an example, the first job will be run like this:
#    srun -n1 -N1 --exclusive /bin/bash ./runtask.sh arg1:1

Now let's define a script called runtask.sh. This is the script we want parallel to actually run; all it does is wait a random amount of time and print some information about the task.

#!/bin/bash

# this script echoes some useful output so we can see what parallel and srun are doing

sleepsecs=$(( (RANDOM % 10) + 10 ))s

# $1 is the arg1:{1} argument from parallel; here the number will be between 1 and 32
# $PARALLEL_SEQ is a special variable from parallel: the sequence number of the task, regardless of the arguments given
# We output the sleep time, hostname, and date for extra information
echo task $1 seq:$PARALLEL_SEQ sleep:$sleepsecs host:$(hostname) date:$(date)

# sleep a random amount of time
sleep $sleepsecs
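
If we want to check the script on its own before involving parallel, we can run it directly (outside parallel, $PARALLEL_SEQ will simply be empty):

bash ./runtask.sh arg1:test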

Now let's go ahead and run the job by using sbatch to submit parallel.sh.

sbatch parallel.sh

This will take a minute or so to run; the exact time will vary with the random sleep lengths. If we watch the output of sacct we should see 32 job steps being created.
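
One way to do this, assuming sbatch reported the job ID 8324120 shown below, is to poll sacct every few seconds with watch:

watch -n 10 sacct -j 8324120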

8324120.bat+      batch              hpcw0318         12  COMPLETED      0:0 
8324120.0          bash              hpcw0318          1  COMPLETED      0:0 
8324120.1          bash              hpcw0318          1  COMPLETED      0:0 
8324120.2          bash              hpcw0318          1  COMPLETED      0:0 
8324120.3          bash              hpcw0318          1  COMPLETED      0:0 
8324120.4          bash              hpcw0318          1  COMPLETED      0:0 
8324120.5          bash              hpcw0318          1  COMPLETED      0:0 
8324120.6          bash              hpcw0318          1  COMPLETED      0:0 
8324120.7          bash              hpcw0318          1  COMPLETED      0:0 
8324120.8          bash              hpcw0318          1  COMPLETED      0:0 
8324120.9          bash              hpcw0318          1  COMPLETED      0:0 
8324120.10         bash              hpcw0318          1  COMPLETED      0:0 
8324120.11         bash              hpcw0318          1  COMPLETED      0:0 
8324120.12         bash              hpcw0318          1  COMPLETED      0:0 
8324120.13         bash              hpcw0318          1  COMPLETED      0:0 
8324120.14         bash              hpcw0318          1  COMPLETED      0:0 
8324120.15         bash              hpcw0318          1  COMPLETED      0:0 
8324120.16         bash              hpcw0318          1  COMPLETED      0:0 
8324120.17         bash              hpcw0318          1  COMPLETED      0:0 
8324120.18         bash              hpcw0318          1  COMPLETED      0:0 
8324120.19         bash              hpcw0318          1  COMPLETED      0:0 
8324120.20         bash              hpcw0318          1  COMPLETED      0:0 
8324120.21         bash              hpcw0318          1  COMPLETED      0:0 
8324120.22         bash              hpcw0318          1  COMPLETED      0:0 
8324120.23         bash              hpcw0318          1  COMPLETED      0:0 
8324120.24         bash              hpcw0318          1  COMPLETED      0:0 
8324120.25         bash              hpcw0318          1  COMPLETED      0:0 
8324120.26         bash              hpcw0318          1  COMPLETED      0:0 
8324120.27         bash              hpcw0318          1  COMPLETED      0:0 
8324120.28         bash              hpcw0318          1  COMPLETED      0:0 
8324120.29         bash              hpcw0318          1  COMPLETED      0:0 
8324120.30         bash              hpcw0318          1  COMPLETED      0:0 
8324120.31         bash              hpcw0318          1  COMPLETED      0:0 

The file parallel_joblog will contain a list of when each job ran and how long it took.

Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
1       :       1512606492.971      10.213      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:1
12      :       1512606495.364      10.102      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:12
10      :       1512606494.912      12.105      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:10
9       :       1512606494.707      13.099      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:9
4       :       1512606493.604      15.101      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:4
6       :       1512606494.041      15.101      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:6
5       :       1512606493.815      17.105      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:5
2       :       1512606493.178      19.098      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:2
7       :       1512606494.282      18.109      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:7
3       :       1512606493.392      19.105      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:3
8       :       1512606494.497      18.105      0       73      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:8
11      :       1512606495.151      19.109      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:11
13      :       1512606503.189      12.106      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:13
15      :       1512606507.021      10.107      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:15
17      :       1512606508.710      11.105      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:17
14      :       1512606505.471      15.111      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:14
21      :       1512606512.498      12.109      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:21
22      :       1512606512.713      12.103      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:22
19      :       1512606510.925      14.108      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:19
16      :       1512606507.811      18.111      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:16
23      :       1512606512.941      15.105      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:23
18      :       1512606509.147      19.108      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:18
24      :       1512606514.263      15.111      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:24
20      :       1512606512.280      17.105      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:20
25      :       1512606515.299      15.106      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:25
27      :       1512606519.820      14.111      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:27
28      :       1512606520.587      14.109      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:28
26      :       1512606517.132      18.102      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:26
30      :       1512606524.821      13.109      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:30
32      :       1512606525.927      15.106      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:32
29      :       1512606524.611      18.110      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:29
31      :       1512606525.037      19.107      0       75      0       0       srun -n1 -N1 --exclusive /bin/bash ./runtask arg1:31
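
Because we gave parallel the --joblog and --resume options, an interrupted or timed-out run can be continued simply by resubmitting the same script; tasks already recorded in parallel_joblog are skipped and only the remaining ones are run:

sbatch parallel.sh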

Key Points

  • GNU Parallel lets a single Slurm job start multiple subprocesses

  • This helps to use all the CPUs on a node effectively.