README
The command `srun` is a means of allocating jobs using the Slurm scheduler. It has two modes: job allocation and job step allocation. These notes follow the `srun` documentation for Slurm 18.08.9.
- For Job Allocation: When used at the terminal, `srun` schedules a new job and blocks terminal execution. This mode is rarely useful: if the terminal process dies or is killed (for example, when your internet connection is interrupted), the job is terminated. It can be handy for quick, ad-hoc tests using the `--pty` flag to get an interactive terminal in a job context. We recommend against using `srun` for job allocation and instead recommend OOD HPC Desktop jobs for increased robustness.
- For Job Step Allocation: When used as a command in an `sbatch` script, `srun` starts a job step with the requested resources and blocks execution of the remainder of the script. If you need to avoid blocking the script, suffix your `srun` command with `&` (optionally prefixed with `nohup`) to run the step in the background and continue script execution. Minimal sketches of both modes follow below.
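To make the two modes concrete, here are minimal sketches; the resource sizes and payloads are illustrative assumptions, not site-specific recommendations. First, an ad-hoc interactive test via job allocation:

```bash
# Ad-hoc interactive shell inside a job allocation (not recommended for real
# work: the job dies if the terminal disconnects). --pty attaches a terminal.
srun --nodes=1 --ntasks=1 --cpus-per-task=1 --pty bash
```

And a job step inside an `sbatch` script, shown both blocking and backgrounded:

```bash
#!/bin/bash
#SBATCH --ntasks=2

# Blocking job step: the script waits here until the step completes.
srun --ntasks=1 hostname

# Non-blocking job step: '&' backgrounds the step so the script continues;
# nohup additionally shields it from hangup signals.
nohup srun --ntasks=1 hostname &

# Pause until all backgrounded job steps have finished.
wait
```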
`srun` Flag Reference
Relevant flags for the `srun` command within this context (a combined example follows after this list):
- `--distribution`:
  - How to distribute tasks on allocated hardware.
  - Defaults to `block`.
  - Not needed for this use case.
- `--exclusive`: Makes the allocated resources exclusive to this `srun` job step; no other `srun` job step can use these resources while this one is running.
- `--nodes`: Total number of nodes to allocate for this `srun` job step.
- `--ntasks`:
  - Total number of tasks to launch for this `srun` job step.
  - This is a total across all allocated nodes, not a per-node count; use `--ntasks-per-node` to set a per-node count.
- `--cpus-per-task`:
  - Number of CPUs to allocate for each task.
  - Total CPUs for this `srun` job step is `--ntasks * --cpus-per-task`.
- `--cpu-bind`:
  - Takes one or two options, separated by a comma. The first option is either `verbose` or `quiet` (default). The second option is a binding type; the default binding type is sufficient for most jobs.
  - Use `--cpu-bind=verbose` for helpful logging about which CPUs are bound to this `srun` job step's tasks.
- `--mem`: How much memory to allocate per node.
- `--mem-per-cpu`: How much memory to allocate per CPU.
- `--mem-bind`:
  - Takes one or two options, separated by a comma. The first option is either `verbose` or `quiet` (default). The second option is a binding type.
  - Does not appear to have any effect on our system.
- `--export=ALL`: Export all environment variables to the `srun` environment. This should be the default behavior, but you may need to use this flag explicitly.
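As a combined illustration (not from the original document), a job step using several of these flags might look like the following; `my_program` and the resource sizes are hypothetical placeholders:

```bash
# One task on one node, four CPUs and 8 GB of memory for the step, exclusive
# use of its resources, verbose CPU-binding logs, and the full parent
# environment exported. my_program is a hypothetical executable.
srun --nodes=1 --ntasks=1 --cpus-per-task=4 --mem=8G \
     --exclusive --cpu-bind=verbose --export=ALL my_program
```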
Putting It All Together
See `script.sh` for a "Hello World!" style example of using `srun` job steps in a for loop within `sbatch` to run many similar tasks in parallel. We typically recommend the `sbatch --array` flag for this type of execution (a minimal sketch follows below), but there are niche cases where this style of parallel execution is helpful, such as with Quantum Espresso's `uspex` command.
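For comparison, a minimal sketch of the `--array` style; `my_task.sh` is a hypothetical payload script that takes the array index as an argument:

```bash
#!/bin/bash
#SBATCH --array=0-2
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# Slurm runs one independent copy of this script per array index,
# so no explicit loop or backgrounding is needed.
srun ./my_task.sh "$SLURM_ARRAY_TASK_ID"
```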
The `sbatch` job runs across three nodes, using one CPU per node. The steps are all started together by the for loop because each `srun` job step is backgrounded with `nohup ... &`, allowing the script to continue as soon as each step is launched. Note the use of `$SLURM_*` environment variables to pass `sbatch` options through to the `srun` job steps for consistency.
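`script.sh` is not reproduced in this section; a minimal sketch consistent with the description above and the sample output below (three nodes, one CPU each, `hostname` as the payload) might look like:

```bash
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1

# One single-task job step per node, backgrounded with 'nohup ... &' so the
# loop continues immediately. $SLURM_NNODES and $SLURM_CPUS_PER_TASK carry
# the sbatch options through to each step for consistency.
for i in $(seq 1 "$SLURM_NNODES"); do
    nohup srun --nodes=1 --ntasks=1 \
        --cpus-per-task="$SLURM_CPUS_PER_TASK" \
        --cpu-bind=verbose --exclusive hostname &
done

# Wait for all backgrounded job steps to finish before the job exits.
wait
```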
The script should produce output that looks something like the following. Note that the order of lines is not sequential, because the `srun` job steps execute in parallel.
```
cpu-bind=NULL - c0220, task 0 0 [38051]: mask 0x200
c0220
cpu-bind=NULL - c0222, task 0 0 [34410]: mask 0x40
cpu-bind=NULL - c0221, task 0 0 [92378]: mask 0x80
c0222
c0221
```