The command `srun` is a means of allocating jobs using the Slurm scheduler. It has two modes: job allocation and job step allocation. Here is [the `srun` documentation for Slurm 18.08.9](https://slurm.schedmd.com/archive/slurm-18.08.9/srun.html).
- **For Job Allocation**: When run from a terminal, `srun` schedules a new job and blocks the terminal until the job completes. This mode is rarely useful because, if the terminal process dies or is killed (for example, when your internet connection drops), the job is terminated with it. It can be handy for quick, ad-hoc tests with the `--pty` flag, which opens an interactive shell in a job context (see the example after this list). We recommend against using `srun` for job allocation and instead recommend OOD HPC Desktop jobs for increased robustness.
- **For Job Step Allocation**: When used as a command in an `sbatch` script, `srun` starts a job step with the requested resources and blocks execution of the rest of the script until that step finishes. If you need the script to keep going, append `&` to the `srun` command to run the job step in the background (optionally combined with `nohup`), and call `wait` later in the script so it does not exit while steps are still running. A minimal sketch of this pattern follows the interactive example below.
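For example, a quick interactive test might look like the following. The resource sizes and time limit here are illustrative assumptions, not site defaults:

```bash
# Hypothetical interactive test: one task, two CPUs, 4 GB of memory,
# for 30 minutes; --pty attaches a login shell inside the job.
srun --ntasks=1 --cpus-per-task=2 --mem=4G --time=00:30:00 --pty bash -l
```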
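And here is a minimal sketch of the non-blocking job step pattern inside an `sbatch` script, again with illustrative resource requests:

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=2

# Each srun launches one job step; the trailing & backgrounds it so
# the loop continues instead of blocking on each step.
for i in 1 2; do
    srun --nodes=1 --ntasks=1 echo "step $i started" &
done

# Without wait, the script would exit and terminate the running steps.
wait
```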
## Relevant `srun` Flag Reference
Relevant flags for the `srun` command in this context (a combined example follows the list):
- `--distribution`:
  - How to distribute tasks across allocated hardware.
  - Defaults to `block`.
  - Not needed for this use case.
- `--exclusive`: Makes the allocated resources exclusive to this `srun` task; no other `srun` job step can use them while this one is running.
- `--nodes`: Total number of nodes to allocate for this `srun` job step.
- `--ntasks`:
  - Total number of tasks to allocate for this `srun` job step.
  - To request a fixed number of tasks on each node instead, use `--ntasks-per-node`; the total is then `--nodes * --ntasks-per-node`.
- `--cpus-per-task`:
  - Number of CPUs to allocate for each task.
  - Total CPUs for this `srun` job step is `--ntasks * --cpus-per-task`.
- `--cpu-bind`:
  - Takes one or two options, separated by a comma. The first option is either `verbose` or `quiet` (the default). The second option is a binding type. The default binding type is sufficient for most jobs.
  - Use `--cpu-bind=verbose` for helpful logging about which CPUs are bound to the tasks in this `srun` job step.
- `--mem`: How much memory to allocate per node.
- `--mem-per-cpu`: How much memory to allocate per CPU. Mutually exclusive with `--mem`.
- `--mem-bind`:
  - Takes one or two options, separated by a comma. The first option is either `verbose` or `quiet` (the default). The second option is a binding type.
  - Does not appear to have any effect on our system.
- `--export=ALL`: Export all environment variables to the `srun` environment. This should be the default behavior, but you may need to set the flag explicitly.
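As an illustration of how these flags combine on a single job step, here is a hypothetical invocation; `./my_task` is a placeholder, not a real program:

```bash
# Hypothetical job step: 1 node, 1 task, 4 CPUs, 2 GB per CPU,
# exclusive resources, and verbose CPU-binding output for debugging.
srun --exclusive --nodes=1 --ntasks=1 --cpus-per-task=4 \
     --mem-per-cpu=2G --cpu-bind=verbose --export=ALL ./my_task
```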
## Putting It All Together
See [`script.sh`](script.sh) for a "Hello World!" style example of using `srun` job steps in a for loop within `sbatch` to run many similar tasks in parallel. We typically recommend the `sbatch --array` flag for this type of execution, but there are niche cases where knowledge of this style of parallel execution is helpful, such as with the `uspex` command (USPEX, often run alongside Quantum ESPRESSO).
The `sbatch` job runs across three nodes, using one CPU per node. The for loop starts all the steps together because each `srun` job step is launched in the background with `nohup ... &`, which lets the script continue after each step is started. Note the use of `$SLURM_*` environment variables to pass the `sbatch` options down to the `srun` job steps for consistency.
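For reference, a minimal sketch of this pattern is shown below; the actual [`script.sh`](script.sh) may differ in its details:

```bash
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

# Reuse the sbatch options via $SLURM_* variables so each job step's
# request stays consistent with the parent allocation.
for i in $(seq 1 "$SLURM_NTASKS"); do
    srun --nodes=1 --ntasks=1 \
         --cpus-per-task="$SLURM_CPUS_PER_TASK" \
         --mem-per-cpu="$SLURM_MEM_PER_CPU" \
         --exclusive echo "Hello World from step $i" &
done

# Block until every background job step has finished.
wait
```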
The script should produce output that looks something like the following. Note that the order of the messages is not sequential because the `srun` job steps execute in parallel.