Simple Linux Utility for Resource Management (SLURM)

Slurm is a powerful job scheduler and resource manager designed for clusters. It efficiently allocates resources such as CPU cores, memory, GPUs, and nodes to user-submitted jobs, ensuring optimal utilization and management of the cluster’s capabilities. Users can submit jobs, request specific resources, and monitor job statuses through Slurm’s commands and interfaces.

  • Introduction to Slurm
  • Connecting to the cluster
  • sinfo command (nodes and partition information)
  • Interaction with Slurm
    • Batch jobs (job scripts)
    • Interactive jobs
  • Other Slurm commands (managing jobs)
  • Examples
  • Slurm job arrays

Slurm

Slurm is a job scheduler and resource manager: it tracks a cluster's resources and schedules user jobs onto them.

Slurm manages:

  • Time
  • CPU cores
  • Memory
  • GPUs
  • Nodes
  • Jobs

A user asks Slurm for certain resources and provides the work to do. Slurm schedules the job, allocates the resources, and reports the job status back to the user. A minimal submit-and-monitor flow is sketched below.
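
As a quick illustration, here is a minimal submit-and-monitor cycle. The partition name `main` is an assumption (partition names vary by site), and `--wrap` tells sbatch to treat the command string as a one-line job script:

```bash
# ask for 1 task, 100 MB of memory and 5 minutes in the (assumed) partition "main"
sbatch --partition=main --ntasks=1 --mem=100M --time=00:05:00 \
       --wrap='echo "hello from $(hostname)"'

squeue -u $USER         # watch the job go from PD (pending) to R (running)
cat slurm-<jobID>.out   # default output file; <jobID> is printed by sbatch
```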

Connecting to the cluster

You can connect to the cluster using ssh in a terminal, but this limits what you can do in terms of writing code.

```bash
ssh username@ozerlabs
```
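
To avoid retyping the host details on every login, you can add an entry to `~/.ssh/config`. This is a generic sketch; the host name and user below are placeholders for your site's actual address and account:

```
# replace HostName with the cluster's real address and User with your account
Host ozerlabs
    HostName ozerlabs.example.edu
    User username
```

After that, `ssh ozerlabs` is enough.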

sinfo command

```bash
# Get information about partitions, their nodes, and their resources
# PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
sinfo

# all nodes & partitions
sinfo --all

# all nodes & partitions with more details
sinfo --all --long

# a specific partition
sinfo --partition=main

# a specific node
sinfo --node=nova101

# useful node-oriented long listing
sinfo -Nel

# custom output: partition, node count, node list, generic resources (GPUs)
sinfo -a -o "%P %D %N %G"

# node-oriented view with selected fields; $partition, $hostlist and $statelist
# are shell variables holding filters such as "-p main", "-n node01" or "-t idle"
sinfo -h -N $partition $hostlist $statelist -O "NodeList,Partition,CPUs,Memory,Gres,GresUsed,Cluster,User"

# get more help
sinfo --help
man sinfo
```

Interaction with Slurm

You can interact with Slurm in two ways:

  • batch jobs (job scripts), submitted with the sbatch command
  • interactive jobs, started with salloc

sbatch command

The sbatch command submits a job to Slurm in the form of a job script.

```bash
sbatch [options] myJobScript.sh
```

In the job script we describe the resources we need using #SBATCH directives. Refer to Script generator for a quick start.

```bash
#!/bin/bash

#SBATCH --job-name=jobName
#SBATCH --account=users
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
##SBATCH --ntasks=4              # a second # disables a directive
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:01:00
#SBATCH --output=jobName.out
#SBATCH --error=jobName.err

# load modules
module load python/3.7.3

# environment variables
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# run the program
srun python3 myProgram.py

# program with arguments
srun python3 myProgram.py arg1 arg2 arg3
```
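
Once the script is saved, a typical cycle looks like this:

```bash
sbatch myJobScript.sh   # prints: Submitted batch job <jobID>
squeue -u $USER         # check the job's state while it waits and runs
cat jobName.out         # inspect the output file named in --output
```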

salloc

The salloc command allocates resources for an interactive session (a shell).

```bash
# salloc: request an allocation and open a shell in it
# (salloc has no --pty flag; that flag belongs to srun)
salloc -n 1 -t 3:00:00 --mem-per-cpu=3G

# equivalent interactive shell via srun
srun -n 1 -t 3:00:00 --mem-per-cpu=3G --pty bash

# sinteractive: a wrapper script available on some clusters
sinteractive -N 1 -n 1 --nodelist=nodename --gres=gpu:1 -J int_jobs_name
```
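
Depending on the cluster's configuration, the shell salloc opens may still run on the login node; use srun inside the allocation to place commands on the allocated node(s). A minimal sketch:

```bash
salloc -n 2 -t 1:00:00 --mem-per-cpu=2G   # request two tasks for one hour
srun hostname                             # runs on the allocated node(s)
exit                                      # release the allocation
```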

Other Slurm commands (managing jobs)

```bash
# show all jobs
squeue

# jobs of a specific user
squeue -u username

# jobs in a specific partition
squeue -p main

# jobs in a specific state
squeue -t running

# cancel a job
scancel jobID

# cancel all jobs of a user
scancel -u username

# cancel a user's jobs in a specific state
scancel -u username -t running

# get help
man scancel
man squeue
```

```bash
# sacct: show job accounting information
sacct

# jobs of a specific user
sacct -u username

# jobs in a specific time range
sacct -u username -S 2020-01-01 -E 2020-01-31
```
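
sacct also accepts a --format option to choose the fields shown, which is handy for checking what a finished job actually used; the fields below are standard sacct field names:

```bash
sacct -j jobID --format=JobID,JobName,Partition,Elapsed,State,MaxRSS,AllocCPUS
```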


```bash
## admin info

# scontrol: detailed state of a node, partition, or job
scontrol show node nodename
scontrol show partition main
scontrol show job jobID

# sacctmgr: list accounting records (users, accounts, ...)
sacctmgr list user

# sreport: usage reports from the accounting database
sreport cluster AccountUtilizationByUser start=2020-01-01 end=2020-01-31
```

Examples

A single-CPU job:

```bash
#!/bin/bash

#SBATCH --job-name=singlecpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# Your script goes here
sleep 30
echo "hello"
```

Two single-CPU tasks in one job:

```bash
#!/bin/bash

#SBATCH --job-name=singlecputasks
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1

# Your script goes here
srun --ntasks=1 echo "I'm task 1"
srun --ntasks=1 echo "I'm task 2"
```

A multithreaded job (one task, several CPUs):

```bash
#!/bin/bash

#SBATCH --job-name=multithreaded
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Your script goes here
mycommand --threads 8
```

Several multithreaded tasks across nodes:

```bash
#!/bin/bash

#SBATCH --job-name=multithreadedtasks
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

# Your script goes here
srun --ntasks=1 mycommand1 --threads 4
srun --ntasks=1 mycommand2 --threads 4
srun --ntasks=1 mycommand3 --threads 4
srun --ntasks=1 mycommand4 --threads 4
```

A simple MPI job:

```bash
#!/bin/bash

#SBATCH --job-name=simplempi
#SBATCH --ntasks=16

# Your script goes here
mpirun myscript
```

An MPI job with tasks spread across nodes:

```bash
#!/bin/bash

#SBATCH --job-name=nodempi
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8

# Your script goes here
mpirun myscript
```
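
Slurm job arrays

Job arrays submit many similar jobs from a single script: the --array directive spawns one task per index, and each task reads its own index from $SLURM_ARRAY_TASK_ID. A minimal sketch, assuming input files named input_0.txt through input_9.txt exist:

```bash
#!/bin/bash

#SBATCH --job-name=arrayjob
#SBATCH --ntasks=1
#SBATCH --array=0-9                    # ten tasks, indices 0..9
#SBATCH --output=arrayjob_%A_%a.out    # %A = array job ID, %a = task index

# each array task processes its own input file
srun python3 myProgram.py input_${SLURM_ARRAY_TASK_ID}.txt
```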

Most Used Commands

sinfo

```bash
sinfo

sinfo -Nel

sinfo -a -o "%P %D %N %G"

sinfo -h -N $partition $hostlist $statelist -O "NodeList,Partition,CPUs,Memory,Gres,GresUsed"
```

squeue

```bash
squeue
```

sacct

```bash
sacct
```

scancel

```bash
scancel jobID
```
