random-samples¶
sbatch¶
Create a batch script my-script.sh like the following and submit with sbatch my-script.sh:
#!/bin/bash
#SBATCH --time=0-00:01:00
#SBATCH --nodes=1
#SBATCH --partition=maxcpu
#SBATCH --job-name=slurm-01
unset LD_PRELOAD # useful on max-display nodes, harmless on others
source /etc/profile.d/modules.sh # make the module command available
... # your actual job
That's the core information you should probably always keep. Note: never add a #SBATCH directive after a regular command. It will be ignored like any other comment.
A simple example for a Mathematica job:
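As a purely illustrative fragment (not a complete job script): any #SBATCH directive placed after the first regular command is silently treated as an ordinary comment:
#!/bin/bash
#SBATCH --time=0-00:01:00      # honored: appears before any regular command
unset LD_PRELOAD               # first regular command
#SBATCH --partition=maxcpu     # ignored: comes after a regular command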
#!/bin/bash
#SBATCH --time=0-00:01:00
#SBATCH --nodes=1
#SBATCH --partition=allcpu
#SBATCH --job-name=mathematica
unset LD_PRELOAD
source /etc/profile.d/modules.sh
module purge
module load mathematica
export nprocs=$((`/usr/bin/nproc` / 2)) # we have hyperthreading enabled. nprocs==number of physical cores
math -noprompt -run '<<math-trivial.m'
# sample math-trivial.m:
tmp = Environment["nprocs"]
nprocs = FromDigits[tmp]
LaunchKernels[nprocs]
Do[Pause[1];f[i],{i,nprocs}] // AbsoluteTiming >> "math-trivial.out"
ParallelDo[Pause[1];f[i],{i,nprocs}] // AbsoluteTiming >>> "math-trivial.out"
Quit[]
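Assuming the batch script above is saved as mathematica.sh (the filename is arbitrary) and math-trivial.m sits in the submission directory, a typical workflow might look like this; the job id is just an example:
sbatch mathematica.sh    # prints e.g. "Submitted batch job 1234567"
squeue -u $USER          # wait until the job has left the queue
cat math-trivial.out     # compare the Do and ParallelDo timings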
salloc¶
salloc uses the same syntax as sbatch.
# request one node with a P100 GPU for 8 hours in the allcpu partition:
salloc --nodes=1 --partition=allcpu --constraint=P100 --time=08:00:00
# start an interactive graphical matlab session on the allocated host.
ssh -t -Y $SLURM_JOB_NODELIST matlab_R2018a
# the allocation won't disappear when idle; you have to terminate the session yourself
exit
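If you lost track of an interactive allocation, a minimal sketch for cleaning it up from a login node (the job id is purely illustrative):
squeue -u $USER    # find the job id of the idle allocation
scancel 1234567    # release the allocation explicitly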
scancel¶
scancel 1234 # cancel job 1234
scancel -u $USER # cancel all my jobs
scancel -u $USER -t PENDING # cancel all my pending jobs
scancel --name myjob # cancel a named job
scancel 1234_3 # cancel an indexed job in a job array
sinfo¶
sinfo # basic list of partitions
sinfo -N -p allcpu # list all nodes and state in allcpu partition
sinfo -N -p petra4 -o "%10P %.6D %8c %8L %12l %8m %30f %N" # list all nodes with limits and features in petra4 partition
squeue¶
squeue # show all jobs
squeue -u $USER # show all jobs of user
squeue -u $USER -p upex -t PENDING # all pending jobs of user in upex partition
sacct¶
Provides accounting information. Never use it for time spans exceeding a month!
sacct -j 1628456 # accounting information for jobid
sacct -u $USER # today's jobs
# get detailed information about all my jobs since 2019-01-01 and grep for all that FAILED:
sacct -u $USER --format="partition,jobid,state,start,end,nodeList,CPUTime,MaxRSS" --starttime 2019-01-01 | grep FAILED
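To stay below the one-month limit you can bound the query explicitly; a sketch assuming GNU date for computing the start of the window:
# accounting for the last 7 days only, well below the one-month limit
sacct -u $USER --starttime $(date -d '7 days ago' +%F) --endtime now --format="jobid,state,start,end,elapsed,CPUTime,MaxRSS"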
scontrol¶
Displays information about currently running/pending jobs and about the configuration of partitions and nodes. Also allows altering job characteristics of pending jobs.
scontrol show job 12345 # show information about job 12345. Will show nothing after a job has finished.
scontrol show reservation # list current and future reservations
scontrol update jobid=12345 partition=allcpu # move pending job 12345 to partition allcpu
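Other attributes of a pending job can be altered the same way; a small illustrative example (job id invented):
scontrol update jobid=12345 timelimit=02:00:00 # reduce the time limit of pending job 12345
scontrol hold 12345 # prevent the pending job from starting
scontrol release 12345 # allow it to be scheduled again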
slurm¶
module load maxwell tools
slurm
#Show or watch job queue:
slurm [watch] queue # show own jobs
slurm [watch] q <user> # show user's jobs
slurm [watch] quick # show quick overview of own jobs
slurm [watch] shorter # sort and compact entire queue by job size
slurm [watch] short # sort and compact entire queue by priority
slurm [watch] full # show everything
slurm [w] [q|qq|ss|s|f] # shorthands for above!
slurm qos # show job service classes
slurm top [queue|all] # show summary of active users
#Show detailed information about jobs:
slurm prio [all|short] # show priority components
slurm j|job <jobid> # show everything else
slurm steps <jobid> # show memory usage of running srun job steps
#Show usage and fair-share values from accounting database:
slurm h|history <time> # show jobs finished since, e.g. "1day" (default)
slurm shares
#Show nodes and resources in the cluster:
slurm p|partitions # all partitions
slurm n|nodes # all cluster nodes
slurm c|cpus # total cpu cores in use
slurm cpus <partition> # cores available to partition, allocated and free
slurm cpus jobs # cores/memory reserved by running jobs
slurm cpus queue # cores/memory required by pending jobs
slurm features # List features and GRES
slurm brief_features # List features with node counts
slurm matrix_features # List possible combinations of features with node counts
Ensuring minimum memory per core¶
The Maxwell cluster is not configured for consumable resources like memory. For an MPI job running on heterogeneous hardware, you therefore have to prepare your batch script to tailor the number of cores used per node to the memory available on that node. A simple example:
#!/bin/bash
#SBATCH --partition=maxcpu
unset LD_PRELOAD
source /etc/profile.d/modules.sh
module purge
module load mpi/openmpi-x86_64
# set hostfile
HOSTFILE=/tmp/hosts.$SLURM_JOB_ID
rm -f $HOSTFILE
# set minimum 40GB per core
mem_per_core=$((40*1024))
# generate hostfile
for node in $(srun hostname -s | sort -u) ; do
mem=$(sinfo -n $node --noheader -o '%m') # memory per node in MB
cores=$(sinfo -n $node --noheader -o '%c') # CPUs per node as seen by slurm
slots=$(( $mem / $mem_per_core )) # number of slots the memory allows
slots=$(( $cores < $slots ? $cores : $slots )) # never exceed the number of CPUs
echo $node slots=$slots >> $HOSTFILE
done
# run ...
mpirun --hostfile $HOSTFILE ./your-application # ./your-application is a placeholder for your MPI program
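For illustration, on two hypothetical nodes with 512 GB/96 CPUs and 256 GB/40 CPUs, the loop above would generate a hostfile roughly like this (node names invented):
max-wng001 slots=12
max-wng002 slots=6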
For a homogeneous set of nodes, life becomes much easier:
#!/bin/bash
#SBATCH --partition=allcpu,maxcpu
#SBATCH --constraint='[(EPYC&7402)|Gold-6240|Gold-6140]'
#SBATCH --nodes=8
unset LD_PRELOAD
source /etc/profile.d/modules.sh
module purge
module load mpi/openmpi-x86_64
# only use physical cores; since the constraint makes all selected nodes identical, this value fits every node
nprocs=$(( $(nproc) / 2 ))
# -N ensures $nprocs processes per node
mpirun -N $nprocs hostname | sort | uniq -c # should have same counts for each node
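With 8 identical nodes of, say, 36 physical cores each, the final check would print something along these lines (hostnames invented), identical counts confirming that every node runs the same number of ranks:
     36 max-wn001
     36 max-wn002
     ... (one line per allocated node)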