The solaris subcluster in Maxwell

The solaris subcluster consists of a collection of older hardware. The nodes are, however, well suited to less compute-intensive tasks that require only a few cores, or a GPU with comparably low GPU memory such as the NVIDIA P100.

To make better use of the resources we added a couple of services:

  • jupyterhub: a regular JupyterHub like max-jhub, except that you have to specify which resources to use.
  • REST API: see the documentation for more details
  • Portal with a number of services for viewing the solaris subcluster and job utilization

Running non-demanding batch jobs

A separate slurm instance has been created to support single- or few-core jobs. The slurm commands are almost identical to those described for standard full-node jobs, except that you need to specify the slurm instance:

max-wgse002:~$ sinfo -M solaris   # or sinfo --cluster=solaris
CLUSTER: solaris
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
solcpu*      up 7-00:00:00     12   idle max-wn[020-031]
solarm       up 7-00:00:00      2   idle max-arm[002-003]
solgpu       up 7-00:00:00     15   idle max-cmsg[001-008,010],max-wng[004-009]

The slurm instance - named solaris - contains 3 partitions with a handful of old nodes:

max-wgse002:~$ sinfo --cluster=solaris -o '%n %R %f'
CLUSTER: solaris
HOSTNAMES       PARTITION AVAIL_FEATURES
# CPU nodes
max-wn020       solcpu INTEL,V4,E5-2640,256G
max-wn021       solcpu INTEL,V4,E5-2640,256G
max-wn022       solcpu INTEL,V4,E5-2640,256G
max-wn023       solcpu INTEL,V4,E5-2640,256G
max-wn024       solcpu INTEL,V4,E5-2640,256G
max-wn025       solcpu INTEL,V4,E5-2640,256G
max-wn026       solcpu INTEL,V4,E5-2640,256G
max-wn027       solcpu INTEL,V4,E5-2640,256G
max-wn028       solcpu INTEL,V4,E5-2640,256G
max-wn029       solcpu INTEL,V4,E5-2640,256G
max-wn030       solcpu INTEL,V4,E5-2640,512G
max-wn031       solcpu INTEL,V4,E5-2640,512G
# ARM nodes
max-arm002      solarm ARM,ARMv8,Ampere,Altra,256G
max-arm003      solarm ARM,ARMv8,Ampere,Altra,256G
# GPU nodes
max-cmsg001     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg002     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg003     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg004     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg005     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg006     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg007     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg008     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-cmsg010     solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-wng004      solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-wng005      solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-wng006      solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-wng007      solgpu INTEL,V4,E5-2640,256G,GPU,GPUx1,P100
max-wng008      solgpu INTEL,V4,E5-2640,512G,GPU,GPUx2,P100
max-wng009      solgpu INTEL,V4,E5-2640,512G,GPU,GPUx2,P100
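The tags in the AVAIL_FEATURES column can be combined with Slurm's generic --constraint option to steer a job onto a particular node type. A minimal sketch, using the 512G feature tag from the listing above:

#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --partition=solcpu
#SBATCH --constraint=512G    # one of the feature tags listed above
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00
unset LD_PRELOAD

hostname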

Job configuration

The solaris instance supports allocation of a specific number of cores and a specific amount of memory. This means that you have to set sensible limits; otherwise the node will either be poorly utilized, or your jobs will be terminated once they exceed the limits.

The default memory allocated to a job is 4GB.
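If a job needs more than that, request memory explicitly, for example with --mem (per node) or --mem-per-cpu:

#SBATCH --mem=16G           # 16GB for the job instead of the 4GB default
#SBATCH --mem-per-cpu=4G    # alternatively: 4GB per allocated core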

Example 1

Allocate 4 cores:

#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=0-00:10:00
unset LD_PRELOAD

np=$(nproc)
echo "Cores   available: $np"

srun -n $np hostname

# Output:
Cores   available: 4
max-wn008.desy.de
max-wn008.desy.de
max-wn008.desy.de
max-wn008.desy.de
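To submit the script, plain sbatch is enough, since the #SBATCH --cluster=solaris line already routes the job to the solaris instance (the file name job.sh is just a placeholder):

max-wgse002:~$ sbatch job.sh    # or: sbatch -M solaris job.sh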

Example 2

Allocate 4 cores and try to use 6 cores:

#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --time=0-00:10:00
unset LD_PRELOAD

np=$(nproc)
echo "Cores   available: $np"

srun -n 6 hostname

# Output:
Cores   available: 4
srun: error: Unable to create step for job 51: More processors requested than permitted
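To stay within the allocation, take the task count from Slurm's environment instead of hard-coding it; a small sketch using the SLURM_CPUS_PER_TASK variable set for the job:

srun -n $SLURM_CPUS_PER_TASK hostname   # never requests more cores than were allocated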

Example 3

Allocate 4GB of memory and try to use 5GB:

#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=0-00:10:00
unset LD_PRELOAD

np=$(nproc)
echo "Cores   available: $np"

# try to allocate 5G of memory:
timeout 10 cat /dev/zero | head -c 5G | tail

# Output:
/var/spool/slurmd/job00050/slurm_script: line 17: 24886 Broken pipe             timeout 10 cat /dev/zero
     24887                       | head -c 5G
     24888 Killed                  | tail
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=50.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

# Note: the job state will in this case be OUT_OF_MEMORY
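The peak memory usage of a finished job can be checked afterwards with sacct; a sketch for the job above (ID 50 taken from the output):

sacct -M solaris -j 50 --format=JobID,State,ReqMem,MaxRSS,Elapsed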

Job information

The squeue, sinfo, sacct, ... commands all work as usual; you just need to add --cluster=solaris (or -M solaris). So to see your jobs:

# squeue
squeue -u $USER -M solaris              # or 
squeue --user=$USER --cluster=solaris

# sacct
sacct -M solaris              # or 
sacct --cluster=solaris       # or
sacct -L                      # for both slurm instances (maxwell,solaris)
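
# scancel (cancelling a job also needs the cluster flag; <jobid> is a placeholder)
scancel -M solaris <jobid>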

Running graphical applications

The display nodes in Maxwell impose some limitations on cores and memory, simply because resources are shared among many users. For some applications, for example Fiji/ImageJ, these limitations cause a lot of problems. The GPU nodes (P100 only!) in the solaris subcluster support hardware-accelerated graphical applications with VirtualGL.

1. Allocate a GPU on a display node

salloc -p solgpu -M solaris --gres gpu:1 --ntasks 20 --mem 200G    # allocates a full node, to avoid interference with other applications/users

2. Connect to the node with GPU

On max-display, or on your desktop, use vglconnect to log in to the allocated node:

vglconnect max-wng004 # or whatever node you get

3. Run your application

module load maxwell virtualgl
module load fiji # for example
vglrun fiji