
Migration to EL9

As you probably know, the lifecycle of EL7 ends at the end of June 2024. We have decided to use EL9 (more precisely, Red Hat EL 9) as the successor for both the NAF infrastructure servers and the workers.

The old condor pool will go offline!

The shutdown is scheduled for Tuesday, 9 July 2024 (09-07).

The future condor pool is up and running. You can use it by logging in to one of the available EL9 workgroup servers (WGS). The easiest way to land on one of these WGS is the DNS alias "naf-\<VO>-EL9",

e.g. "naf-atlas-el9.desy.de"
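The alias pattern above can be sketched in a few lines of Python. This is only an illustration: the helper names `el9_alias` and `resolves` are hypothetical, and the sketch assumes every alias lives under `desy.de` as in the example. Not every VO necessarily has such an alias, so the optional DNS check is there to confirm that a name exists before you try to log in.

```python
import socket

# Assumption: all aliases live under this domain, as in the example above.
NAF_DOMAIN = "desy.de"


def el9_alias(vo: str) -> str:
    """Build the EL9 alias host name for a VO, e.g. 'atlas'."""
    return f"naf-{vo.strip().lower()}-el9.{NAF_DOMAIN}"


def resolves(hostname: str) -> bool:
    """Return True if the host name resolves in DNS (needs network access)."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

For example, `el9_alias("atlas")` yields the host name given above, which you can then pass to ssh.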

If you want to choose an EL9 WGS directly, these are the hosts running EL9 and submitting into the new pool as of 19-06-2024:

naf-alps-el9.desy.de

naf-astro21.desy.de

naf-atlas-el9.desy.de

naf-belle-el9.desy.de

naf-belle22.desy.de

naf-cms-el8.desy.de

naf-cms-el9.desy.de

naf-hone22.desy.de

naf-ilc-el9.desy.de

naf-luxe22.desy.de

naf-m21.desy.de

naf-m22.desy.de

naf-madmax21.desy.de

naf-madmax22.desy.de

naf-theo21.desy.de

naf-theo22.desy.de

naf-xfitter-el9.desy.de

naf-zeus22.desy.de

The timeline for the migration is as follows (as of 18-06-2024):

27th to 28th CW (this and the next two weeks)

  • migration of some of the WGS to EL9, pointing to the new condor pool; these will automatically appear under the alias above.

  • migration of some of the workers from the old pool to the new EL9 pool

29th CW

  • full shutdown of the rest of the pool. Resources will hopefully (re)appear with EL9 as OS no later than 2 days after disappearing. The shutdown is currently scheduled for Tuesday, 9 July (09-07).

Please migrate your code to EL9 and take advantage of the resources already available in EL9 by using the EL9 WGS listed above!

See https://bird.desy.de/stats/day.html for the progress, the number of cores available, and which OS they run.

We have identified some challenges with the operating and scheduling systems when using EL9. Consequently, we had to implement a few temporary workarounds. This may present some difficulties in certain special cases. However, please rest assured that we are prepared and committed to addressing any additional issues that may arise, to ensure that your workload is managed as efficiently as possible.

**The earlier you complete your personal migration to EL9, the more we can support you beforehand and the less the migration of the pool will bother you, as the resources in the new EL9 pool should only grow and not go offline for any reason. Remember, though, that we do a lot of shifting under the surface and you may experience a change of schedulers a couple of times, so the view of your job queues may change.

Use condor_q -global to see all of your jobs at any time! **
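If you monitor your jobs from a script, the same query can be wrapped as below. This is a minimal sketch, not an official tool: the function names are hypothetical, and the optional owner argument relies on condor_q accepting a user name as a restriction. The wrapper returns `None` on machines where condor_q is not installed.

```python
import shutil
import subprocess


def condor_q_global_cmd(user=None):
    """Return the argv that lists jobs across all schedulers."""
    cmd = ["condor_q", "-global"]
    if user:
        cmd.append(user)  # restrict the listing to one owner
    return cmd


def list_jobs(user=None):
    """Run the query and return its text output, or None without condor_q."""
    if shutil.which("condor_q") is None:
        return None
    result = subprocess.run(
        condor_q_global_cmd(user),
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit code
    )
    return result.stdout
```

Calling `list_jobs()` on a WGS returns the same listing you would see on the console, regardless of which scheduler currently holds your jobs.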

Known issues

memory consumption monitoring

There is a known problem with falsely high memory consumption readings, which may lead to jobs being killed on otherwise empty machines, because condor includes the cached memory pages of the kernel in the 'used-memory' calculation. A patch is on the way. In the meantime we force the kernel to flush pages every couple of minutes, which decreases performance slightly.

'materialize_max_idle' not working

The option 'materialize_max_idle' lets the submitter materialize only a subset of a bigger job cluster in the queue and is thus a very useful tool to work around the 5k jobs-in-queue limit. As you might guess from the heading, it is currently not working in EL9. The reasons are unknown, and we are in contact with the developers to get it working again.
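For reference, the sketch below renders a submit description that sets the option, so you can see where it normally goes. Everything except the option name is an assumption for illustration: the helper `render_submit`, the executable `run_analysis.sh`, and the job counts are hypothetical. Keep in mind that, as stated above, the option currently has no effect in the EL9 pool.

```python
def render_submit(executable: str, n_jobs: int, max_idle: int) -> str:
    """Render a submit description that asks for late materialization."""
    lines = [
        f"executable = {executable}",
        "arguments = $(ProcId)",  # each job receives its own index
        # Keep at most this many idle jobs materialized in the queue.
        f"materialize_max_idle = {max_idle}",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: 20000 jobs in total, at most 1000 idle at a time.
    print(render_submit("run_analysis.sh", 20000, 1000), end="")
```

The printed text can be saved to a file and passed to condor_submit once the option works again.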

new jupyter notebook setup

Not exactly an issue: the setup and distribution of Jupyter notebooks has changed a bit, see here: Introduction to Jupyter Notebooks

CMS & EL8

As of now, CMS will rely on EL8 for the current run. Although there are no EL8 resources in the NAF, EL8 workgroup servers for CMS are available. Jobs submitted from these WGS are automatically executed inside an EL8 singularity/apptainer image, namely:

/cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmssw/cc8:amd64

For testing, the usual means (ssh-to-job, interactive job) work. If you prefer different images or need additional options, see: Apptainer support in NAF