1580173: Available NAF cores decreasing?
Created: 2026-02-27T08:03:33Z - current status: new
Summary of the Issue
A user reports observing a decline in the number of available healthy slots in the NAF’s daily statistics, with the count dropping below 4,000. They suspect this may be due to a single user consuming 3,000–4,000 cores for an extended period and request an investigation.
Anonymized & Neutralized Report
A user noticed a reduction in available healthy slots in the NAF’s daily statistics, with the count falling below 4,000. They inquired whether this could be attributed to a single user occupying 3,000–4,000 cores for an extended duration and requested an administrative review.
Possible Explanation & Next Steps
- Quota & Fair-Share Mechanism: The NAF employs a fair-share scheduling system with group/user quotas (e.g., `BIRD_[experiment].lite/bide`). If a user or group exceeds their weighted usage, their priority decreases, but they may still occupy resources if no higher-priority jobs are pending. The `condor_userprio.desy/gpu` commands can verify current allocations (see Quotas and Priorities).
- Surplus Quota Exploitation: "Standard jobs" (1 core, 2 GB RAM, 3 h runtime) can bypass group quotas by utilizing surplus resources. If many such jobs are running, they may temporarily reduce available slots (see Compute Resources).
- Migration to EL9: The ongoing EL9 migration (since July 2024) has reduced the total pool size. Some slots may be offline for upgrades, and EL9-specific issues (e.g., memory monitoring) could affect slot availability (see Migration Timeline).
- Recommended Actions:
  - Check Current Usage: Run `condor_status -available` to confirm slot availability. Use `condor_userprio.desy` to identify high-usage users/groups.
  - Review Daily Statistics: The BIRD-day.html page tracks slot trends. A sudden drop may indicate a single user’s jobs, while gradual declines could reflect migration or maintenance.
  - Contact Support: If the issue persists, open a ticket via naf-helpdesk@desy.de with:
    - Output of `condor_q -global` (for job distribution).
    - Screenshots of the daily statistics showing the decline.
    - Output of
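
To test the suspicion that one user holds 3,000–4,000 cores, the claimed cores can be tallied per user. The following is a minimal sketch, not the NAF's own tooling. It assumes that the plain `condor_status` command is usable on the submit host, that the slot ClassAd attributes `RemoteOwner` and `Cpus` are populated, and that no site-specific wrapper is required. The helper names `cores_per_user`, `query_claimed_slots` and `report` are hypothetical.

```python
import subprocess
from collections import Counter


def cores_per_user(autoformat_text):
    """Sum cores per owner from lines of the form '<owner> <cpus>'."""
    totals = Counter()
    for line in autoformat_text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines and slots that report no owner.
        if len(parts) != 2 or parts[0] == "undefined":
            continue
        owner, cpus = parts
        try:
            totals[owner] += int(cpus)
        except ValueError:
            continue  # Cpus was not an integer; ignore this line
    return totals


def query_claimed_slots():
    """Ask the pool for the owner and core count of every claimed slot.

    Assumption: the constraint is given before -af, because -af treats
    all remaining arguments as attribute names.
    """
    cmd = [
        "condor_status",
        "-constraint", 'State == "Claimed"',
        "-af", "RemoteOwner", "Cpus",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def report(top=10):
    """Print the users currently holding the most claimed cores."""
    for owner, cores in cores_per_user(query_claimed_slots()).most_common(top):
        print(f"{owner:40s} {cores:6d}")
```

Calling `report()` on a submit host prints the heaviest users first. A single entry in the 3,000–4,000 range would support the reporter's hypothesis, while a flat distribution would point toward the migration or maintenance explanations instead.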