1606981: Belle user on CMS scheduler

Created: 2026-06-08T09:13:50Z - current status: new

Summary: About 1.7k of several thousand jobs submitted by a Belle user ended up in the held state. The reporter asks why this happened. The reporter also asks why a Belle user's jobs were on the CMS scheduler at all, since they assumed schedulers are allocated per experiment.

Anonymized Summary: Approximately 1.7k jobs submitted by a [EXPERIMENT] user ([USERNAME]) unexpectedly entered the held state.

Possible Solution / Next Steps:

1. Check the hold reason (`HoldReason`) of the affected jobs:
   ```bash
   condor_q -hold
   ```
2. A common cause is excessive memory usage: jobs are held if they use more than 3x their requested memory. Check a job's memory consumption (in MB) with:
   ```bash
   condor_q <JOB_ID> -af MemoryUsage
   ```
3. If needed, raise the memory request (in MB) so that it is more than 1/3 of the peak usage:
   ```bash
   condor_qedit <JOB_ID> RequestMemory <VALUE>
   ```
4. Release the jobs once they are fixed:
   ```bash
   condor_release <JOB_ID>
   ```
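With ~1.7k held jobs, steps 2 to 4 are easier to apply in bulk. Below is a minimal sketch, not an official NAF or HTCondor tool. The function names `min_request` and `plan_fixes` are hypothetical. The sketch assumes the 3x hold threshold described above and input lines of the form `ClusterId ProcId MemoryUsage RequestMemory`, with memory values in MB. It only prints the `condor_qedit` and `condor_release` commands so they can be reviewed before anything is changed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: plan RequestMemory fixes for held jobs.
# Assumption: a job is held when MemoryUsage > 3 * RequestMemory (values in MB).

# Print the smallest integer request (MB) that is strictly more than 1/3 of the peak.
min_request() {
  local peak=$1
  echo $(( peak / 3 + 1 ))
}

# Read "ClusterId ProcId MemoryUsage RequestMemory" lines on stdin and print
# (not run) one condor_qedit + condor_release pair per over-limit job.
plan_fixes() {
  local cluster proc usage request new
  # The "|| [[ -n $cluster ]]" also handles a final line without a newline.
  while read -r cluster proc usage request || [[ -n $cluster ]]; do
    # Skip jobs whose MemoryUsage or RequestMemory is undefined or non-numeric.
    [[ $usage =~ ^[0-9]+$ && $request =~ ^[0-9]+$ ]] || continue
    if (( usage > 3 * request )); then
      new=$(min_request "$usage")
      echo "condor_qedit $cluster.$proc RequestMemory $new"
      echo "condor_release $cluster.$proc"
    fi
  done
}
```

Suppose the file is saved as `plan_fixes.sh`. One way to feed it is `source plan_fixes.sh; condor_q -constraint 'JobStatus == 5' -af ClusterId ProcId MemoryUsage RequestMemory | plan_fixes`. After reviewing the printed commands, pipe them to `sh` to apply them. Comparing the numbers avoids depending on the exact wording of the hold message. Jobs held for other reasons should still be checked with `condor_q -hold`.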

References:

- NAF Docs: Job Goes on Hold
- NAF Docs: Memory Overconsumption