1579117 : Your held Jobs
Created: 2026-02-24T07:28:48Z - current status: new
Summary of the Issue
Roughly 800 jobs were placed on hold in the National Analysis Facility (NAF) because they exceeded their reserved runtime. The batch system (HTCondor) stopped them automatically once they ran longer than the allocated time (default: 3 hours for "lite"-class jobs).
Key Details
- Hold Reason: "Job runtime longer than reserved"
- Status: All affected jobs are in the "held" state; none of them are running, idle, or completed.
- Action Required: The jobs must either be removed or edited (to request a longer runtime) and then released to resume execution.
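With several hundred held jobs it can help to tally the hold reasons before acting, to confirm that they really all share the same cause. The following is a minimal sketch, not an official NAF tool: `summarize_hold_reasons` is a hypothetical helper that counts identical lines on standard input, and the usage line assumes that `condor_q` is available on the submit node and that held jobs carry `JobStatus == 5`.

```shell
# Hypothetical helper: reads one hold reason per line on stdin and
# prints "count reason", most frequent reason first.
summarize_hold_reasons() {
    sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Possible usage on a submit node (not executed here):
#   condor_q -constraint 'JobStatus == 5' -af HoldReason | summarize_hold_reasons
```

If the output shows a single line with "Job runtime longer than reserved", all held jobs can be treated the same way.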
Suggested Solution
- Check Held Jobs: Use the following command to list all held jobs and confirm the hold reason:

      condor_q -held [USERNAME]

- Edit Job Requirements (if runtime extension is needed):
  - Update the job’s runtime requirement (`RequestRuntime` or `bide` in NAF terminology) to a higher value (e.g., 6 hours) using:

        condor_qedit [JOB.ID] "RequestRuntime = 21600"  # 6 hours in seconds

  - Alternatively, specify a custom runtime class (e.g., `bide`) in the submit file for future jobs.
- Release Jobs: After editing, release the jobs to restart them with the new requirements:

      condor_release [JOB.ID]

- Remove Jobs (if no longer needed): Delete held jobs entirely with:

      condor_rm [JOB.ID]
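Editing and releasing hundreds of jobs one ID at a time is tedious, so the edit and release steps above could be applied in bulk with a constraint. The sketch below rests on several assumptions: the held jobs' `HoldReason` contains the text "runtime longer than reserved", the job attribute is named `RequestRuntime` as in the command above, and `condor_qedit` is called in its attribute-name attribute-value form. The function names and the `DRY_RUN` variable are hypothetical. By default the script only prints the commands so that they can be reviewed first.

```shell
# Build a ClassAd constraint selecting one user's held jobs (JobStatus == 5)
# whose hold reason mentions the runtime limit (case-insensitive match).
build_constraint() {
    printf 'Owner == "%s" && JobStatus == 5 && regexp("runtime longer than reserved", HoldReason, "i")' "$1"
}

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: user name, $2: new runtime in seconds
bulk_extend_and_release() {
    constraint="$(build_constraint "$1")"
    # Raise the runtime request on every matching held job ...
    run_or_print condor_qedit -constraint "$constraint" RequestRuntime "$2"
    # ... then release the same set of jobs.
    run_or_print condor_release -constraint "$constraint"
}
```

Calling `bulk_extend_and_release "$USER" 21600` prints the two commands; setting `DRY_RUN=0` beforehand runs them instead. The dry-run output drops the shell quoting around the constraint, so it is meant for inspection rather than copy-and-paste.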
Preventive Measures
- For Future Jobs: Explicitly request a longer runtime in the submit file if the job is expected to exceed the default 3-hour limit.
- Monitor Job Performance: Use `condor_q` to track runtime and memory usage proactively.
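For future submissions, the runtime request could be written into the submit file when it is generated. The sketch below assumes that NAF reads the runtime from a custom job attribute set with `+RequestRuntime` (in seconds) in the submit description; this should be checked against the NAF documentation. `hours_to_seconds` and `make_submit_file` are hypothetical helpers, and `run_analysis.sh` is a placeholder for the actual job script.

```shell
# Convert whole hours to seconds, the unit used for RequestRuntime above.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Print a minimal submit description to stdout.
# $1: executable (placeholder job script), $2: requested runtime in hours
make_submit_file() {
    cat <<EOF
executable      = $1
+RequestRuntime = $(hours_to_seconds "$2")
queue
EOF
}

# Possible usage (not executed here):
#   make_submit_file ./run_analysis.sh 6 > job.sub && condor_submit job.sub
```

Generating the value from hours avoids arithmetic slips when converting to seconds by hand.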