1581382 : Your held jobs

Created: 2026-03-04T08:16:42Z - current status: new


Summary of the Issue

A user has 874 held jobs in the National Analysis Facility (NAF) HTCondor system. The jobs were placed on hold because they exceeded the default runtime limit (3 hours for "lite-class" jobs). The hold reason for each job is listed as: "Job runtime longer than reserved".


Suggested Solution

Check Job Details

To verify the runtime and other resource usage of a specific held job, run the following command, replacing [JOB_ID] with an actual job ID from the list:

    condor_q [JOB_ID] -af HoldReason RequestRuntime RemoteWallClockTime
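With several hundred held jobs, it can help to confirm that they all share the same hold reason before acting on them in bulk. The sketch below is one way to do that; the function name held_reasons is hypothetical, and it relies on JobStatus == 5 being HTCondor's numeric code for a held job.

```bash
#!/usr/bin/env bash
# Count the current user's held jobs, grouped by hold reason.
# JobStatus == 5 is HTCondor's numeric status code for "held".
held_reasons() {
    condor_q -constraint 'JobStatus == 5' -af HoldReason \
        | sort | uniq -c | sort -rn
}
```

Calling held_reasons prints one line per distinct hold reason, prefixed by the number of jobs held for that reason.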

Options to Resolve the Issue

Option 1: Delete and Resubmit. Delete the held jobs and resubmit them with an updated runtime requirement (e.g., by setting RequestRuntime in the submit file):

    condor_rm [JOB_ID]    # Delete a single job
    condor_rm -constraint 'HoldReason == "Job runtime longer than reserved"'    # Delete all held jobs with this reason

Option 2: Edit and Release. Raise the runtime limit of the held jobs, then release them (for example, a RequestRuntime of 14400 for 4 hours):

    condor_qedit [JOB_ID] RequestRuntime [NEW_RUNTIME_IN_SECONDS]
    condor_release [JOB_ID]
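Editing and releasing 874 jobs one at a time is impractical, so the edit-and-release option can be applied to all affected jobs at once with a constraint. The following is a minimal sketch, not a site-endorsed procedure: the function names and the HOLD_CONSTRAINT variable are hypothetical, and it assumes the hold reason string matches exactly the one quoted in this ticket.

```bash
#!/usr/bin/env bash
# Select only jobs that are held (JobStatus == 5) for exceeding their runtime.
HOLD_CONSTRAINT='JobStatus == 5 && HoldReason == "Job runtime longer than reserved"'

# Convert whole hours to seconds, the unit RequestRuntime is given in.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Raise RequestRuntime on all matching jobs, then release them.
# The release only runs if the edit succeeded.
extend_and_release() {
    local hours=$1
    local seconds
    seconds=$(hours_to_seconds "$hours") || return 1
    condor_qedit -constraint "$HOLD_CONSTRAINT" RequestRuntime "$seconds" &&
        condor_release -constraint "$HOLD_CONSTRAINT"
}
```

For example, extend_and_release 4 would set RequestRuntime to 14400 on every matching job and then release them. The new limit should be chosen above the RemoteWallClockTime the jobs actually reached, otherwise they are likely to be held again.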

Prevent Future Issues

  1. Test jobs first: run a small batch of jobs to measure runtime and memory requirements before submitting large numbers.
  2. Use dedicated runtime classes: for jobs exceeding 3 hours, specify a longer runtime class (e.g., bide) in the submit file.
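Both recommendations can be combined in a small test submission that requests a longer runtime up front. The sketch below writes such a submit file from the shell. It rests on several assumptions that should be checked against the NAF documentation: the +RequestRuntime line follows HTCondor's convention for setting a custom job attribute, the executable run_analysis.sh and the logs/ directory are hypothetical placeholders, and the function name write_submit_file is made up for illustration.

```bash
#!/usr/bin/env bash
# Write a small test submit file that requests a given runtime in seconds.
# Usage: write_submit_file OUTPUT_FILE RUNTIME_SECONDS
write_submit_file() {
    local out=$1 runtime_seconds=$2
    cat > "$out" <<EOF
# Hypothetical job: replace the executable and paths with your own.
executable      = run_analysis.sh
arguments       = \$(ProcId)
output          = logs/job_\$(ClusterId)_\$(ProcId).out
error           = logs/job_\$(ClusterId)_\$(ProcId).err
log             = logs/job_\$(ClusterId).log
# Assumed convention: runtime request in seconds as a custom job attribute.
+RequestRuntime = ${runtime_seconds}
# Small test batch; raise this only after checking the measured runtime.
queue 5
EOF
}
```

Running write_submit_file test.sub 14400 followed by condor_submit test.sub would queue five test jobs with a 4-hour runtime request. Once they finish, their RemoteWallClockTime gives a basis for choosing the limit for the full submission.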
