1594810 : HTCondor held jobs stuck - schedd SECMAN:2007 error

Created: 2026-04-22T10:26:04Z - current status: new

Anonymized Summary:

A user reports four jobs stuck in the hold state on the HTCondor scheduler [SCHEDULER_HOST] (e.g., bird-htc-sched22.desy.de). Every condor_* command fails with the persistent error "SECMAN:2007: Failed to end classad message", so the user cannot remove or otherwise manage the jobs. The user has confirmed that their Kerberos authentication is valid but cannot proceed further.

Job IDs (anonymized):

  • [JOB_ID_1]
  • [JOB_ID_2]
  • [JOB_ID_3]
  • [JOB_ID_4]


Core Issue:

  1. Jobs in Hold State: The jobs are stuck in "hold" and cannot be released or removed via the standard commands (condor_release, condor_rm).
  2. Scheduler Overload: The SECMAN:2007 error suggests the scheduler is unresponsive, likely due to one or more of the following:

    • A high volume of faulty job submissions (e.g., incorrect paths, logging issues).
    • Excessive polling of the scheduler (e.g., via condor_q in scripts or watch).
    • A denial-of-service-like state in which the scheduler is overwhelmed.
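
To illustrate the polling concern, here is a minimal Python sketch of a script that queries the scheduler with exponential backoff instead of in a tight loop or under watch. The function names (run_condor_q, poll_with_backoff) and the delay values are hypothetical choices for illustration and are not taken from the ticket or from any NAF recommendation. The only HTCondor call assumed is a plain condor_q invocation.

```python
import subprocess
import time
from typing import Callable, Optional


def run_condor_q() -> Optional[str]:
    """Run condor_q once; return its output, or None if the query failed."""
    try:
        result = subprocess.run(
            ["condor_q"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        # condor_q is not installed, or the scheduler did not answer in time.
        return None
    return result.stdout if result.returncode == 0 else None


def poll_with_backoff(
    query: Callable[[], Optional[str]] = run_condor_q,
    base_delay: float = 60.0,    # assumed starting interval in seconds
    max_delay: float = 900.0,    # assumed upper bound in seconds
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Query the scheduler, doubling the wait after each failed attempt."""
    delay = base_delay
    for attempt in range(max_attempts):
        output = query()
        if output is not None:
            return output
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    # Give up instead of adding more load to an unresponsive scheduler.
    return None
```

A monitoring script would call poll_with_backoff() once per check. If it returns None, the script should stop querying and the user should contact the admins.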

Suggested Solution/Next Steps:

  1. Immediate Workaround:

    • Avoid further condor_q commands to reduce scheduler load.
    • Contact NAF/HTCondor admins to:

      • Manually remove the stuck jobs from the scheduler’s queue.
      • Check the scheduler’s status (e.g., CPU/memory usage, log files) for underlying issues.

  2. Preventive Measures:

    • Review job submission scripts for typos in paths (executable/logging/output).
    • Limit job submissions until the scheduler recovers.
    • Use condor_q -hold (if the scheduler responds) to identify the hold reason for future jobs.

  3. If the Issue Persists:

    • The scheduler may need a restart (admin-only action).
    • Monitor the NAF documentation or status page for updates on scheduler health.
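
The path review suggested under Preventive Measures could be partly automated before submission. The following Python sketch reads a submit description file and reports an executable that does not exist, or log/output/error files whose target directory does not exist. The helper names (parse_submit_file, check_paths) are hypothetical. The parser handles only simple "key = value" lines, and the sketch assumes relative paths resolve against the submission directory. It does not expand macros such as $(ClusterId) and ignores settings like initialdir, so treat it as a rough pre-check rather than a replacement for condor_submit's own validation.

```python
import os
from typing import Dict, List

# Submit-file keys whose values are treated as paths in this sketch.
PATH_KEYS = ("executable", "log", "output", "error")


def parse_submit_file(text: str) -> Dict[str, str]:
    """Collect simple 'key = value' pairs, skipping comments and other lines."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def check_paths(settings: Dict[str, str], base_dir: str = ".") -> List[str]:
    """Return a list of human-readable problems found in the path settings."""
    problems: List[str] = []
    for key in PATH_KEYS:
        value = settings.get(key)
        if not value:
            continue
        path = os.path.join(base_dir, value)
        # The executable itself must exist; for the other keys only the
        # directory that will receive the file is checked.
        target = path if key == "executable" else (os.path.dirname(path) or ".")
        if "$(" in target:
            continue  # macros are not expanded here, so this path is skipped
        if key == "executable":
            if not os.path.isfile(target):
                problems.append(f"executable not found: {value}")
        elif not os.path.isdir(target):
            problems.append(f"directory for {key} does not exist: {target}")
    return problems
```

Running check_paths(parse_submit_file(open("job.sub").read())) from the submission directory, where job.sub is a placeholder file name, returns an empty list when no problems are detected.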
