1592122 : Files on pnfs not accessible from condor workers
Created: 2026-04-14T08:47:32Z - current status: new
Anonymized Summary:
A user from the [EXPERIMENT_GROUP] collaboration reports intermittent issues accessing files on pnfs (specifically in /pnfs/desy.de/[EXPERIMENT_GROUP]/tier2/store/...) from HTCondor workers. The problem:
- Does not occur when accessing files locally from NAF login nodes.
- Only affects jobs submitted via HTCondor, causing analysis code to fail.
- Other members of the same experiment group observe similar issues.
- Onset coincides with a scheduled downtime on April 8.
Suggested Solution/Next Steps:
- Verify Kerberos Authentication on Workers: The issue may stem from expired or missing Kerberos tickets on the Condor worker nodes. Users should:
  - Check that jobs carry a valid Kerberos ticket (run `kinit` before submission).
  - Ensure the job script renews credentials if needed (e.g., via `aklog` or `krenew`).
  - List the current tickets with `klist`.
  - If the ticket is missing, request a new one with `kinit`.
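The ticket check can also run at the start of a job script, so that a missing credential shows up in the job's output rather than as a later file-access error. This is a minimal sketch, not the NAF-recommended procedure. It assumes that `klist -s` is available on the worker and exits non-zero when no valid ticket exists. The function name `check_krb_ticket` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether this host sees a valid Kerberos ticket.
# Assumption: `klist -s` runs silently and exits non-zero if the credential
# cache is missing or expired.
check_krb_ticket() {
    if ! command -v klist >/dev/null 2>&1; then
        echo "klist not found"
        return 2
    fi
    if klist -s; then
        echo "valid ticket"
        return 0
    fi
    echo "no valid ticket"
    return 1
}
```

Calling `check_krb_ticket || exit 1` as the first line of the job payload would make a missing ticket fail fast with a clear message.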
- Check for Experiment-Specific Infrastructure Issues: Since the problem is observed by multiple users in the [EXPERIMENT_GROUP] group, it may be related to:
  - dCache/PNFS access permissions for Condor workers.
  - Network or storage backend issues post-downtime.
  - Action: Contact the dedicated [EXPERIMENT_GROUP] support line: naf-[EXPERIMENT_GROUP]-support@desy.de (replace [EXPERIMENT_GROUP] with the actual experiment name).
- Debugging Steps for Condor Jobs:
  - Log Files: Check Condor job logs for errors (e.g., `HoldReason` or `MemoryUsage`), for example with `condor_q -held [USERNAME]` and `condor_q [JOB_ID] -af HoldReason`.
  - Test Job: Submit a minimal job to isolate the issue (e.g., a script that only lists files in `/pnfs/...`).
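A test job of this kind could be as small as the following sketch, which only tries to list one directory and records which host it ran on. The function name `probe_path` is hypothetical, and the directory to probe has to be supplied by the user. Running the same script on a login node and as a Condor job payload would show whether the failure is specific to the workers.

```bash
#!/usr/bin/env bash
# Hypothetical probe: check whether a directory can be listed from this host.
# Usage: probe_path <directory>
probe_path() {
    local target="$1"
    # Record the host so results from different workers can be compared.
    echo "host: $(uname -n)"
    if ls "$target" >/dev/null 2>&1; then
        echo "OK: $target"
        return 0
    fi
    echo "FAIL: $target"
    return 1
}
```

Because intermittent failures may be tied to particular worker nodes, keeping the host name in the output makes it easier to report affected machines to support.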
- Temporary Workaround: If the issue is intermittent, retry failed jobs or use local scratch space (`/scratch` or `/tmp`) as a fallback.
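Retrying can also be done inside the job, around the single step that reads from pnfs, instead of resubmitting the whole job. The following is a minimal sketch of such a wrapper. The function name `retry` and its argument order are illustrative assumptions, not part of HTCondor or the NAF tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_tries times, sleeping
# delay seconds between attempts.
# Usage: retry <max_tries> <delay_seconds> <command> [args...]
retry() {
    local max_tries="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        # Success on any attempt ends the loop immediately.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry 3 30 cp "$INPUT_FILE" /tmp/` would attempt the copy to local scratch up to three times with 30-second pauses, where `INPUT_FILE` is a placeholder for the pnfs file the job needs.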
Sources:
- NAF Support Contacts (Experiment-specific support lines).
- Condor Submit Errors (KRB Tickets) (Kerberos authentication for Condor jobs).
- Job Requirements and Failures (Debugging Condor job holds).