1591262 : Machines on condor with more CPUs
Created: 2026-04-11T07:30:52Z - current status: new
Anonymized Summary:
A user is submitting a computationally intensive job (a long and heavy fit) via HTCondor on the NAF infrastructure. The current submission script requests a large amount of memory (`RequestMemory = 99999`) and runtime (`+RequestRuntime = 50000000`), but only uses the default of 1 CPU core. The user inquires whether machines with more than 1 CPU are available to expedite the job.
Solution & Recommendations:
- Request Additional CPUs: The user can explicitly request multiple CPU cores by adding:

  ```plaintext
  RequestCpus = <N>
  ```

  where `<N>` is the desired number of cores (e.g., 2, 4, or 8). Example for 4 cores: `RequestCpus = 4`. (Note that the submit command is `RequestCpus`/`request_cpus`, not `RequestCPU`.) Requesting more cores increases the job's "cost" in terms of user priority, potentially leading to longer queue times (see Job Requirements).
- Adjust Memory per Core (If Needed): The current `RequestMemory = 99999` (the value is in MB, so roughly 100 GB) is extremely high. If the job is parallelized across multiple cores, make sure the memory request reflects the per-core need (e.g., 25 GB per core for 4 cores). If the total memory requirement is fixed, reduce the request accordingly to avoid over-allocation.
- Benchmark-Based Node Selection (Optional): If the job benefits from high-performance nodes, the user can prioritize machines with higher benchmark scores (the `KFlops` or `Mips` machine attributes) using a `Rank` expression. Example: `Rank = KFlops`. Warning: this may further limit the pool of available slots (see Selecting Nodes).
- Optimize Runtime: The requested runtime (`+RequestRuntime = 50000000` seconds ≈ 578 days) is unrealistic. Condor jobs typically have hard limits (e.g., 72 hours for non-lite jobs). The user should:
  - Estimate the actual runtime and request a realistic value (e.g., `+RequestRuntime = 86400` for 24 hours).
  - Split the job into smaller chunks if possible (e.g., using `queue <N>` with different input files).
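As a sketch of the splitting approach, assuming the fit can be run independently per PDF replica and that a file `replicas.txt` (hypothetical) lists one replica ID per line:

```plaintext
# Hypothetical split: one job per replica ID listed in replicas.txt
arguments = --replica $(replica)
+RequestRuntime = 86400
queue replica from replicas.txt
```

HTCondor substitutes `$(replica)` separately for each queued job, so each job processes a different input and stays within a realistic runtime request.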
Revised Submission Script Example:
```plaintext
executable = xfitter-HERAPDFMTop.sub
transfer_executable = True
universe = vanilla
output = xfitter.out
error = xfitter.err
log = xfitter.log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
environment = "CLUSTER=$(Cluster) PROCESS_ID=$(Process) Ireplica=$ENV(pdfid)"
# 25 GB (value in MB; adjust based on needs)
RequestMemory = 25000
# 4 cores
RequestCpus = 4
# 24 hours (adjust to actual runtime)
+RequestRuntime = 86400
queue 1
```
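Requesting 4 cores only helps if the fit actually uses them. A minimal sketch, assuming the fit is OpenMP-parallelized and honours `OMP_NUM_THREADS` (check which knob, if any, controls the fit's thread count):

```plaintext
# Keep the program's thread count in step with the CPU request
RequestCpus = 4
environment = "OMP_NUM_THREADS=4 CLUSTER=$(Cluster) PROCESS_ID=$(Process) Ireplica=$ENV(pdfid)"
```

If the fit is single-threaded and cannot be parallelized, extra cores will sit idle while still counting against user priority, so in that case stick with the default of 1 CPU.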
Key Considerations:
- Priority Impact: Requesting more resources (CPUs/memory) will reduce the user’s priority, increasing queue times for future jobs.
- Job Splitting: If the fit can be parallelized (e.g., by parameter space), consider submitting multiple smaller jobs (see Foreach-Style Submission).
- Testing: Test with smaller resource requests first to gauge performance.
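For the testing step, one way to gauge actual needs (a sketch; `<cluster>.<proc>` is a placeholder for the test job's ID) is to compare requested and used resources after a short test run:

```plaintext
condor_history <cluster>.<proc> -af RequestCpus RequestMemory MemoryUsage RemoteWallClockTime
```

`MemoryUsage` (MB) and `RemoteWallClockTime` (seconds) from a representative test run give realistic values to put into `RequestMemory` and `+RequestRuntime`.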