Individual job status and return codes
HTCondor's overall handling of job return states is fairly generic, and it is often useful to find out which jobs actually succeeded in the way they were supposed to and which ones did not.
An elegant way to close this gap is to use 'condor_chirp' to create and alter ClassAds of the job at runtime and/or to write custom status files or entries in the job log file.
For all examples you need to enable this option in the submit file:
+WantIOProxy = true
Creating custom entries in the job logfile
Inside your job you can use, for example:
/usr/libexec/condor/condor_chirp ulog "Hello World - I am your condor job"
This leads to an entry in the job log file:
[chbeyer@htc-it02]~/htcondor/testjobs% cat /afs/desy.de/user/c/chbeyer/log_7455691_0.log
<snip>
...
008 (7455691.000.000) 08/20 13:15:54 Hello World - I am your condor job
...
005 (7455691.000.000) 08/20 13:15:54 Job terminated.
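To tie such log entries to actual return codes, a step of the job can be wrapped so that its exit code ends up in the job log. The following is a minimal sketch, assuming a POSIX shell; the helper name `run_and_log` and the `CHIRP` variable are illustrative and not part of HTCondor:

```shell
# Path used in the examples above; override CHIRP if your installation differs.
CHIRP="${CHIRP:-/usr/libexec/condor/condor_chirp}"

# run_and_log (hypothetical helper): run a command, then write its exit
# code to the job log via 'condor_chirp ulog' and pass the code on.
run_and_log() {
    "$@"
    rc=$?
    "$CHIRP" ulog "step '$1' finished with exit code $rc"
    return "$rc"
}

# Usage inside the job script ('./my_step' is a placeholder):
#   run_and_log ./my_step
```

Because the helper returns the original exit code, the job script can still react to a failing step as usual.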
Writing job states and/or return states to a custom file in your $HOME after the job finishes
Inside your job you can use, for example:
In this case 'wa' means 'write' and 'append', which also means that all of your jobs can write their status or return state into one file that you can monitor, for example using 'tail -f'.
[chbeyer@htc-it02]~/htcondor/testjobs% cat /afs/desy.de/user/c/chbeyer/my_logfile.txt
7456398 I am feeling fine
7456763 I am feeling fine
7456764 I am feeling fine
7456765 I am feeling fine
7456768 I am feeling fine
7456766 I am feeling fine
7456767 I am feeling fine
7456769 I am feeling fine
7456761 I am feeling bad
7456760 I am feeling bad
7456762 I am feeling bad
7456773 I am feeling bad
7456770 I am feeling bad
7456771 I am feeling bad
7456772 I am feeling bad
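Lines like the ones above could be produced with a small helper in the job script. This is only a sketch under assumptions: it uses `condor_chirp put` with `-mode wa` as described in the condor_chirp manual page, goes through a temporary file, and the helper name `append_status` and the file path in the usage comment are made up for illustration:

```shell
# Path used in the examples above; override CHIRP if your installation differs.
CHIRP="${CHIRP:-/usr/libexec/condor/condor_chirp}"

# append_status (hypothetical helper): append "<id> <message>" to a file on
# the submit side. Arguments: job id, message, remote file name.
append_status() {
    tmp=$(mktemp) || return 1
    printf '%s %s\n' "$1" "$2" > "$tmp"
    # -mode wa: open the remote file for writing, force all writes to append.
    # Assumption: the remote file already exists; the manual page lists an
    # additional 'c' flag for creating it if it does not.
    "$CHIRP" put -mode wa "$tmp" "$3"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage inside the job script, assuming the cluster id is passed as the
# first argument (e.g. 'arguments = $(Cluster)' in the submit file):
#   append_status "$1" "I am feeling fine" /path/to/my_logfile.txt
```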
Altering and adding ClassAds of a running job from inside the job
You can use 'condor_chirp' to inject additional ClassAd attributes into the job, or to alter existing ones, with the current state of your job from inside the job. The charming thing about this is that you can then use the custom attribute to find or sort jobs using 'condor_q' while the jobs are running, or 'condor_history' once the jobs are done.
At any time inside your job you can then update the job ClassAd of the running job, for example with state messages as shown here, by setting an attribute that gets created on the fly. I named mine 'MyJobState' and 'MyJobReturn', but anything goes; just be sure not to overwrite an existing HTCondor attribute:
my_job.sh:
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"Starting"'
sleep 120 #do something here
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"1/10 Done"'
sleep 120 # do some more here
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"2/10 Done"'
sleep 120 # you got it ...
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"3/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"4/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"5/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"6/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"7/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"8/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"9/10 Done"'
sleep 120
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobState' '"Done"'
/usr/libexec/condor/condor_chirp set_job_attr 'MyJobReturn' '"Good"'
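The repeated calls above can also be written as a loop that derives 'MyJobReturn' from the real exit code of each step. A minimal sketch, assuming a POSIX shell; `set_attr_string`, `run_steps`, the `CHIRP` variable and the "Failed at" message are illustrative names, not something HTCondor provides:

```shell
# Path used in the examples above; override CHIRP if your installation differs.
CHIRP="${CHIRP:-/usr/libexec/condor/condor_chirp}"

# set_attr_string (hypothetical helper): set a job attribute to a string.
# The value is a ClassAd expression, so a string needs its own double quotes.
set_attr_string() {
    "$CHIRP" set_job_attr "$1" "\"$2\""
}

# run_steps (hypothetical helper): call a command N times with the step
# number as last argument, report progress, stop at the first failure.
run_steps() {
    total=$1; shift
    set_attr_string MyJobState "Starting"
    i=1
    while [ "$i" -le "$total" ]; do
        if ! "$@" "$i"; then
            set_attr_string MyJobState "Failed at $i/$total"
            set_attr_string MyJobReturn "Not so good"
            return 1
        fi
        set_attr_string MyJobState "$i/$total Done"
        i=$((i + 1))
    done
    set_attr_string MyJobState "Done"
    set_attr_string MyJobReturn "Good"
}

# Usage ('./process_chunk' is a placeholder, called with 1 ... 10):
#   run_steps 10 ./process_chunk
```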
Now you can use 'condor_q' to check your job states while the jobs are running (use 'condor_q -l' to see which other attributes, such as the submit time, you may want to list):
[chbeyer@htc-it02]~/htcondor/testjobs% condor_q -af ClusterID -af MyJobState
7453792 5/10 Done
7453806 4/10 Done
7453810 4/10 Done
7453815 4/10 Done
7453819 4/10 Done
7453823 4/10 Done
7453827 4/10 Done
7453831 3/10 Done
7453837 3/10 Done
7453843 3/10 Done
7453847 3/10 Done
7453851 3/10 Done
7453855 3/10 Done
7453860 2/10 Done
7453864 2/10 Done
7453868 2/10 Done
7453872 2/10 Done
7453876 2/10 Done
7453878 2/10 Done
7453880 2/10 Done
7453883 1/10 Done
7453885 1/10 Done
7453887 1/10 Done
7453889 1/10 Done
7453891 1/10 Done
7453893 1/10 Done
7453895 Starting
7453897 Starting
7453899 Starting
7453901 Starting
7453904 Starting
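With many jobs a per-state summary is often easier to read than the full list. The following sketch counts how many jobs report each state; the helper name `count_states` is made up, and it only relies on 'condor_q -af MyJobState' printing one state per line:

```shell
# count_states (hypothetical helper): read one state per line on stdin and
# print "<count><TAB><state>", most frequent state first.
count_states() {
    awk '{ n[$0]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' | sort -k1,1nr
}

# Usage on the submit host:
#   condor_q -af MyJobState | count_states
```

Jobs that never set 'MyJobState' are counted under whatever 'condor_q' prints for a missing attribute.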
You can also list only the jobs that are in a certain state:
[chbeyer@htc-it02]~/htcondor/testjobs% condor_q -af ClusterID -constraint 'MyJobState == "3/10 Done"'
7453860
7453864
7453868
7453872
7453876
7453880
In the example above I put the final return state of my job into the ClassAd attribute 'MyJobReturn', which I can query with 'condor_history' after the job has finished:
[chbeyer@htc-it02]~/htcondor/testjobs% condor_history -af ClusterID -af 'MyJobReturn'
7453806 Good
7453810 Good
7454011 False
7454010 False
7454013 False
7454012 False
7454009 False
7454008 False
7453999 Not so good
7454001 Not so good
7454004 Not so good
7454007 Not so good
7454003 Not so good
7454005 Not so good
7454002 Not so good
7454006 Not so good
7454000 Not so good
7453792 Good
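To see only the jobs that did not end well, the same '-constraint' option shown for 'condor_q' can be combined with 'condor_history'. A minimal sketch; the helper name `list_not_good` and the `CONDOR_HISTORY` variable are illustrative, and the constraint assumes that "Good" is the only value you treat as success:

```shell
# Override CONDOR_HISTORY if the binary is not in your PATH.
CONDOR_HISTORY="${CONDOR_HISTORY:-condor_history}"

# list_not_good (hypothetical helper): print cluster id and MyJobReturn for
# finished jobs whose MyJobReturn is anything other than "Good".
# The ClassAd operator '=!=' also matches jobs that never set the attribute.
list_not_good() {
    "$CONDOR_HISTORY" "$@" -constraint 'MyJobReturn =!= "Good"' -af ClusterId MyJobReturn
}

# Usage on the submit host:
#   list_not_good              # search the whole history
#   list_not_good -limit 50    # extra condor_history options are passed through
```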
See the manual page of condor_chirp for more information: https://htcondor.readthedocs.io/en/latest/man-pages/condor_chirp.html