ITG Unix Support


Aborting a job on IonMan

First, find the job number of the job you wish to kill. The easiest way is the jobstatus utility, though running "condor_q -global" gives roughly the same information. Suppose, for instance, that you want to halt sampling on the 0823i sample. Running jobstatus shows:

  jobid |  host |     user |   duration | sample | cmd                                     
-----------------------------------------------------------------------------
  n24-8 | diana | mplasenc |   07:29:03 | 0824G  | runmascot mak2.inp ms-mak2.inp
 n20-45 | diana | mplasenc |   08:07:24 | 0823i  | runmascot mak2.inp ms-mak2.inp
 n1-245 |   n20 | mplasenc |   08:56:52 | 0823i  | runjob (wait n20-46)          
 n1-252 |   n24 | mplasenc |   08:20:36 | 0824G  | runjob (wait n24-9)           
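
When many jobs are queued, picking out the rows for one sample by eye is error-prone. The following is a minimal sketch, not part of the IonMan tooling, that filters jobstatus-style output by sample name. It assumes the pipe-separated column layout shown in the table above (jobid first, sample fifth) and a POSIX shell such as sh or bash rather than csh; the function name jobs_for_sample is hypothetical.

```shell
# jobs_for_sample SAMPLE
# Read jobstatus-style output on stdin and print the jobid of every row
# whose sample column equals SAMPLE.
# Assumption: columns are separated by "|" in the order
#   jobid | host | user | duration | sample | cmd
jobs_for_sample() {
    awk -F'|' -v sample="$1" '
        NF >= 6 {
            id = $1
            s = $5
            gsub(/[ \t]/, "", id)   # strip column padding
            gsub(/[ \t]/, "", s)
            # Appending "" forces a string comparison, so that
            # "0823" and "823" are not treated as the same sample.
            if (s "" == sample "") print id
        }'
}
```

For the table above, running "jobstatus | jobs_for_sample 0823i" would print n20-45 and n1-245, one per line. The header row and the dashed separator are skipped because neither has a matching sample field.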

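You can see that 0823i has two jobs associated with it: n1-245 and n20-45. The part of the jobid before the dash is the node, and the part after it is the job number on that node, so you need to connect to both n1 and n20 and run condor_rm with the corresponding job number. Here is an example using the clrun utility to run the command on node 20 from n1:

n1% condor_rm 245
n1% clrun "condor_rm 45" 20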

Alternatively, you can ssh to node 20 by hand, run condor_rm there, and then exit:

n1% ssh n20
Welcome!
n20% condor_rm 45
n20% exit
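
Translating jobids into the right command on the right node can also be sketched as a small helper. The example below is a dry run under stated assumptions: it only prints the commands and never executes them, it assumes it is being used from n1, and it assumes clrun takes the quoted command followed by the node number, as in the single clrun example on this page. The function name abort_commands is hypothetical, and the syntax is POSIX shell (sh or bash), not csh.

```shell
# abort_commands
# Read jobids (for example n20-45) on stdin, one per line, and print the
# command that would abort each job. Nothing is executed.
abort_commands() {
    while IFS= read -r jobid; do
        [ -n "$jobid" ] || continue   # skip blank lines
        node=${jobid%%-*}             # n20-45 -> n20
        num=${jobid#*-}               # n20-45 -> 45
        if [ "$node" = "n1" ]; then
            # Assumption: we are already logged in to n1.
            printf 'condor_rm %s\n' "$num"
        else
            # Assumption: clrun "<command>" <node number>, as used on this page.
            printf 'clrun "condor_rm %s" %s\n' "$num" "${node#n}"
        fi
    done
}
```

Feeding it the two jobids for 0823i from the table, "printf 'n1-245\nn20-45\n' | abort_commands", prints "condor_rm 245" followed by "clrun "condor_rm 45" 20". Review the printed commands before running them, since condor_rm cannot be undone.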

Once the jobs have been aborted, your sample will move into the /jobs/rejected folder.

 

Reference http://wiki.chem.indiana.edu/HPC/AbortingAJobOnIonMan
Rights rw-rw-r--   tstrombe   IonMan