GSI Forum
GSI Helmholtzzentrum für Schwerionenforschung

Kill some zombie jobs [message #12442] Wed, 03 August 2011 09:18
donghee
Messages: 385
Registered: January 2009
Location: Germany
first-grade participant
From: *cern.ch
Dear Gridka users,

I cannot kill my submitted jobs.
Yesterday I sent some jobs, and the first few went quite smoothly.

The next sets of submitted jobs, however, are not finishing.
They are not completed, and they cannot be killed.
From AliEn I have already done
>masterJob IDs kill
But in MonALISA those jobs are still shown as active.
How can that be? There is no indication of a zombie state in the monitoring.



The owner is dkang, and here are the corresponding problematic IDs.

1204074
1203993
1203972
1203971
1203970
1203949
1203948
1203927
1203886
1203885
1203864
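
For a list like this, the kill commands can be generated one per ID instead of being typed by hand. This is only a minimal sketch in Python: it assumes that the `masterJob <id> kill` form used in this thread is accepted by the AliEn shell for each master job separately, and the helper name `kill_commands` is made up for illustration. The script only prints the command lines; it does not talk to AliEn.

```python
# Master job IDs copied from the list above.
MASTER_JOB_IDS = [
    1204074, 1203993, 1203972, 1203971, 1203970, 1203949,
    1203948, 1203927, 1203886, 1203885, 1203864,
]


def kill_commands(job_ids):
    """Return one 'masterJob <id> kill' line per master job ID.

    The command form is taken from this thread; whether the shell
    accepts it for every job state is an assumption.
    """
    return [f"masterJob {job_id} kill" for job_id in job_ids]


if __name__ == "__main__":
    # Print the lines so they can be pasted into the AliEn shell.
    for line in kill_commands(MASTER_JOB_IDS):
        print(line)
```

The printed lines can then be pasted into the AliEn prompt one at a time, which also makes it easy to see which ID the shell no longer recognises.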


Thank you for your help.
Re: Kill some zombie jobs [message #12443 is a reply to message #12442] Wed, 03 August 2011 09:28
StefanoSpataro
Messages: 2736
Registered: June 2005
Location: Torino
first-grade participant

From: *7-87-r.retail.telecomitalia.it
Try with:

masterJob # -status ZOMBIE resubmit

That will resubmit them.
Re: Kill some zombie jobs [message #12444 is a reply to message #12443] Wed, 03 August 2011 10:00
donghee
Hi Stefano,

The real problem is that the masterJob IDs had already disappeared after I killed them from AliEn.

masterJob # -status ZOMBIE resubmit
or
masterJob # -status ZOMBIE kill

no longer work because the IDs are missing,
so AliEn cannot find them.

I tried to look at the trace for these jobs:

Quote:


[pgdb2.gla.ac.uk:3307] /panda/user/d/dkang/jdl/ > ps trace 1204074 all
001 Tue Aug 2 21:47:51 2011 [state ]: Job 1204074 inserted from dkang@dkang-laptop.cern.ch
002 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204094
003 Tue Aug 2 21:48:15 2011 [state ]: Job state transition from SPLITTING to SPLIT
004 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204092
005 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204093
006 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204091
007 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204090
008 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204089
009 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204088
010 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204087
011 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204084
012 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204085
013 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204086
014 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204083
015 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204082
016 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204081
017 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204080
018 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204079
019 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204075
020 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204076
021 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204077
022 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204078
023 Tue Aug 2 21:48:15 2011 [state ]: Job state transition to SPLITTING
024 Tue Aug 2 21:48:15 2011 [trace ]: Using the inputcollection LF:/panda/user/d/dkang/collections/list-run931-6
025 Tue Aug 2 23:44:22 2011 [state ]: Job state transition to KILLED |=| procinfotime: 1312321462 spyurl: finished: 1312321462





The jobs are already marked "finished", but MonALISA still shows them in the "Activate" status.
The same problem already happened a few days ago; at that time I killed the jobs without any problem, but now I cannot.

I need to contact the Gridka administrator to kill them.
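
To hand the administrator a complete list, the subjob IDs and the last recorded state can be pulled out of the `ps trace` text quoted above. The following is a rough sketch in Python that assumes the trace lines keep the layout shown in the quote; the function name `parse_trace` and the shortened `SAMPLE` text are only for illustration.

```python
import re

# Matches lines such as
# "002 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204094"
SUBJOB_RE = re.compile(r"\[submit\s*\]:\s*Subjob submitted:\s*(\d+)")
# Matches "[state ]: Job state transition [from X] to Y" and captures Y.
STATE_RE = re.compile(
    r"\[state\s*\]:\s*Job state transition (?:from \w+ )?to (\w+)"
)


def parse_trace(text):
    """Return (sorted subjob IDs, last state transition in the text)."""
    subjobs = sorted(int(m.group(1)) for m in SUBJOB_RE.finditer(text))
    states = STATE_RE.findall(text)
    return subjobs, (states[-1] if states else None)


# A few lines copied from the trace above, shortened for the example.
SAMPLE = """\
002 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204094
003 Tue Aug 2 21:48:15 2011 [state ]: Job state transition from SPLITTING to SPLIT
004 Tue Aug 2 21:48:15 2011 [submit ]: Subjob submitted: 1204092
025 Tue Aug 2 23:44:22 2011 [state ]: Job state transition to KILLED |=| procinfotime: 1312321462 spyurl: finished: 1312321462
"""

if __name__ == "__main__":
    ids, last_state = parse_trace(SAMPLE)
    print(ids, last_state)
```

Run on the full trace, this would list all 20 subjobs of master job 1204074, which can then be compared with what MonALISA still reports.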

Best wishes,
Donghee






Re: Kill some zombie jobs [message #12445 is a reply to message #12444] Wed, 03 August 2011 10:42
donghee
Dear Gridka users,

Actually, a user is permitted only 60 subjobs.
Yesterday I saw that a submission can exceed this limit, i.e., more than 60 jobs. It happened to me.

Normally AliEn rejects a submission that goes beyond 60.
But a large number of CPUs were assigned to me yesterday, and after that the rest of my subjobs crashed without succeeding or completing.
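
Until this is understood, the limit can be checked on the user side before submitting. The sketch below is a local pre-check only, not how AliEn itself enforces the quota. It assumes the 60-subjob limit applies to the sum of subjobs already active and subjobs about to be submitted; the name `can_submit` is hypothetical.

```python
MAX_SUBJOBS = 60  # per-user limit mentioned in this post


def can_submit(active_subjobs, new_subjobs, limit=MAX_SUBJOBS):
    """Return True if the new submission keeps the total within the limit.

    Assumes the limit counts active plus newly submitted subjobs.
    """
    if active_subjobs < 0 or new_subjobs < 0:
        raise ValueError("job counts must be non-negative")
    return active_subjobs + new_subjobs <= limit


if __name__ == "__main__":
    print(can_submit(40, 20))  # exactly at the limit
    print(can_submit(50, 20))  # ten subjobs over the limit
```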

Cheers,
Donghee
Re: Kill some zombie jobs [message #12447 is a reply to message #12445] Wed, 03 August 2011 13:07
donghee
Thanks for killing my unknown zombies!
