GSI Forum
GSI Helmholtzzentrum für Schwerionenforschung

Problem with submitting jobs on prometheus cluster [message #15569] Mon, 14 October 2013 14:14
Klaus Götzen
Hi,


I have a strange problem; maybe somebody has an idea about it.

I am trying to submit a batch of PandaRoot simulation jobs (single-track box generator) on the prometheus cluster at GSI. This worked perfectly last week. Today I tried again, but almost all jobs (a few do run) generate no events, and in the output I can spot the error message

...
[INFO   ] Simulation RunID: 1381751574   
At line 44 of file gbase/gmail.F (unit = 10, file = 'gphysi.dat')
Fortran runtime error: Cannot write to file opened for READ
...


I don't know whether that is the key issue, but no events are generated afterwards. On the interactive machine this error does not appear, so I can't tell whether I caused it by some misconfiguration or whether it is a problem with the cluster nodes.

Does anybody happen to know what is going on? Is it possible to switch off the output to gphysi.dat, or is this an important file that has to be produced?


Best regards and thanks,
Klaus
Re: Problem with submitting jobs on prometheus cluster [message #15571 is a reply to message #15569] Mon, 14 October 2013 16:07
Florian Uhlig
Hi Klaus,

this is your fault. :-D

OK, now I will try to give you a meaningful answer to your problem. If you don't specify a working directory, GridEngine executes the job in the default working directory. If I remember correctly, this is /tmp.
If you run a Geant3 simulation, the file gphysi.dat is created in the working directory. If you don't clean up after the simulation has finished, the file stays in this working directory. If your job ends up on a machine where this file was created by another user, you can't overwrite it, and Geant3 crashes with the error message you have seen.

What I do in my GridEngine macros is create a working directory and run the simulation there.

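# Private scratch directory per job/task; JOB_ID and SGE_TASK_ID are set by GridEngine.
# $username must be defined earlier in the script.
workdir=/tmp/$username/$JOB_ID.$SGE_TASK_ID
mkdir -p "$workdir"
cd "$workdir"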

At the end of the script I remove the working directory after I have moved all output to the final destination.
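
A minimal sketch of the whole pattern (create a private directory, run there, move the output, clean up) could look as follows. This is not the actual script used on the cluster: the function name run_in_private_workdir, the fallback values for JOB_ID and SGE_TASK_ID, and the use of $USER in place of $username are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: run a command in a private scratch directory, move whatever
# it produced to a final destination, then remove the scratch directory.
#
# Usage (the macro name is hypothetical):
#   run_in_private_workdir /absolute/path/to/output root -l -b -q my_sim_macro.C
#
# The first argument must be an absolute path, because the command runs
# after a cd into the scratch directory.
run_in_private_workdir() {
    local outdir=$1
    shift
    # JOB_ID and SGE_TASK_ID are set by GridEngine; the fallbacks are an
    # assumption so the function can also be tried outside the batch system.
    local workdir=/tmp/${USER:-$(id -un)}/${JOB_ID:-local}.${SGE_TASK_ID:-0}
    local status

    mkdir -p "$workdir" "$outdir" || return 1

    (
        cd "$workdir" || exit 1
        "$@"                              # the simulation command
        rc=$?
        # Move all visible files (this includes gphysi.dat) to the destination.
        mv -- * "$outdir"/ 2>/dev/null
        exit "$rc"
    )
    status=$?

    # Clean up so that no stale files are left behind for other users.
    rm -rf "$workdir"
    return "$status"
}
```

The cleanup runs even if the simulation command fails, and the function returns the exit status of that command, so a failed job is still reported as failed.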

Ciao

Florian
Re: Problem with submitting jobs on prometheus cluster [message #15572 is a reply to message #15571] Mon, 14 October 2013 16:31
StefanoSpataro
Would it be possible to have a tutorial on how to properly submit jobs on that farm?
Re: Problem with submitting jobs on prometheus cluster [message #15575 is a reply to message #15571] Mon, 14 October 2013 18:08
Klaus Götzen
Hi Florian,


thanks a lot for your answer! After some investigation together with Ralf, we also found the problem; it was exactly as you said.

By the way, a unique TMPDIR already exists, and I simply changed into it with 'cd $TMPDIR' right at the beginning. This seems to work as well.
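
As a small sketch of that variant: the helper name enter_job_tmpdir is made up for illustration, and the mktemp fallback is an assumption that only matters when the script is run outside the batch system, where TMPDIR may not be set.

```shell
#!/bin/bash
# Sketch only: change into the scratch directory named by TMPDIR at the
# very beginning of the job script, before the simulation is started.
enter_job_tmpdir() {
    # Assumption: if TMPDIR is not set (e.g. on an interactive machine),
    # create a fresh private directory instead of using a shared one.
    if [ -z "${TMPDIR:-}" ]; then
        TMPDIR=$(mktemp -d) || return 1
        export TMPDIR
    fi
    cd "$TMPDIR" || return 1
}
```

Calling enter_job_tmpdir as the first step of the job script means files such as gphysi.dat are written into that directory rather than into a shared one.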

Anyway, I now know about the issue.


Best,
Klaus