Upcoming DC [message #6203]
Wed, 02 April 2008 22:01
Johan Messchendorp
Messages: 693 Registered: April 2007 Location: University of Groningen
Hi all,
Let me start a forum discussion concerning the data production test on the Grid. From the email correspondence, I would conclude the following (please correct me if I am wrong):
* Kilian's suggestion to have REGULAR production tests on the Grid is welcomed (to test performance, stability, capacity, monitoring tools, etc.).
* The proposal is to have them at least twice per year, preferably connected to a Grid workshop.
* Kick off such a DC now! Don't wait for something better to come.
Now the critical issue: which jobs? My proposal:
1) the default QA macros as used in the nightly tests (Dashboard). At the moment there are not many of them, but this will certainly be extended over time. These macros are meant as rigorous tests (simulation, reconstruction, analysis) of the framework, so why not exploit them on the Grid sites as well? It would also guarantee that the software developments are tested in synchronization with the Grid infrastructure. There will also be a "master" macro which calls the individual QA macros. All these macros will be part of the PandaRoot software.
2) software compilation and building. As we already discussed via email, the preference for the moment is to deploy the PandaRoot software via a friendly installation script. Mohammad and Florian are already doing excellent work on this front and are working hard on making the script as grid-friendly as possible. Regular compilation and building on all the sites would reveal configuration problems, and would also give regular feedback to Mohammad and Florian. One should realize that sites might also change their installation over time, which would lead to crashing jobs for which the problem is difficult to trace back. A regular compilation test would help to identify and accommodate these kinds of changes.
.... what else?
Johan.
Johan Messchendorp
University of Groningen/KVI
Zernikelaan 25
NL-9747 AA Groningen
The Netherlands
tel. +31-503633558
fax +31-503634003
[Updated on: Thu, 03 April 2008 00:25]
Re: Data Challenge [message #6208 is a reply to message #6204]
Thu, 03 April 2008 12:34
Dear all,
I would like to thank everybody for participating in this discussion. Several issues came up, which I will address here:
1) Dates of the DCs
2) Nightly builds on Grid
3) Proposed test jobs/macros
4) Storage of the outputs
Below is my opinion on these issues. Please use the forum to reply so that we have an organized thread. If you have no access, register or ask someone else to post for you.
1) The Grid Data Challenges will happen at dates to be decided independently, and to ensure this objectivity I propose that the dates be set by our Production Manager (PM), Paul Buehler, after some consultation with both the Grid Coordinator (Dan) and the Software Development Coordinator (Johan). The next date is April 17 (no changes accepted), but after that I hope the proposed PM scheme will be applied.
2) One of the major points to be understood about the Grid is that it is not a testbench. It is a "massive computations infrastructure", designed for large-scale, standardized jobs. Although nightly builds on several platforms are an extremely useful tool for developers to spot compilation problems early, running them on the Grid would be a misuse of it. The software installed on the Grid is supposed to be a stable version, which has already been tested on the platforms present at the Grid sites. Testing and feedback would happen at installation time. Florian explained this issue better than I can.
3) The basic plan for the next data challenge is to count jobs and produce statistics like job success rate, job site distribution, and time per 1000 jobs. I propose that the tests be done both with a generic job and with a PandaRoot job in order to decouple the various requirements. Whether a macro simulates some tracks in the EMC or produces some real physics is almost irrelevant for this data challenge. However, I enthusiastically embrace the idea of running some physics from which someone can collect and verify the results and gain an extra benefit. If you have such a macro, let's use it!
My initial plan is this:
- 10x100 subjobs generic (site availability)
- 1x1000 subjobs generic (broker optimization)
- 1x1000 PandaROOT macro #1
- 1x1000 PandaROOT macro #2
- 1x1000 PandaROOT macro #1 or #2 with all output to Glasgow SE
- 1x1000 PandaROOT macro #1 or #2 with output to local SEs
Please feel free to add to this list and let's discuss the benefits of these tests.
Package testing cannot be part of the data challenge because of time constraints. A check of the software installation is part of the preparations taken individually by the site admins.
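As a rough illustration of the statistics named in point 3 (success rate and site distribution), here is a minimal sketch. It assumes a plain listing with one line per sub-job in the form "job id, site, status"; that format, the site names and the status keywords are made up for the example and are not the output of any real monitoring tool.

```shell
#!/bin/bash
# Hypothetical input: one line per sub-job, "<job id> <site> <status>".
# Site names and status keywords are placeholders, not real monitoring output.
job_stats() {
    awk '
        { total++; per_site[$2]++ }
        $3 == "DONE" { ok++ }
        END {
            if (total == 0) { print "no jobs"; exit }
            printf "jobs=%d done=%d success=%.1f%%\n", total, ok, 100 * ok / total
            for (s in per_site) printf "site %s: %d\n", s, per_site[s]
        }
    ' "$@"
}

# Small made-up sample to show the output format.
job_stats <<'EOF' | sort
1 SiteA DONE
2 SiteA ERROR
3 SiteB DONE
4 SiteB DONE
EOF
```

The function reads from standard input or from a file name given as argument, so the same tally could be run once per test in the list above.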
4) The outputs of the DC test jobs will go physically to local and central SEs and that is part of the challenge. In the file catalogue, the output can be collected as you see fit, in case you would like to use the results later.
Cheers,
Dan
Dr. Dan PROTOPOPESCU
Department of Physics & Astronomy,
University of Glasgow,
Glasgow, G12 8QQ, Scotland, UK
Tel/Fax: +44 141 330-5531
Mobile: +44 794 046-3355
Re: Upcoming DC [message #6231 is a reply to message #6203]
Fri, 04 April 2008 13:49
Is there a new version of the software that you would like distributed and installed prior to the next DC? Software compilation testing cannot be part of the DC itself because it is an asynchronous process and would take too long (at least for the upcoming DC).
Let us also talk about the macros you would like to propose for the upcoming DC, and what data you would like to keep and where.
Re: Upcoming DC [message #6237 is a reply to message #6236]
Fri, 04 April 2008 15:49
We could have both: a generic job as well as PandaRoot macros producing meaningful data. This way we can practically decouple these components for the evaluation stage. I would encourage further discussions about both options.
Re: Upcoming DC [message #6239 is a reply to message #6238]
Fri, 04 April 2008 16:40
Wouldn't the forum be preferable, because then we have a written plan of action? There is not too much actually, is there?
Re: Data Challenge [message #6241 is a reply to message #6215]
Fri, 04 April 2008 16:55
Johan Messchendorp
Maybe we should turn the question around: how much data do we want to produce for the DC? We can easily tune the macros for that (e.g. decide how many events to throw).
[Updated on: Fri, 04 April 2008 16:56]
Re: Upcoming DC [message #6245 is a reply to message #6203]
Fri, 04 April 2008 17:21
My initial plan as outlined earlier would be this:
- 10x100 subjobs generic (site availability)
- 1x1000 subjobs generic (job broker evaluation)
- 1x1000 PandaROOT macro #1
- 1x1000 PandaROOT macro #2
- 1x1000 PandaROOT macro #1 or #2 with all output to Glasgow SE
- 1x1000 PandaROOT macro #1 or #2 with output to local SEs
Package testing will not be part of the data challenge because of time constraints.
I think the first two list items coincide with what Paul proposed, but we should refine our choice of 'generic' jobs.
Could Johan and Soeren decide on the two macros? Let's say one on detector simulations and one containing physics. We already tested the fast simulations macro (Johan), and Soeren proposed including some rho analysis.
Please feel free to add to the above list and we'll decide next week on the final set.
What do you think about such a start?
[Updated on: Fri, 04 April 2008 17:52]
Re: Upcoming DC [message #6247 is a reply to message #6245]
Fri, 04 April 2008 17:43
Johan Messchendorp
No problem. Soeren, do you have any preference concerning the physics channel we should run? Here would be my suggestion: we could run, for instance, the eta_c channel decaying to multi-photon final states using fast simulations and rho, and run at the same time a full MC simulation with the same event-generator output to make a benchmark/validation for the fast simulations (G3 versus G4 versus fast simulations). Alternatively (or in addition), we could consider running a channel with many charged pions in combination with the conformal mapping code (MVD+TPC).
Johan.
[Updated on: Fri, 04 April 2008 17:44]
Re: Upcoming DC [message #6253 is a reply to message #6203]
Sun, 06 April 2008 15:28
Jens Sören Lange
Messages: 193 Registered: June 2005
Hi all,
Stefano and I thought a bit again about the question,
which macros to use.
And the key question is:
************************************************************
which are our most time-consuming steps in the simu or reco?
************************************************************
And, actually, here the fast sim is the smallest problem,
because - obviously - it is fast by definition. So I changed my opinion a bit.
Therefore I would like to propose three different macro groups
(in other words, our "bottlenecks")
1.) dpm
2.) UrQMD
3.) tpc reco and stt reco
(maybe - if we keep the DC data somewhere - we could actually use it for the long-planned tpc/stt comparison?)
So, concerning 1.)
macro/run/run_sim1.C
with all detectors switched on
and then change the generator to DPM, see
http://panda-wiki.gsi.de/cgi-bin/view/Computing/Dpm
-> "Simulation inside PandaRoot"
and then generate billions of events
Note: unfortunately I don't know anymore how to set the beam energy in
DPM. I have to ask Stefano tomorrow.
So, concerning 2.)
macro/run/run_sim1.C
with all detectors switched on
and then change the generator to UrQMD, see
http://panda-wiki.gsi.de/cgi-bin/view/Computing/UrqmdSmm
here the heavy targets (Au, Pb) are most useful,
because most time-consuming.
anti-proton beam momenta 3.00 and 4.05 GeV
(these are needed for the J/Psi-in-nucleus measurement).
Actually, the GRID would be very useful here to generate as many events as possible!
So, concerning 3.)
macro/tpc/tutorial
runMC.C runDigi.C runReco.C
the svn version of these macros has some difficulties right now
(I just tried again and I get e.g. undefined symbol GeaneTrackRep),
see also
http://forum.gsi.de/index.php?t=msg&th=1802&rid=0&S=dfa543952d09c2dca876d4fb1bde7c98#msg_6124
e.g. one has to comment out "UseGeane()".
I hope that we can fix it before the DC
(I know that Dipak has a version which works).
macro/stt
run.C rundigi.C runreco.C
they work fine.
Here I would propose just to use the box generator
for muons with
pT=30,40,50,...100 MeV/c
pT=100,200,300,...,1000 MeV/c
pT=1,2,3,...,7.5 GeV/c
and uniform polar angle.
(the highest point at 7.5 GeV/c is for the Drell-Yan measurement).
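To make the proposed scan concrete, the three ranges above can be merged into one list. This is only a sketch: it expresses everything in MeV/c, and it reads "1,2,3,...,7.5 GeV/c" as integer steps up to 7 GeV/c plus one extra point at 7.5 GeV/c, which is an interpretation and not something stated explicitly. The function name is made up for the example.

```shell
#!/bin/bash
# Build the proposed muon pT scan in MeV/c.
# The ranges overlap at 100 and 1000 MeV/c, so duplicates are removed.
pt_grid_mev() {
    {
        seq 30 10 100        # 30, 40, ..., 100 MeV/c
        seq 100 100 1000     # 100, 200, ..., 1000 MeV/c
        seq 1000 1000 7000   # 1, 2, ..., 7 GeV/c (assumed integer steps)
        echo 7500            # top point at 7.5 GeV/c
    } | sort -n -u
}

# Print the grid on one line.
pt_grid_mev | tr '\n' ' '; echo
```

Under that reading the scan has 24 distinct pT points, which is the number that would multiply any per-point event count.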
What do you think?
cheers, Soeren
Re: Upcoming DC [message #6261 is a reply to message #6253]
Mon, 07 April 2008 12:20
I am very glad to see that we have interesting stuff to run. And yes, we can run all these macros. I propose that we prepare as follows:
1) make sure the macros run (on your desktop) -> Soeren, Stefano ?
2) make sure we have the latest software tarballs (including all the updated software) -> Florian, Johan, Mohammad ? + all SA
3) make an estimate of the number of jobs/events you wish to run -> Soeren ?
With regard to (2), we will need the support of all site admins (SA), to install and check the installation of the new package version once it is made available. We have to keep in touch.
The location where the output goes is set in the JDL: what to keep, where to register it, where to save it physically.
The macros themselves can be updated at the last moment, then added to the AliEn catalogue etc.
The related wiki topic is: http://nuclear.gla.ac.uk/twiki/bin/view.pl/Main/SubmitExample.
Please document your contributions to the upcoming DC in: http://panda-wiki.gsi.de/cgi-bin/viewauth/Computing/DataChallenge1
[Updated on: Mon, 07 April 2008 12:23]
Re: Upcoming DC [message #6280 is a reply to message #6279]
Mon, 07 April 2008 17:08
During the DC, we will basically run what will be provided to us by April 15, plus some generic benchmark jobs.
Of course, if you have some physics to run, it can be run outside the DC. The Grid is available in general and everyone is welcome!
Re: Upcoming DC [message #6283 is a reply to message #6281]
Mon, 07 April 2008 18:49
Johan Messchendorp
Hi all,
Running DPM is not so difficult, since we run from a (bash) script anyway. The JDL scripts can take arguments, which can be passed on to the shell scripts. Actually, last time we ran the DPM generator in combination with fast simulations.
Just make sure that the macros also take input parameters (random number seed, input filename, output filename, energy, ...); then it is trivial (see example below).
(PS: we might have to think a little bit more about the random number seed. I am not sure whether one can take any number for that.)
Johan
-- example shell scripts called by JDL --
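#!/bin/bash
#
# $1 unique number for seed
# $2 momentum
# $3 number of events
#
echo "This is the fast simulation test production"
echo "provided by Johan Messchendorp"
echo "Starting the job"
# Derive a per-job seed from a fixed base plus the unique job number.
export RANSEED=$(( 1202677345 + $1 ))
# Write the DPM generator input file (seed, momentum, 1, number of events).
cat <<EOF >"input.$1"
$RANSEED
$2
1
$3
EOF
cat "input.$1"
# Run the event generator on the input file.
DPMGen < "input.$1"
# Run the fast simulation; exit with code 11 if ROOT reports a failure.
root -b -q "simfast_jgm.C(\"Background-micro.root\",0,$3,\"simfast_jgm.root\")" || exit 11
echo "-----------------------------------------------------------------------"
echo "From wrapper script: job finished successfully"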
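On the open question about the random number seed, one possible precaution is to check the derived value before it is written to the input file. The sketch below assumes that the generator reads the seed as a signed 32-bit integer; this is an assumption made for illustration and is not confirmed anywhere in this thread. The function name make_seed is hypothetical.

```shell
#!/bin/bash
# Assumption (not confirmed in this thread): the generator reads the seed as a
# signed 32-bit integer, so base + job number must stay <= 2147483647.
SEED_BASE=1202677345
INT32_MAX=2147483647

make_seed() {
    job="$1"
    # Accept only plain non-negative integers without leading zeros.
    case "$job" in
        ''|*[!0-9]*|0?*) echo "invalid job number: '$job'" >&2; return 1 ;;
    esac
    seed=$(( SEED_BASE + job ))
    if [ "$seed" -gt "$INT32_MAX" ]; then
        echo "seed $seed exceeds the assumed 32-bit range" >&2
        return 1
    fi
    echo "$seed"
}

make_seed 999
```

Because the seed is the base plus the job number, distinct job numbers always give distinct seeds; the check only guards against values that the assumed integer type could not hold.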
[Updated on: Mon, 07 April 2008 20:31]
Re: Upcoming DC [message #6285 is a reply to message #6203]
Mon, 07 April 2008 22:53
Johan Messchendorp
Dear all,
I installed on the Grid (which means via PackMan)
pandaroot rev2432 (7/4/08)
dpmgen rev2432 (which means derived and built from pandaroot rev2432)
against
cbmsoft 16/01/08 (geant4.9.1, geant4_vmc_r331, geant3.1.9, vgm 3.00, root5.18, pluto412, pythia6, clhep2.0.3_1).
If we want to run Geant4 simulations in the DC, it is probably advisable to also upgrade the cbmsoft release to the most recent one of March '08. Otherwise, I would say, let's stick to this for the upcoming DC event, since cbmsoft_r16/01/08 already compiled successfully on many/most of the sites. Please note that I only compiled the above pandaroot-related packages on the KVI site. I expect, however, no problems in the compilation of PandaRoot on the other sites. But that I leave up to the site administrators to test:
packman install pandaroot::rev2432
packman install dpmgen::rev2432
(probably, "packman install dpmgen::rev2432" will do the job since it depends on pandaroot)
After installation, one can test it by (existing scripts from last runs)
(dpmgen, 5.5 GeV/c, 10x1000 events, in combi with fsim)
submit /panda/user/p/pbarprod/jdl/simfast_jgm.jdl 999 5.5 1000
with the output written to
/panda/user/p/pbarprod/jgm/fast/run999/1-10
Johan.
[Updated on: Mon, 07 April 2008 23:06]
Re: Upcoming DC [message #6322 is a reply to message #6308]
Wed, 09 April 2008 16:07
Jens Sören Lange
Hi all,
the stt and tpc macros for the DC are now checked in.
pandaroot/macro/dc1
NOTE that infile and outfile are still fixed in the macros.
stt
===
root -b runsim.C"(nEvents,pT)"
root -b rundigi.C
root -b runreco.C
params 1175k+11.7k/event (params really seem to be run dependent)
sim 240k+6.5k/event
digi 10k+10.6k/event
reco 0.3k+0.2k/event
differences to usual svn macros:
o field maps instead of constant field
o nEvents and pT are option parsed in sim
o nEvents is zero (=read all) in digi and reco
Proposal:
pT=30,40,50,...100 MeV/c
pT=100,200,300,...,1000 MeV/c
pT=1,2,3,...,7.5 GeV/c
10,000,000 events each
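For a Grid job the three stt steps would have to run unattended, one after the other. A minimal wrapper sketch in the style of the earlier bash example could look like this. The macro names are the ones listed above; the function name, the ROOT_BIN variable and the return codes 12 and 13 are hypothetical, and the unit expected for the pT argument is not specified here, so the value in the usage note is only a placeholder.

```shell
#!/bin/bash
# ROOT_BIN can be overridden, e.g. ROOT_BIN=echo for a dry run without ROOT.
ROOT_BIN="${ROOT_BIN:-root}"

run_stt_chain() {
    nevents="$1"
    pt="$2"
    # -q makes ROOT quit after the macro instead of waiting at the prompt.
    "$ROOT_BIN" -b -q "runsim.C($nevents,$pt)" || return 11
    "$ROOT_BIN" -b -q rundigi.C                || return 12
    "$ROOT_BIN" -b -q runreco.C                || return 13
    echo "stt chain finished: nEvents=$nevents pT=$pt"
}
```

A call such as `run_stt_chain 1000 0.5` would then run simulation, digitization and reconstruction in sequence and stop at the first step that reports a failure.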
tpc
===
we will use Stefano's new macros
root -b run_sim_tpcmvd.C"(nEvents,pT)"
root -b run_rectrack_tpcmvd.C
params 43k (params seem to be fixed)
sim 46k+17.4k/event
digi+reco zero kB+39.2k/event
differences to usual svn macros:
o nEvents and pT are option parsed in sim
Proposal:
pT=30,40,50,...100 MeV/c
pT=100,200,300,...,1000 MeV/c
pT=1,2,3,...,7.5 GeV/c
10,000,000 events each
dpm
===
not macros, but here Johan's bash scripts will be used.
however, for Kilian's calculation of disk space:
250 bytes/event
nEvents and beam momenta for dpm will be proposed tomorrow.
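For the disk space calculation, the per-event figures above can be turned into a rough estimate per pT point. This sketch reads "k" as kB (an assumption), uses 1 GB = 10^6 kB, and ignores the fixed per-run offsets, which are negligible at 10,000,000 events. The function name is made up for the example.

```shell
#!/bin/bash
# Rough output size per pT point, from the per-event figures quoted above.
# Assumption: "k" means kB; fixed per-run offsets are ignored.
estimate_gb() {
    # $1 = events per pT point, $2 = kB per event summed over all outputs
    awk -v n="$1" -v kb="$2" 'BEGIN { printf "%.0f\n", n * kb / 1e6 }'
}

# stt: params 11.7 + sim 6.5 + digi 10.6 + reco 0.2 = 29.0 kB/event
echo "stt: $(estimate_gb 10000000 29.0) GB per pT point"
# tpc: sim 17.4 + digi+reco 39.2 = 56.6 kB/event
echo "tpc: $(estimate_gb 10000000 56.6) GB per pT point"
```

With these inputs the estimate comes to about 290 GB (stt) and 566 GB (tpc) for a single pT point, so the full scan over all proposed pT values would reach several terabytes per detector if every output file were kept.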
cheers, Soeren
Re: Upcoming DC [message #6323 is a reply to message #6307]
Wed, 09 April 2008 16:21
Dear Johan, Kilian and all,
Thank you for all the work put into this. The outputs that we want to keep for later analysis could be directed to one of the big SEs (I have at least 1 TB in Glasgow).
We should not worry about the CPU resources; one of the goals of this DC is to see how much is actually available. It will depend a lot on the site admins and the way they allocate resources.
I understand that everyone should wait until Monday to install the packages on their sites. I hope the site admins follow this forum.
I will be travelling on April 14 and 15, and be back on the 16th.