GSI Forum
GSI Helmholtzzentrum für Schwerionenforschung

Meeting on Thursday, November 1, 2007 [message #5337] Tue, 30 October 2007 12:50
Silvia Masciocchi
Meeting at 10:00, in room 4.140

Preliminary agenda:
(priorities to be assigned, for the moment only a simple list)

1) shifts to write protocols of the meetings

2) status new XRD cluster (Kilian, Victor)
3) AliXRDPROOFtoolkit (Marian)
4) Next PROOF tutorial

5) GRID users' jobs at GSI:
- Ana's analysis task: compilation problems (Kilian)
- Silvia's experience during last week
- Anna's monitoring

6) status file transfers (Kilian)
7) first speed test results (Misha)

8) copy of the AliEn database to GSI
9) installation of AliRoot v4-07-Rev-01

================================================================
================================================================

Additional information:

Point 5: GRID users' jobs at GSI
=======================

Silvia
------
See timing at http://pcalimonitor.cern.ch/Correlations?Assigned-Started.enabled=true&Inserting-Assigned.enabled=true&Running-Saving.enabled=true&Saved-Done.enabled=true&Saving-Saved.enabled=true&SiteBase=GSI&Started-Running.enabled=true&USER=sma&edial_vcs2.remember_pw=0&err=0&imgsize=800x550&interval.max=0&interval.min=604800000&log=0&page=timings%2Fperuser&sum=0

Point 8: Database on AliEn
===================

(information from Alberto Colla)
For AliRoot v4-07-Release and its revisions, use:
/alice/simulation/2007/PDC07_v4-07-Rev-01/ (full geometry)
and
/alice/simulation/2007/PDC07_v4-07-Rev-01_PartialGeom/ (partial TRD and TOF)

---------------------------------------

You can drain a local copy of the database with AliRoot (with access to AliEn):

AliCDBManager* man = AliCDBManager::Instance();
man->SetDefaultStorage("alien://folder=/alice/simulation/2007/PDC07_v4-07-Rev-01/Ideal/CDB/");
man->SetRun(0);

man->SetDrain("local:///d/alice08/sma/soft64/CDB2/PDC07_v4-07-Rev-01/Ideal/CDB");
AliCDBStorage* sto = man->GetDefaultStorage();
sto->GetAll("*",0);

----------------------------------------

Now it is also possible to access the DB on AliEn only once at the beginning of the task, cache a local temporary copy and use that for any further access during that session:

man->SetDefaultStorage("alien://folder=/alice/simulation/2007/PDC07_v4-07-Rev-01/Ideal/CDB/?cacheFold=/tmp/DBCache");

[Updated on: Wed, 31 October 2007 13:32]


Marian's transparencies about the AliXRDPROOFtoolkit [message #5347 is a reply to message #5337] Thu, 01 November 2007 12:06
Silvia Masciocchi
No Message Body
Re: Meeting on Thursday, November 1, 2007 - minutes [message #5348 is a reply to message #5337] Thu, 01 November 2007 14:27
Kilian Schwarz
1) shifts to write protocols of the meetings
It has been agreed that the task will rotate in alphabetical order of family name, starting with Kilian Schwarz.
Next meetings should start at 9:30 am because Peter Malzacher has to leave at 10:45.

order of people to write protocol:
Schwarz Kilian,
Zynovyev Misha,
Andronic Anton,
Ivanov Marian,
Kreshuk Anna,
Malzacher Peter,
Masciocchi Silvia,
Miskowiec Dariusz,
Penso Victor,
Preuss Carsten,
(people who know in advance that they cannot be there or cannot
write minutes must find a replacement).

2) status new XRD/PROOF cluster
- new PROOF test cluster is up and working
- temporary setup: redirector lxgrid2 and 4 WNs (lxb284, lxb285, lxb286, lxb287). (only 2 machines can be used by outside users)
- functionality (also with large files and AliRoot) has been tested by Ana and Marian.
- Tree->Draw does not work with the current head version of AliRoot. It did work with v4-06-Release. The problem appears both locally and in PROOF, so it is not PROOF-related.
AliEn SE:
- plain xrdcp to new cluster works
Plan:
- start functioning AliEn SE on new cluster
- move more and more of the new ALICE machines to the cluster
- Once farm management is understood, GSIAF will be shut down and all machines will be moved to the new cluster. The switch will take 1.5 days and will be announced beforehand.
- start new cluster including all machines as GSIAF

- Marian wants to test memory checker on test PROOF cluster.

3.) AliXRDPROOFtoolkit
- currently a standalone macro in $ALICE_ROOT/TPC/macros, used directly from ROOT. The functionality is described in the .cxx file.
- this tool should be used ASAP. The presentation can be sent to Federico. Spelling mistakes have to be corrected beforehand.
- most implemented functions access the data servers via lsrun, which would not work on file servers. A solution for file servers still has to be found. Possibly a neutral account with limited rights could be created, or a proxy or alien-token could be used for authentication.
- includes: make chains, create file lists, check file inconsistency, copy directories, find file functionality
- todo: user interface, more functionality, documentation, convince users
- motivation: users need to produce their own data on xrootd. Not everybody in ALICE, though, should have the right to stage datasets.
- files should be deleted via a web page on which the files to be deleted are published. Once a week, e.g. Sunday night, a cronjob should run which actually removes the published files.
- order on fileservers should be maintained via predefined directory structure, but everybody can do anything under any name.

4.) next PROOF tutorial.
- PROOF 2007 @ CERN takes place in the last week of November.
- the GSI PROOF tutorial shall be held in the week of November 19. Silvia reserved the IT Schulungsraum (training room) for Friday, November 23. The tutorial shall run from 10 am to 5 pm with a lunch break in between. Participants: around 10 persons.

5.) GRID users jobs at GSI
- Ana's analysis task: compilation problems
- info sent to CERN experts for clarification, so far no answer
- so far not enough time for detailed investigation
- will be done with high priority, is on TODO list.
- Silvia experienced 100% job failure because of NFS timeouts
- Anna's monitoring will be put to central place

TO BE INVESTIGATED:
- what do "priorities" in AliEn mean ? / share of aliprod/user jobs
- INSERTED to ASSIGNED now takes up to 200 minutes; before, it took 50 minutes. Why does ASSIGNED to RUNNING take 25 minutes, and SAVING to SAVED 20 minutes?

6.) status of file transfers
- Kilian knows how to do AliEn transfers from CERN but to be able to mirror TPC test data he needs to be daquser.
- lots of files transferred successfully
- Nov-1: 2394 transfers, 99.71% efficiency
- Oct-31: 5698 transfers, 97.20% efficiency
- Oct-30: 5588 transfers, 99.45% efficiency
- Oct-29: 2539 transfers, 95.52% efficiency
- Oct-28: 187 failed transfers, 0% efficiency
- Oct-27: 210 failed transfers, 0.94% efficiency
- Oct-26: 189 failed transfers, 5.97% efficiency
reason: migration problem at CERN. After the upgrade of CASTOR2 and changes of pools it was not possible to get the correct configuration.
- Nov-1: GSI vobox crashed because CERN initiated too many parallel transfers.
- maximum transfer rate: 0.5 MB/s (ML)
- average incoming traffic during last week: 35 KB/s (ML)
- the network group measured an incoming data rate of 70 Mb/s at the same time, all routed through the vobox (bottleneck !!!)
- file list of transferred PDC07 data still to be created
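
The figures above can be cross-checked with a little arithmetic. The sketch below is not part of the minutes. It assumes that "efficiency" means successful transfers divided by total transfers, and that "Mb/s" means megabits per second; neither is stated explicitly. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Estimate the total number of transfers on a day for which only the
// number of failed transfers and the efficiency (in percent) is known.
// Assumption: efficiency = successful / total.
inline long estimateTotalTransfers(long failed, double efficiencyPercent) {
    assert(efficiencyPercent >= 0.0 && efficiencyPercent < 100.0);
    return std::lround(failed / (1.0 - efficiencyPercent / 100.0));
}

// Successful transfers are whatever is left after subtracting the failures.
inline long estimateSuccessfulTransfers(long failed, double efficiencyPercent) {
    return estimateTotalTransfers(failed, efficiencyPercent) - failed;
}

// Convert megabits per second to megabytes per second (8 bits per byte).
// Assumption: "Mb/s" in the network group's figure means megabits.
inline double megabitsToMegabytesPerSecond(double megabitsPerSecond) {
    return megabitsPerSecond / 8.0;
}
```

Under these assumptions, Oct-27 (210 failed, 0.94%) corresponds to about 212 transfers of which 2 succeeded, and Oct-26 (189 failed, 5.97%) to about 201 transfers of which 12 succeeded. A rate of 70 Mb/s would be 8.75 MB/s, well above the 0.5 MB/s maximum reported by ML.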

7.) first speed test results
- see presentation.
- method: 1-8 AliRoot jobs analysed data in parallel, reading either from local disk on lxb281 or from fileserver lxfs51 via xrootd: 1 million pp events have been read and processed using AliRoot v4-05-Rev-03.
- result: local data on local disk: 1 million events in 43 minutes. (after 500k events processing goes faster)
- 8 parallel jobs on local disk: 7:12 hours. Already 2 jobs need more time !!!
- question: why does a train of tasks need more time than a single task ?
- main result: I/O speed roughly the same from local disk and from fileserver
- maximum speed about 500 to 600 events/s, corresponding to 15-20 MB/s, for all jobs together. To clarify: one job alone reaches 500 to 600 events/s, and 8 jobs running in parallel reach the same rate in total, so the rate per individual job goes down accordingly.
- remark: old D-Grid machines have no raids, new ALICE 16 GB batch farm machines have raid.
- remark: raid does not seem to scale at all. Possibly the files are too small and there is no hardware caching. Only 1 PROOF worker/machine ?
Future AliRoot versions may solve this problem by storing several ESDs per file. The test should be repeated with larger files, e.g. AliESDfriends.
- should we move to new data ? 2 million events with the new analysis manager ?
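
To put the numbers above side by side, the following sketch does the unit arithmetic. It is not part of the minutes. It assumes 1 MB = 1000 KB and that an aggregate rate is shared evenly among parallel jobs; the function names are made up for illustration.

```cpp
#include <cassert>

// Average event rate in events per second for a run of the given length.
inline double eventsPerSecond(double events, double minutes) {
    assert(minutes > 0.0);
    return events / (minutes * 60.0);
}

// Average data volume read per event in KB.
// Assumption: 1 MB is taken as 1000 KB.
inline double kilobytesPerEvent(double megabytesPerSecond, double eventsPerSec) {
    assert(eventsPerSec > 0.0);
    return megabytesPerSecond * 1000.0 / eventsPerSec;
}

// Rate seen by each job when a fixed aggregate rate is split over N jobs.
// Assumption: the jobs share the aggregate rate evenly.
inline double ratePerJob(double aggregateEventsPerSecond, int jobs) {
    assert(jobs > 0);
    return aggregateEventsPerSecond / jobs;
}
```

With these helpers, 1 million events in 43 minutes averages to roughly 388 events/s over the whole run, which fits the remark that processing speeds up after the first 500k events. The quoted 15-20 MB/s at 500 to 600 events/s corresponds to about 30 to 33 KB per event, and 8 jobs sharing 500 to 600 events/s get about 62 to 75 events/s each.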

8.) copy of AliEn database to GSI
DONE.
stored in v4-07-Rev-01 (32bit):
    Full/
    Ideal/
    Residual/
    PartialGeom/
        Full/
        Ideal/
        Residual/

9.) installation of AliRoot v4-07-Rev-01
32bit: DONE
64bit: DONE (currently problem with large binary output)


- remark: zero suppression is needed for the December run
- remark 2 (PBM): with the current GSI budget the planned 1 Gb bandwidth for GSI might be problematic.

[Updated on: Thu, 08 November 2007 13:50]


Preliminary results of speed tests [message #5349 is a reply to message #5337] Thu, 01 November 2007 17:32
Misha Zynovyev
First preliminary results of speed tests. Some numbers can change over time.
One thing about the results in these slides should be explicitly noted:

One job on local disk data runs at 500 events/sec.
Eight jobs TOGETHER run at 400 events/sec (NOT 400 each)

[Updated on: Thu, 01 November 2007 18:24]

