GSI Forum: Alice Computing » Meeting on Thursday, April 26, 2007


Re: Meeting on Thursday, April 26, 2007 [message #4172 is a reply to message #4170]

Thu, 26 April 2007 06:39

Kilian Schwarz
Location: GSI, Darmstadt


Dear Silvia and all,

I would like to comment on your points:

1. From the summary table one can see that GSI pledged 100 CPUs to Grid computing, a commitment we have not fulfilled so far. But we will certainly come close to it within this year, especially if the 100 CPUs are converted into kSI2k at the value of the time when this number was pledged.

2. You reported nothing about failure rates at GSI.
What you reported is:
- the integrated CPU time at GSI compared to other centres.
This value corresponds more or less to the percentage share of jobs per site, which has been 4 to 5% at Muenster and 0.5% at GSI so far.
The reason is that for quite some time we have been running only 3 Grid jobs continuously at GSI (most of them successfully and without failures!), since the high-memory queue did not provide more than 3 machines in the batch farm and the other batch machines could not cope with the memory consumption of ALICE jobs. Since we put the D-Grid machines into production we run 10 jobs in parallel on average, which is already a significant improvement. We have also tried running 2 jobs in parallel on one machine, but the job efficiency was clearly worse than with only 1 job per machine. The reason could be found by continuous local job monitoring. According to our preliminary findings, the peak of memory consumption, which occurs only briefly during reconstruction, may exceed the machine's capabilities when two jobs run at the same time and reach their peaks at roughly the same time.
In any case, if the integrated CPU time at GSI matches the percentage share of jobs running at GSI, then the failure rate at GSI is also comparable to that of other sites.
With former AliRoot versions, the memory consumption of ALICE jobs on 64-bit machines used to be significantly higher than on non-64-bit machines. This has improved slightly with newer AliRoot versions.
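
The share argument in point 2 can be sketched numerically. This is only an illustration, not the accounting actually used for the summary table: the function name `relative_yield` and the CPU-time shares are hypothetical placeholders, and only the job shares (0.5% at GSI, 4 to 5% at Muenster) are taken from the text. If a site's share of integrated CPU time is close to its share of jobs, it delivers about as much CPU time per job as the other sites, which would not be expected with an unusually high failure rate.

```python
def relative_yield(cpu_time_share, job_share):
    """Ratio of a site's share of integrated CPU time to its share of jobs.

    A value near 1.0 means the site delivers about as much CPU time per
    job as the average site, i.e. no sign of an unusual failure rate.
    """
    if job_share <= 0:
        raise ValueError("job_share must be positive")
    return cpu_time_share / job_share


# Job shares from the post; CPU-time shares are hypothetical placeholders.
sites = {
    "GSI": {"job_share": 0.005, "cpu_time_share": 0.005},
    "Muenster": {"job_share": 0.045, "cpu_time_share": 0.045},
}

for name, s in sites.items():
    print(name, round(relative_yield(s["cpu_time_share"], s["job_share"]), 2))
```

A ratio well below 1.0 for one site would be the signal worth following up with the local job monitoring.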

If you compare the memory of GSI batch machines with that of other machines, please keep in mind that our machines rarely have swap space and rely mainly on their physical memory. The Linux group believes that it does not make sense to use swap, since this would slow down the jobs too much and would put excessive load on the local disk.
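
The memory argument (short peaks during reconstruction, and no swap to absorb them) can be illustrated with a small sketch. All numbers here are hypothetical and are not measured ALICE job profiles; the function names are made up for the illustration. It only shows why two jobs on one machine may fit when their peaks are staggered and fail to fit when the peaks coincide.

```python
def combined_peak(profile_a, profile_b, offset):
    """Peak of the summed memory use when job B starts `offset` steps after job A.

    Profiles are lists of memory use (in MB) per time step; offset >= 0.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    length = max(len(profile_a), offset + len(profile_b))
    total = [0] * length
    for i, mem in enumerate(profile_a):
        total[i] += mem
    for i, mem in enumerate(profile_b):
        total[offset + i] += mem
    return max(total)


def fits_in_ram(profile_a, profile_b, offset, ram_mb):
    """True if both jobs fit into physical memory (no swap assumed)."""
    return combined_peak(profile_a, profile_b, offset) <= ram_mb


# Hypothetical profile: 800 MB baseline with a short 1800 MB peak.
job = [800, 800, 1800, 800, 800]
ram_mb = 3000  # hypothetical machine without swap

print(fits_in_ram(job, job, 0, ram_mb))  # peaks coincide
print(fits_in_ram(job, job, 2, ram_mb))  # peaks staggered by two steps
```

With these made-up numbers the coinciding peaks need 3600 MB and do not fit, while the staggered case needs 2600 MB and does.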

The GSI bandwidth is not a reason for job failures. It only limits the number of jobs we can run in parallel. The Suedhessennetz is not yet in production for our Grid jobs because of the political problems mentioned, which have not yet been solved.
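
As a rough sketch of how bandwidth caps the number of parallel jobs rather than causing failures: assuming, purely for illustration, a fixed average bandwidth per job, the cap is the link capacity divided by that per-job demand. The function name and both figures below are hypothetical and are not GSI measurements.

```python
def max_parallel_jobs(link_mbit_s, per_job_mbit_s):
    """Largest number of jobs whose combined demand fits into the link."""
    if per_job_mbit_s <= 0:
        raise ValueError("per_job_mbit_s must be positive")
    return int(link_mbit_s // per_job_mbit_s)


# Hypothetical figures: a 100 Mbit/s link and 8 Mbit/s average per job.
print(max_parallel_jobs(100, 8))
```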

xrootd cluster: I agree with Horst Goeringer's analysis of the problem, and I also agree that we should follow this up more closely.

Cheers and have fun,

Kilian


[Message index]

		Meeting on Thursday, April 26, 2007 By: Silvia Masciocchi on Wed, 25 April 2007 16:31
		Re: Meeting on Thursday, April 26, 2007 By: Kilian Schwarz on Thu, 26 April 2007 06:39
		Re: Meeting on Thursday, April 26, 2007 By: Kilian Schwarz on Thu, 26 April 2007 07:13
