Error during PndTpcElectronicsTask [message #11351]
Mon, 20 December 2010 09:51
Tobias Weber
Messages: 9 Registered: November 2010
occasional visitor
From: *kph.uni-mainz.de

Hi all,
I encountered a problem during the digitization. When running my digi.C I get the following error:
Error: Symbol #include is not defined in current scope X3872_digi.C:147:
Error: Symbol exception is not defined in current scope X3872_digi.C:147:
Syntax Error: #include <exception> X3872_digi.C:147:
Error: Symbol G__exception is not defined in current scope X3872_digi.C:147:
Error: type G__exception not defined FILE:/home/webert/Documents/Diplomarbeit/X3872_tpc_noFT/./X3872_digi.C LINE:147
(int)0
*** Interpreter error recovered ***
By commenting out parts of the macro, I found that the error is caused by PndTpcElectronicsTask.
I am using a fresh installation of PandaRoot (rev. 10456) and the external packages from January.
Best Regards,
Tobias

Re: Error during PndTpcElectronicsTask [message #11360 is a reply to message #11357]
Mon, 20 December 2010 17:39
StefanoSpataro
Messages: 2736 Registered: June 2005 Location: Torino
first-grade participant
From: *to.infn.it

The problem is probably connected to the memory usage of the TPC digitization.
It still has to be clarified whether it comes from the TPC data objects or from the links.
I once tried removing the inheritance of the TPC objects from FairMultiLinkedData, restoring them to TObject, and the digitization proceeded without crashes; but of course, without the links the data objects are smaller, so it is less likely that the memory fills up.
It is possible that the memory gets corrupted because the TPC data objects store pointers to the previous objects instead of only their index numbers, and this could also cause problems. But a test removing this dependence was never done, and I am not going to do it, since it would require changing the TPC code structure.
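A minimal plain C++ sketch of the two cross-referencing styles mentioned above. It assumes nothing about the actual PndTpc classes: `Digi`, `ClusterByPointer`, `ClusterByIndex`, `resolve` and `chargeAfterGrowth` are hypothetical names, and std::vector stands in for a TClonesArray. An index stays meaningful after the container is reallocated or the event is read back from a file; a raw pointer is only valid while the object stays at the same address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a TPC data object (not the real PandaRoot class).
struct Digi {
    double charge = 0.0;
};

// Cross-reference by pointer: valid only while the referenced Digi keeps
// the same address in memory.
struct ClusterByPointer {
    const Digi* seed = nullptr;
};

// Cross-reference by index: valid for any container holding the same event,
// e.g. after the storage is reallocated or the event is read back from file.
struct ClusterByIndex {
    std::size_t seedIndex = 0;
};

// Resolve an index-based reference against the current container.
const Digi& resolve(const std::vector<Digi>& digis, const ClusterByIndex& c) {
    assert(c.seedIndex < digis.size() && "index out of range for this event");
    return digis[c.seedIndex];
}

// Fill an event, keep an index reference to the first digi, then grow the
// container (which may reallocate and invalidate any stored raw pointers).
double chargeAfterGrowth(std::size_t extraDigis) {
    std::vector<Digi> digis;
    digis.push_back({1.5});
    ClusterByIndex cluster{0};
    for (std::size_t i = 0; i < extraDigis; ++i) {
        digis.push_back({static_cast<double>(i)});
    }
    // A ClusterByPointer taken before the loop might now dangle; the index
    // still identifies the same Digi.
    return resolve(digis, cluster).charge;
}
```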
Meanwhile, I have found the same crash in the reco part. To overcome it, I commented out all the SetLink calls in the lhe code and in TrackData, and was then able to run reco for 10k events as well.
Running the PID, I got the same crash again. Together with Lia I cleaned up the code a bit, but the crash persists. I commented out the SetLinks in the PidCorrelator: still the same crashes. I then removed the inheritance of VAbsMicroCandidate from FairMultiLinkedData, and now it is still running.
There are certainly problems with the FairLinks, but I am not so sure they are the cause of the TPC crash.

Re: Error during PndTpcElectronicsTask [message #11362 is a reply to message #11361]
Mon, 20 December 2010 18:18
StefanoSpataro
Messages: 2736 Registered: June 2005 Location: Torino
first-grade participant
From: *to.infn.it

Just a few comments from my side:
Felix Boehmer wrote on Mon, 20 December 2010 17:55 | Dear Stefano,
The problem seen by Tobias Weber could really be anything; to me this just looks like CINT ran into some kind of uncontrolled behavior (maybe due to a bad_alloc).
|
If the problem is really "anything", then I wonder why we have had it for six months and nobody has been able to fix it (or, most probably, almost nobody has tried).
Quote: |
Generally, the cross-reference to other TPC objects via pointers was necessary at the time it was introduced, since no such mechanism existed prior to Tobias' FairLink approach. It may not be pretty, but it is no worse than keeping index lists to TClonesArray entries, also in terms of memory consumption!
If the crashes are really connected to this but only appear when the FairLinks are used, then it looks like object members of pointer type are not handled correctly inside the FairLinks (considering also that the cross-referencing via pointers inside the TPC classes has been around for quite some time).
|
My feeling is that links increase the memory allocated in each object in a non-linear way, and maybe this conflicts with the TPC data structure (the only code using pointers; we had something similar in the EMC but have taken it out).
It is a fact that this kind of crash now appears in the TPC and not in other code, at least in digitization.
Quote: |
Can the people who experience these crashes please try to reproduce this problem with and without FairLinks, keeping an eye on memory consumption at the time of the crash (a simple "top" should suffice)? It might also help to compile the macros used, to get a more meaningful crash stack.
|
I have already spent enough time on this, and gave all the details in the forum on how to reproduce the crash. I could not check memory consumption because it takes hours before the crash appears. Links cannot be taken out so easily, because of the FairHit inheritance. In my case, after removing them from everything but PndTpcCluster, the macro worked. But again, running the code which creates PndTpcCluster, I had the same problem.
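Since the crash only appears after hours, one option is to log the memory per event instead of watching top. A minimal sketch for Linux, assuming the process can read /proc/self/status; `parseVmRssKiB` and `currentRssKiB` are hypothetical helpers, not PandaRoot or FairRoot functions.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Extract the resident set size (kB) from text in the format of Linux
// /proc/<pid>/status, i.e. a line like "VmRSS:    123456 kB".
// Returns -1 if no such line is found or the number cannot be parsed.
long parseVmRssKiB(const std::string& statusText) {
    std::istringstream in(statusText);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;
            return fields ? kib : -1;
        }
    }
    return -1;
}

// Read the current process's resident memory in kB.
// Linux only: returns -1 if /proc/self/status cannot be opened.
long currentRssKiB() {
    std::ifstream status("/proc/self/status");
    if (!status) return -1;
    std::ostringstream buf;
    buf << status.rdbuf();
    return parseVmRssKiB(buf.str());
}
```

Calling currentRssKiB() once per event (for example at the end of a task's Exec()) and printing it next to the event number would leave a memory history up to the crash, which can be compared between runs with and without FairLinks.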
Quote: |
Right now it is really hard to hunt down the problem, as I have never seen these crashes myself, nor am I familiar with what happens inside the FairLinks in full detail.
|
Does this mean that you have run 10k DPM events and the subsequent digitization without any crashes at all?

Re: Error during PndTpcElectronicsTask [message #11366 is a reply to message #11365]
Mon, 20 December 2010 21:47
StefanoSpataro
Messages: 2736 Registered: June 2005 Location: Torino
first-grade participant
From: *0-87-r.retail.telecomitalia.it

Felix Boehmer wrote on Mon, 20 December 2010 19:17 | Dear Stefano,
let's try to be a little more constructive. The "Symbol G__exception" error hints at an uncaught system signal, most likely a bad_alloc. Memory load seems to be the most likely reason for these crashes.
|
The error comes from FairRootManager when reading (ReadEvent) or filling (ForceFill) a tree. It is the TTree call that throws the exception, which is not caught. This means that the data saved in the tree, or about to be saved, are somehow corrupted. Discussing with Mohammad, we thought that a missing default (empty) constructor/destructor or some uninitialized data member could be the cause, but I was not able to find such a case at a glance. Maybe there is something else.
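A sketch of the checklist mentioned above, in plain C++ so it compiles without ROOT: a default constructor exists, and every data member, including an owning pointer, has a defined value from the start, so nothing indeterminate can be written out. `ExampleHit` is a hypothetical class; a real ROOT class would additionally need ClassDef and a dictionary, which are omitted here.

```cpp
#include <cstddef>

// Hypothetical data class: default-constructible, every member initialized.
class ExampleHit {
public:
    ExampleHit() = default;  // no-argument constructor, as I/O frameworks expect
    ExampleHit(double x, double y, int detId) : fX(x), fY(y), fDetId(detId) {}

    // fSamples starts as nullptr, so this is safe even for a
    // default-constructed object that never allocated anything.
    ~ExampleHit() { delete[] fSamples; }

    // Copying is disabled to keep ownership of fSamples simple.
    ExampleHit(const ExampleHit&) = delete;
    ExampleHit& operator=(const ExampleHit&) = delete;

    void SetSamples(std::size_t n) {
        delete[] fSamples;
        fSamples = new double[n]();  // value-initialized to 0.0
        fNSamples = n;
    }

    double X() const { return fX; }
    double Y() const { return fY; }
    int DetId() const { return fDetId; }
    std::size_t NSamples() const { return fNSamples; }
    const double* Samples() const { return fSamples; }

private:
    double fX = 0.0;
    double fY = 0.0;
    int fDetId = -1;             // sentinel meaning "not set"
    std::size_t fNSamples = 0;
    double* fSamples = nullptr;  // never left indeterminate
};
```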
But let me repeat the question: have you tried, and succeeded in, running 10k DPM events?
As far as I know this problem appears in Torino, at GSI and also in Bonn.

Re: Error during PndTpcElectronicsTask [message #11369 is a reply to message #11366]
Tue, 21 December 2010 11:53
Felix Boehmer
Messages: 149 Registered: May 2007 Location: Munich
first-grade participant
From: *natpool.mwn.de

Dear Stefano,
we have simulated many thousand DPM events just before the last meeting at GSI during testing of the pattern recognition, although I cannot provide you with the exact number. I will build a new clean trunk and test it again.
Quote: | The error comes from FairRootManager when reading (ReadEvent) or filling (ForceFill) a tree. It is the TTree call that throws the exception, which is not caught. This means that the data saved in the tree, or about to be saved, are somehow corrupted. Discussing with Mohammad, we thought that a missing default (empty) constructor/destructor or some uninitialized data member could be the cause, but I was not able to find such a case at a glance. Maybe there is something else.
|
Please be a little more exact about this, and elaborate why you suspect this. For me the behavior you describe is really only compatible with the assumption that we run into memory overload because we a) have rare events where very large numbers of objects would be created, or b) we have a permanent memory leak somewhere, most likely caused by a faulty destructor.
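One way to test hypothesis b) directly is to count live instances of the suspect classes and check that the count returns to its baseline after each event. A minimal C++14 sketch with hypothetical names (`InstanceCounter`, `TrackedHit`, `runEvent`); `keepAlive` stands in for anything that accidentally holds objects past the end of an event, and a destructor that fails to delete the counted objects it owns would show up the same way.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Counts how many objects of type T are currently alive.
template <typename T>
class InstanceCounter {
public:
    InstanceCounter() { ++live_; }
    InstanceCounter(const InstanceCounter&) { ++live_; }
    InstanceCounter& operator=(const InstanceCounter&) = default;
    ~InstanceCounter() { --live_; }
    static long Live() { return live_; }

private:
    static long live_;
};

template <typename T>
long InstanceCounter<T>::live_ = 0;

// Hypothetical per-event data object that is being tracked.
struct TrackedHit : InstanceCounter<TrackedHit> {
    double charge = 0.0;
};

// Simulate one event with n hits; `kept` of them are moved into keepAlive,
// mimicking objects that outlive their event. Returns how many TrackedHit
// objects survived the event (0 means no accumulation).
long runEvent(std::size_t n, std::size_t kept,
              std::vector<std::unique_ptr<TrackedHit>>& keepAlive) {
    const long before = TrackedHit::Live();
    std::vector<std::unique_ptr<TrackedHit>> perEvent;
    for (std::size_t i = 0; i < n; ++i) {
        perEvent.push_back(std::make_unique<TrackedHit>());
    }
    for (std::size_t i = 0; i < kept && i < perEvent.size(); ++i) {
        keepAlive.push_back(std::move(perEvent[i]));
    }
    perEvent.clear();  // destroys every hit that was not moved out
    return TrackedHit::Live() - before;
}
```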
I have never seen this error on my system - maybe because I use a 64 bit machine, I don't know.
I'll investigate it and keep you updated.

Re: Error during PndTpcElectronicsTask [message #11373 is a reply to message #11369]
Tue, 21 December 2010 12:11
StefanoSpataro
Messages: 2736 Registered: June 2005 Location: Torino
first-grade participant
From: *to.infn.it

Felix Boehmer wrote on Tue, 21 December 2010 11:53 | Dear Stefano,
we have simulated many thousand DPM events just before the last meeting at GSI during testing of the pattern recognition, although I cannot provide you with the exact number. I will build a new clean trunk and test it again.
|
This would be nice.
Quote: |
Please be a little more exact about this, and elaborate why you suspect this. For me the behavior you describe is really only compatible with the assumption that we run into memory overload because we a) have rare events where very large numbers of objects would be created, or b) we have a permanent memory leak somewhere, most likely caused by a faulty destructor.
|
I would opt for option b), considering that if you run only the problematic event, you do not get the error. So I would think it is due to the accumulation over all the previous events: memory slowly increases and eventually corrupts something.
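To tell a slow accumulation over events from a one-off spike, per-event memory values (for example from a log written during the run) can be summarised by a least-squares slope. A sketch with a hypothetical function name; any threshold for "clearly positive" is left open. Over a window taken after the initial ramp-up, a slope that stays positive suggests accumulation, while a buffer that has grown to the largest event's size and then saturates gives a slope near zero.

```cpp
#include <cstddef>
#include <vector>

// Least-squares slope of memory usage (e.g. in kB) versus event index.
// Returns 0 when fewer than two samples are given.
double memorySlopePerEvent(const std::vector<double>& memPerEvent) {
    const std::size_t n = memPerEvent.size();
    if (n < 2) return 0.0;

    double meanX = 0.0;
    double meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += static_cast<double>(i);
        meanY += memPerEvent[i];
    }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    // Slope = cov(x, y) / var(x), computed from centred sums.
    double sxy = 0.0;
    double sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = static_cast<double>(i) - meanX;
        sxy += dx * (memPerEvent[i] - meanY);
        sxx += dx * dx;
    }
    return sxy / sxx;
}
```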
Quote: |
I have never seen this error on my system - maybe because I use a 64 bit machine, I don't know.
|
I have tested both 32-bit and 64-bit (i.e. lenny64) architectures and found the same crash on both. I don't know which machines Ralf or Tobias were using.

Re: Error during PndTpcElectronicsTask [message #11380 is a reply to message #11378]
Tue, 21 December 2010 18:23
StefanoSpataro
Messages: 2736 Registered: June 2005 Location: Torino
first-grade participant
From: *to.infn.it

Felix Boehmer wrote on Tue, 21 December 2010 16:42 | Hi again,
I started out by simulating many thousands of BoxGen events (multiplicity = 5 each).
|
Could you please tell me exactly which macros you are running? I want to test your sample. I have seen that, for some reason, it is easier to get the error with DPM than with the particle gun. Have you also tried running DPM?
And are you using the jan10 external packages or the trunk version?
Quote: |
[*] Memory load *slowly* grows step-wise, compatible with the fact that the TClonesArrays in memory will always have the size given by the largest event... saturating at roughly 500 MB
|
If I have understood correctly, when we use XXXArray->Delete() (as we now do in all the tasks), the size of the TCA should restart from zero at each event instead of keeping the size of the largest event. Can someone confirm this? (This is the reason why all the "Clear" calls were taken out.)
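This is not an answer about TClonesArray itself (that needs the ROOT reference for TClonesArray::Clear and TClonesArray::Delete, or a direct test), but the standard C++ analogue shows the behaviour Felix describes: std::vector::clear() destroys the elements yet leaves capacity() unchanged, so a buffer reused every event stays as large as the largest event so far. `capacityAfterEvents` and `clearAndRelease` are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Reuse one buffer for a sequence of events, clearing it between events,
// and return the capacity left after the final clear().
std::size_t capacityAfterEvents(const std::vector<std::size_t>& eventSizes) {
    std::vector<double> buffer;
    for (std::size_t size : eventSizes) {
        buffer.clear();  // destroys the elements, capacity is unchanged
        for (std::size_t i = 0; i < size; ++i) {
            buffer.push_back(0.0);
        }
    }
    buffer.clear();
    return buffer.capacity();  // still at least the largest event size
}

// Empty a buffer and give its storage back: the old allocation moves into
// the temporary, which frees it when it is destroyed at the end of the
// statement (shrink_to_fit would only be a non-binding request).
void clearAndRelease(std::vector<double>& buffer) {
    std::vector<double>().swap(buffer);
}
```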
Quote: |
This is strange behavior indeed. The fact that the memory load *drops* again some time after the bad event proves that the memory consumption cannot be caused by objects that live in the TClonesArrays, since that size would never decrease again.
|
I assumed this was connected to the Delete, but I am not so sure.
Quote: |
Also it can't be temporary objects of one event, since they would have to disappear before the next event is processed, which is not what I see. The current guess is that the caching of the output TTree is causing this...
|
That is also possible, considering that the "faulty" part is connected to reading/writing the tree.