A small application written to explore using a GUI to investigate particle-decay relationships in Monte Carlo decay trees, with a SQL database as the backend. A comparison with ROOT is included below. You can also download what I used, if you want. I suspect the only way to really change the performance numbers is to redesign the data layout; a short description of that is below too. I can see no reason why one could not marry the GUI I did here to a ROOT backend.

History
This small project was meant to be a comparison between a SQL database and ROOT for doing fairly simple things. It was initially motivated by my desire for a better interface for asking basic Monte Carlo questions, like "how often does a b quark decay to a muon?" For me, those are the sort of questions that become too much effort if I can't answer them in a few minutes. Since I can never remember the ROOT code to do something like that (and DZERO keeps changing its data format!), I decided to write a GUI front-end. And because Visual Studio 2005 includes the Express edition of SQL Server for free (for anyone to install on a Windows machine), I decided to use that as the backend to hold the data. Obviously, the next thing for me to do was a direct speed comparison...

ROOT Comparison
I've made as many comparisons as I can; feel free to email me to request others (that is, if I can do them easily). All timing was done using TStopwatch in ROOT and the basic DateTime structure in .NET. Note that ROOT takes a 1.5 second hit every time it starts up. This is spent loading the shared libraries needed to read the CAF tree. I did not try running without these shared libraries.

The Computer
I used my laptop for all tests. It is a 2 GHz Pentium M with 2 GB of memory (yeah!) and a 60 GB 7200 RPM disk (also yeah!). It is running Windows XP SP2 with all the latest patches. Very little else was running on the computer during the tests (I even turned off desktop indexing).
I suspect disk I/O is a huge factor in these tests. While running I noticed that SQL Server would sometimes have up to half a gigabyte of memory allocated, so I suspect that people with less memory on their computers will notice a real difference in performance.

The Data
I used a sample of 12250 Z->ccbar events produced in CAF format by DZERO. The ROOT data file is 480 MB and includes much more than just the Monte Carlo information: jets, electrons, muons, tagging, etc. To import into the MC Inspector program I converted the MC information into XML (which takes up 1.71 GB!!!). The empty database is 2 MB, and once populated it is 500 MB. ROOT clearly wins on space. It took 18 minutes to import the data.

Counting Particles
How long does it take to answer the question "How many Z's are in the data sample?" or "How many events have a Z?" Every single event turns out to have two Z's, btw (no idea why; the second one decays to nothing, an artifact of the DZERO MC generation process).
Using MC Inspector (screen shot), it took 26 seconds from a cold boot to answer the question. This seems to be fairly consistent as long as I don't ask the same question twice in a row (e.g. asking for the number of Z0's twice); in that case it takes about 1.5 seconds or so. In many cases results are cached. The speed of the query seems independent of how often the particle occurs (which may mean one could speed it up by adding a SQL index on the particle ID).
For ROOT, I used the TTree::Draw call to count particles. It took 20.0 seconds to determine the total number of Z's in the sample and to count the number of events that contained Z's (script; you can do both at once, which is cool). Repeated running didn't change the time it took. At first I couldn't figure out how to count the number of events that contained a Z (as opposed to the number of Z's), but I got help on the about-root mailing list. And, yes, the two got the same answer!
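
In SQL terms, the two counts above (particles versus events that contain them) can be sketched as below. This is only a toy, run against SQLite from Python rather than SQL Server. The table and column names (`particles`, `event_id`, `pdg_id`) and the index are assumptions for illustration, not the actual MC Inspector schema. PDG code 23 is the Z0 and 4 is the charm quark.

```python
import sqlite3

Z0, CHARM = 23, 4  # PDG particle codes


def build_toy_db(n_events=3):
    """Build an in-memory table in which every event holds two Z's, a c and a cbar."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE particles ("
        "id INTEGER PRIMARY KEY, event_id INTEGER, pdg_id INTEGER)"
    )
    # Hypothetical index on the particle ID, as suggested in the text.
    conn.execute("CREATE INDEX idx_particles_pdg ON particles (pdg_id)")
    rows = []
    for event in range(n_events):
        rows += [(event, Z0), (event, Z0), (event, CHARM), (event, -CHARM)]
    conn.executemany(
        "INSERT INTO particles (event_id, pdg_id) VALUES (?, ?)", rows
    )
    return conn


def count_particles_and_events(conn, pdg_id):
    """Return (number of matching particles, number of events containing one)."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT event_id) "
        "FROM particles WHERE pdg_id = ?",
        (pdg_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_toy_db()
    print(count_particles_and_events(conn, Z0))  # (6, 3)
```

Both numbers come back from one query, which mirrors getting both answers at once in the ROOT script.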
One interesting thing is that the ROOT version (especially after the first run) is almost entirely CPU bound, whereas the SQL version is mostly disk bound, with a short spike in CPU time.

Counting Direct Decays
This amounts to asking "How many times do we find a Z->c decay in this sample?" and "How many events have at least one Z->c decay in them?"
Using MC Inspector (screen shot), it took 3.8 seconds to answer "how many decays in this sample" and 1.5 seconds to count the number of events in which this happened. The two queries were run sequentially, and both were run after separate queries for "Z" and "c" (this is how the GUI is designed, sorry).
I used the same technique to look for the direct decays in ROOT. The expression evaluated by ROOT, however, is quite a bit more complex (see script), and it takes 28 seconds to find an answer. This does not change if I scan through the list first (as it does for the SQL version of this question and subsequent questions). When I was not checking for nulls initially, it took about 31 seconds to process. So the extra evaluation required to check for nulls sped things up by almost 10% (checking for nulls means looking to see whether a parent pointer is null before dereferencing it). Wow.

Counting Indirect Decays
This is asking "How many times does a charm quark eventually decay to a mu+ through some process?", not really caring about what happens in between.
Using MC Inspector (screen shot), it took 3 seconds to count both the number of times it occurs in the sample and the number of events. Note that this query is run after separate charm and mu queries, along with a query that looks for direct decays between the c and the mu (as opposed to indirect decays).
Without a redesign of the data layout (or adding functionality to the MCpart class), I didn't see how to do this from a TTree::Draw. Here I have to use a loop.
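
The kind of loop this needs can be sketched in plain Python (this is not the ROOT script itself). For every mu+, it walks up the parent chain until it finds a charm ancestor, checking for a missing parent before following it. The `Particle` class, its fields, and the toy events are hypothetical stand-ins for the MCpart objects. PDG code 4 is the charm quark, -13 the mu+, 411 the D+ and 23 the Z0.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Z0, CHARM, D_PLUS, MU_PLUS = 23, 4, 411, -13  # PDG particle codes


@dataclass
class Particle:
    """Hypothetical stand-in for an MC particle with a single parent pointer."""
    pdg_id: int
    parent: Optional["Particle"] = None


def has_ancestor(particle: Particle, ancestor_pdg: int) -> bool:
    """Walk up the parent chain; a None parent ends the walk (the null check)."""
    node = particle.parent
    while node is not None:
        if node.pdg_id == ancestor_pdg:
            return True
        node = node.parent
    return False


def count_indirect_decays(
    events: List[List[Particle]], ancestor_pdg: int, descendant_pdg: int
) -> Tuple[int, int]:
    """Return (number of occurrences, number of events with at least one)."""
    n_occurrences = 0
    n_events = 0
    for event in events:
        matches = sum(
            1
            for p in event
            if p.pdg_id == descendant_pdg and has_ancestor(p, ancestor_pdg)
        )
        n_occurrences += matches
        if matches > 0:
            n_events += 1
    return n_occurrences, n_events


def toy_event(with_muon: bool) -> List[Particle]:
    """Z -> c cbar, c -> D+, and optionally D+ -> mu+ (toy data, not DZERO MC)."""
    z = Particle(Z0)
    c = Particle(CHARM, z)
    cbar = Particle(-CHARM, z)
    d = Particle(D_PLUS, c)
    event = [z, c, cbar, d]
    if with_muon:
        event.append(Particle(MU_PLUS, d))
    return event


if __name__ == "__main__":
    events = [toy_event(True), toy_event(False), toy_event(True)]
    print(count_indirect_decays(events, CHARM, MU_PLUS))  # (2, 2)
```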
Needing a loop is unfortunate, as I'd like to have kept this an "easy-to-use-method" comparison (one of the goals of MC Inspector). Note that this isn't really a problem with ROOT so much as functionality that DZERO could add to the MCpart class. The simple script I wrote took 28 seconds to count both the number of occurrences and the number of events (I ran it compiled; interpreted, it took about 35 seconds, and compiling the code took about 5 seconds).

Two Prong Decay
The question used here was "How many times does a Z decay to a charm and an anti-charm?"
Using MC Inspector (screen shot), it took 3.4 seconds to count both the occurrences and the number of events in which they occurred. Again, this was after individual scans had been run for Z, c, and cbar.
Again, for ROOT, I don't see how to make this work without writing a loop, which I can't do in TTree::Draw. So I wrote another small script to perform the calculation. It took 23 seconds to answer both questions.

Skip-A-Particle Decay
This tries to answer the question "Z->X->c", where X is some arbitrary particle.
With MC Inspector (screen shot), after scans for Z and charm were run, it took 1.9 seconds to count both the number of events and the number of times the decay occurred.
Since this is a simple decay chain, I can again use TTree::Draw. The selection expression, however, is quite long and would be very painful to type. It took 27 seconds to run (but almost 20 minutes to actually write the thing and check for nulls and errors).

Data Design
The ROOT design consists of a fairly standard pair of objects, TMBMCvtx and TMBMCpart. The data is stored, using ROOT I/O, in its hierarchical form. Thus a vertex has lists of pointers (TRefArrays) to its daughter and parent particles, and each particle knows (via TRef) its birth and death vertices. The vertex also stores its position, and the particle stores its 4-vector.
The SQL database is several tables.
But two of them are not important: the first is a match between PDG ID and particle name, and the second is a list of the samples. These tables basically serve to assign a unique integer to human-readable strings. The table that does most of the work is the Particles table, and it contains the following columns:
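
A toy version of such a table, small enough to run, might look like the sketch below. It uses SQLite from Python instead of SQL Server. The column names (`parent_id`, `lft`, `rgt`), the single-event decay tree, and the helper functions are all assumptions for illustration, not the actual MC Inspector schema. Each particle gets a pair of bounds assigned by a preorder walk at insertion time, so a direct decay is a join on the parent pointer, while an ancestry test at any depth is just two comparisons.

```python
import sqlite3

Z0, CHARM, D_PLUS, MU_PLUS = 23, 4, 411, -13  # PDG particle codes

# Toy decay tree for a single event, as (pdg_id, [children]).
TOY_TREE = (Z0, [(CHARM, [(D_PLUS, [(MU_PLUS, [])])]), (-CHARM, [])])


def insert_tree(conn, node, parent_id=None, counter=None):
    """Insert a subtree, assigning lft on the way down and rgt on the way up."""
    if counter is None:
        counter = [0]
    pdg_id, children = node
    counter[0] += 1
    cur = conn.execute(
        "INSERT INTO particles (pdg_id, parent_id, lft) VALUES (?, ?, ?)",
        (pdg_id, parent_id, counter[0]),
    )
    row_id = cur.lastrowid
    for child in children:
        insert_tree(conn, child, row_id, counter)
    counter[0] += 1
    conn.execute(
        "UPDATE particles SET rgt = ? WHERE id = ?", (counter[0], row_id)
    )
    return row_id


def build_toy_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE particles (id INTEGER PRIMARY KEY, pdg_id INTEGER, "
        "parent_id INTEGER, lft INTEGER, rgt INTEGER)"
    )
    insert_tree(conn, TOY_TREE)
    return conn


def count_direct(conn, parent_pdg, child_pdg):
    """Direct decays: the child's parent pointer is the parent row."""
    return conn.execute(
        "SELECT COUNT(*) FROM particles AS c "
        "JOIN particles AS p ON c.parent_id = p.id "
        "WHERE p.pdg_id = ? AND c.pdg_id = ?",
        (parent_pdg, child_pdg),
    ).fetchone()[0]


def count_indirect(conn, ancestor_pdg, descendant_pdg):
    """Decays at any depth: the descendant's bounds lie inside the ancestor's."""
    return conn.execute(
        "SELECT COUNT(*) FROM particles AS d "
        "JOIN particles AS a ON d.lft > a.lft AND d.rgt < a.rgt "
        "WHERE a.pdg_id = ? AND d.pdg_id = ?",
        (ancestor_pdg, descendant_pdg),
    ).fetchone()[0]


if __name__ == "__main__":
    conn = build_toy_db()
    print(count_direct(conn, CHARM, MU_PLUS))    # 0: the mu+ comes from the D+
    print(count_indirect(conn, CHARM, MU_PLUS))  # 1: c -> D+ -> mu+
```

With more than one event in the table, the bounds would either have to be numbered across the whole table or the join would also have to match on an event column. The toy skips this because it holds a single tree.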
The modified preorder traversal algorithm off-loads the tree traversal required to determine parentage to data-insertion time, so that a simple "<" and ">" comparison between two particles determines whether one came from the other, no matter how far up the chain. It is nifty. MC Inspector currently uses this only for the indirect decay product test (i.e. the "does a charm eventually decay to a muon?" test). The data is clearly redundant with the ParentID, but I could see no easy way to write SQL code that was recursive. I had to do exactly that for ROOT, btw.

MC Inspector
I hacked this program together over about 3 days using Visual Studio 2005. I don't think I used any advanced features (other than SQL Express, a stripped-down version of SQL Server). All code was written in C#, which is a Java-like language. Almost all of the C# code is devoted to either GUI implementation or constructing the SQL statements. The SQL database does about 99.9% of the heavy lifting for this program. A small text file containing timing numbers is written to your My Documents area. There is no way to turn it off at this time. ;-)

Comments on the User Interface
These are mostly notes to myself about using the MC Inspector user interface.
Bugs
Download
Note that this program requires both the .NET 2.0 framework (which you may already have downloaded from Windows Update) and SQL Server Express, which I'm guessing most people do not have. If you run the setup.exe program, it will automatically install any missing bits by downloading them to your machine directly from Microsoft.
1/2/2006: Uploaded version 1.1 (the source code is still 1.0). Fixes some of the worst bugs -- just enough so I can play without hurting myself. I also added the ability to read gzipped XML files over the internet. You can type the following URLs into the File Browse thing if you like: