MC Inspector

 

MC Inspector is a small application I wrote to explore using a GUI to investigate particle-decay relationships in Monte Carlo decay trees, with a SQL database as the backend. A comparison with ROOT is included below. You can download the stuff I used as well, if you want. I suspect the only way to really change the performance numbers is to redesign the data layout; I have a short description of the layout below too. I can see no reason why one could not marry the GUI I did here to a ROOT backend.


History

This small project was meant to be a comparison between a SQL database and ROOT for doing fairly simple things. It was initially motivated by my desire to have a better interface for asking basic Monte Carlo questions, like "how often does a b quark decay to a muon?" For me, those are the sort of questions that become too much effort if I can't answer them in a few minutes.

Since I can never remember the ROOT code to do something like that (and DZERO keeps changing its data format!), I decided to write a GUI front-end. And because Visual Studio 2005 includes SQL Server Express for free (anyone can install it on a Windows machine), I decided to use that as the backend to hold the data. Obviously, the next thing for me to do was a direct speed comparison...

ROOT Comparison

I've made as many comparisons as I can; feel free to email me to request others (as long as I can do them easily). All timing was done using TStopwatch in ROOT and the basic DateTime structure in .NET. Note that ROOT takes a 1.5 second hit every time it starts up. This is spent loading the shared libraries needed to read the CAF tree. I did not try running without these shared libraries.

The Computer

I used my portable for all tests. It is a 2 GHz Pentium M with 2 GB of memory (yeah!) and a 60 GB 7200 RPM disk (also yeah!). It is running Windows XP SP2, with all the latest patches, etc. Very little else was running on the computer during the tests (I even turned off desktop indexing). I suspect disk I/O is a huge factor in these tests. While running I noticed that SQL Server would sometimes have up to half a GB of memory allocated, so I suspect that people with less memory on their computers will notice a real difference in performance.

The Data

I used a sample of 12250 Z->ccbar events produced in CAF format by DZERO. The ROOT data file is 480 MB and includes much more than just the Monte Carlo information -- jets, electrons, muons, tagging, etc. To import into the MC Inspector program I converted the MC information into XML (which takes up 1.71 GB!!!). The empty database is 2 MB, and once populated it is 500 MB. ROOT clearly wins on space. It took 18 minutes to import the data.

Counting Particles

How long does it take to answer the question "How many Z's are in the data sample?" or "How many events have a Z?" Every single event turns out to have two Z's, btw (no idea why -- the second one decays to nothing; an artifact of the DZERO MC generation process).

Using MC Inspector (screen shot), from a cold boot it took 26 seconds to answer the question. This seems to be fairly consistent as long as I don't ask the same question twice (i.e. if I ask for the # of Z0's twice in a row) -- in that case it takes about 1.5 seconds or so, since in many cases results are cached. The speed of the query seems independent of how often the particle occurs (which may mean one could speed it up by adding an SQL index on the particle ID).
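To make the two counting questions concrete, here is a minimal sketch of how they might look as a single SQL query. This is not the actual MC Inspector code: Python's built-in sqlite3 stands in for SQL Server Express, the table is a cut-down toy version of the Particles table described under Data Design, and the `EventID` column and the index name are assumptions made purely for illustration.

```python
import sqlite3

# In-memory toy database; sqlite3 is a stand-in for SQL Server Express.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Particles (
        ID       INTEGER PRIMARY KEY,
        PDGID    INTEGER,
        EventID  INTEGER,  -- hypothetical column, assumed for this sketch
        ParentID INTEGER
    )""")

# Two toy events, each with two Z's (PDG ID 23), a c (4) and a cbar (-4).
rows = [
    (1, 23, 1, None), (2, 23, 1, None), (3, 4, 1, 1), (4, -4, 1, 1),
    (5, 23, 2, None), (6, 23, 2, None), (7, 4, 2, 5), (8, -4, 2, 5),
]
conn.executemany("INSERT INTO Particles VALUES (?, ?, ?, ?)", rows)

# The kind of index on the particle ID that might speed up the lookup.
conn.execute("CREATE INDEX idx_particles_pdgid ON Particles (PDGID)")

Z0 = 23
n_particles, n_events = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT EventID) FROM Particles WHERE PDGID = ?",
    (Z0,),
).fetchone()
print("Z's in sample:", n_particles, "events with a Z:", n_events)
```

`COUNT(*)` answers "how many Z's", while `COUNT(DISTINCT EventID)` answers "how many events have a Z", so both numbers come out of one pass over the matching rows.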

I used the TTree::Draw call to count particles. It took 20.0 seconds to determine the total number of Z's that occurred in the sample and count the number of events that contained Z's (script - you can do it at once - cool). Repeated running didn't change the time it took. At first I couldn't figure out how to count the number of events that contained a Z in the sample (as opposed to the number of Z's), but I got help on the about-root mailing list.

And, yes, the two got the same answer! One interesting thing is that the ROOT version (especially after the first run) is almost all CPU bound, whereas the SQL version is mostly disk bound, with a short spike in CPU time.

Counting Direct Decays

This amounts to asking "How many times do we find a Z->c decay in this sample?" and "How many events have at least one Z->c decay in them?"

Using MC Inspector (screen shot), it took 3.8 seconds to answer the "how many decays in this sample" question and 1.5 seconds to count the number of events in which this happened. The two queries were run sequentially, and both were run after separate queries for "Z" and "c" had been run (this is how the GUI is designed, sorry).
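A direct decay maps naturally onto a self-join through the ParentID column. The following is only a sketch under assumptions, not the SQL that MC Inspector generates: sqlite3 stands in for SQL Server Express, `EventID` is a hypothetical column, and `count_direct` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Particles (
        ID       INTEGER PRIMARY KEY,
        PDGID    INTEGER,
        EventID  INTEGER,  -- hypothetical column, assumed for this sketch
        ParentID INTEGER
    )""")
rows = [
    # event 1: Z -> c cbar, plus a second Z that decays to nothing
    (1, 23, 1, None), (2, 4, 1, 1), (3, -4, 1, 1), (4, 23, 1, None),
    # event 2: Z -> mu- mu+
    (5, 23, 2, None), (6, 13, 2, 5), (7, -13, 2, 5),
    # event 3: two Z's, each decaying to c cbar
    (8, 23, 3, None), (9, 4, 3, 8), (10, -4, 3, 8),
    (11, 23, 3, None), (12, 4, 3, 11), (13, -4, 3, 11),
]
conn.executemany("INSERT INTO Particles VALUES (?, ?, ?, ?)", rows)

def count_direct(conn, parent_pdg, child_pdg):
    """Return (number of parent->child decays, number of events with one)."""
    return conn.execute(
        """
        SELECT COUNT(*), COUNT(DISTINCT child.EventID)
        FROM Particles AS child
        JOIN Particles AS parent ON child.ParentID = parent.ID
        WHERE parent.PDGID = ? AND child.PDGID = ?
        """,
        (parent_pdg, child_pdg),
    ).fetchone()

n_decays, n_events = count_direct(conn, 23, 4)  # Z -> c
print("Z->c decays:", n_decays, "events:", n_events)
```

Because the parent link is an ordinary column, there is no null pointer to guard against: a particle without a parent simply never matches the join.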

I used the same technique to look for the direct decays in ROOT. The expression evaluated by ROOT, however, is quite a bit more complex (see script) and it takes 28 seconds to find an answer. This does not change if I scan through the list first (as it does for the SQL version of this question, and subsequent questions). When I was not checking for nulls initially, it took about 31 seconds to process. So the extra evaluation required to check for nulls sped things up by almost 10% (check for nulls means look to see if a parent pointer is null before de-referencing it). Wow.

Counting Indirect Decays

This is asking "How many times does a charm quark eventually decay to a mu+ through some process?" -- not really caring about what happens in between.

Using MC Inspector (screen shot), it took 3 seconds to count both the number of times it occurs in the sample and the number of events. Note that this query is run after a separate charm and mu query, along with a query that looks for direct decays between the c and the mu (as opposed to indirect decays).

Without a redesign of the layout of the data (or adding functionality to the MCpart class), I didn't see how to do this from a TTree::Draw. Here I have to use a loop. This is unfortunate, as I'd like to have kept this an "easy-to-use-method" comparison (one of the goals of MC Inspector). Note this isn't really a problem with ROOT as much as it is functionality that could be added to the MCpart class by DZERO. The simple script I wrote took 28 seconds to count both the number of occurrences and the number of events (I ran it compiled; if I ran the script interpreted it took about 35 seconds -- it took about 5 seconds to compile the code).
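The shape of such a loop is easy to show without ROOT. The sketch below is plain Python rather than the ROOT script referred to above, and the `Particle` class, `has_ancestor`, and `count_indirect` are all hypothetical names. It climbs the parent pointers of every candidate muon and counts each muon at most once, which is one possible reading of "eventually decays to".

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Particle:
    pdg_id: int
    parent: Optional["Particle"] = None  # None for a particle with no parent

def has_ancestor(p: Particle, ancestor_pdg: int) -> bool:
    """Climb the parent chain, checking for a null parent at every step."""
    current = p.parent
    while current is not None:
        if current.pdg_id == ancestor_pdg:
            return True
        current = current.parent
    return False

def count_indirect(events: List[List[Particle]], ancestor_pdg: int, descendant_pdg: int):
    """Return (number of matching descendants, number of events with a match)."""
    n_decays = 0
    n_events = 0
    for particles in events:
        hits = sum(
            1 for p in particles
            if p.pdg_id == descendant_pdg and has_ancestor(p, ancestor_pdg)
        )
        n_decays += hits
        if hits:
            n_events += 1
    return n_decays, n_events

# Event 1: Z -> c cbar, c -> D+ -> mu+.  Event 2: a mu+ directly from a Z.
z1 = Particle(23)
c1 = Particle(4, z1)
cbar1 = Particle(-4, z1)
d1 = Particle(411, c1)
mu1 = Particle(-13, d1)
z2 = Particle(23)
mu2 = Particle(-13, z2)
events = [[z1, c1, cbar1, d1, mu1], [z2, mu2]]

print(count_indirect(events, 4, -13))  # charm -> ... -> mu+
```

The cost of this approach is that every candidate particle triggers its own walk up the tree, which is the work the Left/Right columns described under Data Design are meant to avoid.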

Two Prong Decay

The question used here was "How many times does a Z decay to a charm and anti-charm?"

Using MC Inspector (screen shot), it took 3.4 seconds both to count the occurrences and also the number of events that they occurred in. Again, this was after individual scans had been run for Z, c, and cbar.
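One way to phrase a two-prong decay in SQL is to join the parent to two daughters at once. As before this is a sketch under assumptions and not the query MC Inspector actually builds: sqlite3 stands in for SQL Server Express, `EventID` is a hypothetical column, and `count_two_prong` is a made-up helper. It also assumes the two daughters have different PDG IDs; with identical daughters the join would match each pair twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Particles (
        ID       INTEGER PRIMARY KEY,
        PDGID    INTEGER,
        EventID  INTEGER,  -- hypothetical column, assumed for this sketch
        ParentID INTEGER
    )""")
rows = [
    # event 1: Z -> c cbar, plus a second Z that decays to nothing
    (1, 23, 1, None), (2, 4, 1, 1), (3, -4, 1, 1), (4, 23, 1, None),
    # event 2: Z -> mu- mu+
    (5, 23, 2, None), (6, 13, 2, 5), (7, -13, 2, 5),
    # event 3: two Z's, each decaying to c cbar
    (8, 23, 3, None), (9, 4, 3, 8), (10, -4, 3, 8),
    (11, 23, 3, None), (12, 4, 3, 11), (13, -4, 3, 11),
]
conn.executemany("INSERT INTO Particles VALUES (?, ?, ?, ?)", rows)

def count_two_prong(conn, parent_pdg, first_pdg, second_pdg):
    """Count parents having BOTH daughters, and the events they occur in."""
    return conn.execute(
        """
        SELECT COUNT(*), COUNT(DISTINCT p.EventID)
        FROM Particles AS p
        JOIN Particles AS d1 ON d1.ParentID = p.ID AND d1.PDGID = ?
        JOIN Particles AS d2 ON d2.ParentID = p.ID AND d2.PDGID = ?
        WHERE p.PDGID = ?
        """,
        (first_pdg, second_pdg, parent_pdg),
    ).fetchone()

n_decays, n_events = count_two_prong(conn, 23, 4, -4)  # Z -> c cbar
print("Z->c cbar decays:", n_decays, "events:", n_events)
```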

Again, in the case of direct ROOT running, I don't see how to make this work without writing a loop, which I can't do in the TTree::Draw guy. So, I wrote another small script to perform the calculation. It took 23 seconds to answer both questions.

Skip-A-Particle Decay

This tries to answer the question "Z->X->c", where X is some random particle.

With MC Inspector (screen shot), after scans for Z and charm were run, it took 1.9 seconds to count both the number of events and the number of times the decay occurred.
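In SQL terms, skipping a particle just means following the ParentID link twice and putting no condition on the middle row. A minimal sketch, with the usual caveats: sqlite3 stands in for SQL Server Express, `EventID` is a hypothetical column, `count_skip_one` is a made-up helper, and this is not the query MC Inspector itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Particles (
        ID       INTEGER PRIMARY KEY,
        PDGID    INTEGER,
        EventID  INTEGER,  -- hypothetical column, assumed for this sketch
        ParentID INTEGER
    )""")
rows = [
    # event 1: Z -> c cbar, and the c then goes to c + gluon (PDG ID 21)
    (1, 23, 1, None), (2, 4, 1, 1), (3, 4, 1, 2), (4, 21, 1, 2), (5, -4, 1, 1),
    # event 2: Z -> c cbar with nothing further down the chain
    (6, 23, 2, None), (7, 4, 2, 6), (8, -4, 2, 6),
]
conn.executemany("INSERT INTO Particles VALUES (?, ?, ?, ?)", rows)

def count_skip_one(conn, top_pdg, bottom_pdg):
    """Count top -> X -> bottom chains, where X can be any particle."""
    return conn.execute(
        """
        SELECT COUNT(*), COUNT(DISTINCT bottom.EventID)
        FROM Particles AS bottom
        JOIN Particles AS x   ON bottom.ParentID = x.ID  -- X is unconstrained
        JOIN Particles AS top ON x.ParentID = top.ID
        WHERE top.PDGID = ? AND bottom.PDGID = ?
        """,
        (top_pdg, bottom_pdg),
    ).fetchone()

n_chains, n_events = count_skip_one(conn, 23, 4)  # Z -> X -> c
print("Z->X->c chains:", n_chains, "events:", n_events)
```

Only the charm in event 1 that sits two steps below the Z matches; the charm quarks that come directly from a Z do not, because their parent has no parent.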

Since this is a simple decay chain, I can again use TTree::Draw. The selector, however, is quite long and would be very painful to type -- it ran in 27 seconds (but took almost 20 minutes to actually write and to check for nulls and errors).

Data Design

The ROOT design consists of a fairly standard pair of objects, TMBMCvtx and TMBMCpart. The data is stored, using ROOT I/O, in its hierarchical form. Thus a vtx has a list of pointers (TRefArray's) to daughter and parent part's, and the part knows about (via TRef) its birth and death vertex. The vertex also stores its position, and the particle stores its 4-vector.

The SQL database consists of several tables, but two of them are not important -- the first is a match between PDG ID and particle name, and the second is a list of the samples. These tables, basically, serve to assign a unique integer to human-readable strings. The table that does most of the work is the Particles table, and it contains the following columns:

| Column Name | Description |
| --- | --- |
| ID | Unique number for each particle (primary key -- indexed) |
| PDGID | The PDG ID number for this particle. Foreign key to the particle names table. |
| SampleID | The sample this particle comes from. Foreign key to the samples table. |
| PX | The momentum, X (not used by MC Inspector) |
| PY | The momentum, Y (not used by MC Inspector) |
| PZ | The momentum, Z (not used by MC Inspector) |
| ParentID | A reference to the parent particle of this particle (the ID column above). |
| LeftID | Used to implement the preorder traversal algorithm (see below) |
| RightID | Used to implement the preorder traversal algorithm (see below) |

The modified preorder traversal algorithm off-loads the tree traversal required to determine parentage to data-insertion time, so that one simple "<" and one ">" comparison on each particle is enough to determine whether it came from another particle, no matter how far up the chain. It is nifty. MC Inspector currently uses this only to do the indirect decay product test (i.e. the "does a charm eventually decay to a muon?" test). The data is clearly redundant with the ParentID, but I could see no easy way to write SQL code that was recursive. I had to do exactly this for ROOT, btw.
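Here is a small sketch of the idea, under assumptions: `number_tree` and `count_indirect` are hypothetical names, sqlite3 stands in for SQL Server Express, and the toy table holds a single decay tree. A real importer would have to keep the counter running across events (or also compare an event or sample column) so that intervals from different trees never overlap.

```python
import sqlite3

def number_tree(children, root):
    """Assign (left, right) to every node in a single depth-first walk.

    A node's interval strictly contains the intervals of all its descendants.
    """
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter                  # numbered on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)  # numbered again on the way back up

    visit(root)
    return bounds

# Toy tree, keyed by particle ID: Z -> c cbar, c -> D+, D+ -> mu+ nu_mu
pdg = {1: 23, 2: 4, 3: -4, 4: 411, 5: -13, 6: 14}
children = {1: [2, 3], 2: [4], 4: [5, 6]}
bounds = number_tree(children, root=1)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Particles ("
    "ID INTEGER PRIMARY KEY, PDGID INTEGER, LeftID INTEGER, RightID INTEGER)"
)
conn.executemany(
    "INSERT INTO Particles VALUES (?, ?, ?, ?)",
    [(pid, pdg[pid], left, right) for pid, (left, right) in bounds.items()],
)

def count_indirect(conn, ancestor_pdg, descendant_pdg):
    """Count (ancestor, descendant) pairs using only '<' and '>' comparisons."""
    (n,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM Particles AS anc
        JOIN Particles AS dec
          ON dec.LeftID > anc.LeftID AND dec.RightID < anc.RightID
        WHERE anc.PDGID = ? AND dec.PDGID = ?
        """,
        (ancestor_pdg, descendant_pdg),
    ).fetchone()
    return n

print(bounds)
print("c -> ... -> mu+:", count_indirect(conn, 4, -13))
```

All of the recursion happens once, in `number_tree` at import time; the query itself never follows a parent link, which is why no recursive SQL is needed.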

 

MC Inspector

I hacked this program together over about 3 days using Visual Studio 2005. I don't think I used any advanced features (other than SQL Express -- a stripped down version of SQL Server). All code was written in C#, which is a Java-like language. Almost all of the C# code is devoted to either GUI implementation or constructing the SQL statements. The SQL database does about 99.9% of the heavy lifting for this program. A small text file with timing numbers in it is written out in your My Documents area. There is no way to turn it off at this time. ;-)

Comments on the User Interface

These are mostly notes to myself about using the MC Inspector user interface.

  • Putting things together with the GUI is much simpler than doing them in ROOT -- especially when I wanted to string together multiple questions along the lines above. This was especially true when considering cascade decays, or "eventually decays to" type decays.
  • Sometimes the question you want answered is "Z->ccbar" and you couldn't care less about "Z" and "c" and "cbar". Doing those queries only slows you down. It would be nice to prioritize the queries and try to answer the most complex first.
  • There are lots of weird things that go on in a MC file. Sometimes it is nice to be able to look at the full decay chain, or part of it, in detail for a few events to see what happens.
  • If you ask for Z->c->X you'll get more hits than Z->c. This is because the Z->c->X will match twice if the charm decays to two other particles. Not clear this is the most intuitive thing. I think if you are asking it that way you are asking "Z->c->X such that the charm decays to something" rather than everything it decays to.
  • As soon as one adds the "X" into the mix for the particle you really really want to know what particles those are!
  • For the indirect decay you definitely want to know what decay chains (in detail) happen there.
  • It would be very nice to ask "how often does a b decay to a mu with a pt>1 GeV" or similar.
  • I can see Strassler saying "Hey, how do you make an angular correlation of the b-quark and the lepton!?" While that is tempting, I'm not sure this design can deal with that extension easily (i.e. combining objects).
  • It would be very nice to be able to do calculations right in the application -- percentages, etc.
  • Smaller sized data format for importing!!
  • If you start a decay chain calc and then change the decay chain, the yellow "work" box is cleared when the first calc finishes, even though there is another calc still running.

Bugs

  • If your particle name is too long ("mu+", for example) you can't see it.
  • You can't delete or move or change connections once established.
  • If you create a particle box, then select a sample, the yellow "at work" box next to the particle name doesn't come on to warn you that MC Inspector is re-scanning the database.
  • Once a DB query is started it can't be interrupted, even if the program knows that its results will be discarded.
  • Particle boxes can easily overlap each other as they grow (as you run more samples) and thus obscure each other. No way to work around this.

Download

Note that this program requires both the .NET 2.0 framework (which you may have downloaded from Windows Update) and also SQL Server Express, which I'm guessing most people have not. If you execute the setup.exe program it will automatically install any missing bits by downloading them to your machine directly from Microsoft.

1/2/2006: Uploaded version 1.1 (source code is 1.0 still). Fixes some of the worst bugs -- just enough so I can play without hurting myself. I also added the ability to read gzipped XML files over the internet. You can type the following URLs into the File Browse thing if you like:

  • http://www-clued0.fnal.gov/~gwatts/caf_zbb.xml.gz
  • http://www-clued0.fnal.gov/~gwatts/caf_zcc.xml.gz
  • http://www-clued0.fnal.gov/~gwatts/caf_zqq.xml.gz