Toby Burnett
Last update:
28 Apr 2005 06:44 -0700
Apply the resolution analysis to all four channels. Note that muon-tqb is truncated since the training failed. Conclude that we need 50 cycles for the tqb channels, and 90 for tb. In all cases, the boosted resolution is about 2.6, corresponding to a background/signal ratio of 5.8.

Examine the quantitative effect of boosting on the resolution for a measurement of a signal in the presence of background. The resolution function is,
where s and b are the signal and background contributions (measured by the weights) to each node; the sums are over the nodes. This represents the expected standard deviation for Nsignal=1. It should be multiplied by sqrt(Nsignal) for the signal standard deviation, and is directly related to the expected limit. Note that perfect separation, in which bins are either signal or background, corresponds to R=1. The other extreme, no separation, in which all bins have the same b/s ratio, gives R=sqrt(1+b/s).
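One form of the resolution function consistent with both limits quoted above, written out here as an assumption rather than as the definition used in the analysis, is the following, with s_i and b_i the signal and background weights in node i:

```latex
R = \sqrt{\frac{\sum_i s_i}{\sum_i \dfrac{s_i^{2}}{s_i + b_i}}}
```

With b_i = 0 in every node that contains signal, the sum in the denominator equals the sum in the numerator, so R=1. With b_i/s_i = b/s in every node, the denominator is smaller by the factor 1+b/s, so R=sqrt(1+b/s).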

As usual, the trees were trained with even events only, combining the 1- and 2-tag samples. Then testing is performed using odd, even, and all events. The value for no separation is sqrt(1+b/s) = 7.25, for s=1.33, b=68.62. It looks like the training limit is reached at around 85 cycles.
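The numbers quoted in this log can be checked with a short Python sketch. The per-node form R = sqrt(sum(s) / sum(s^2/(s+b))) is an assumption chosen because it reproduces the two limits stated above; the function names are illustrative and are not taken from the classifier package.

```python
import math


def resolution(signal, background):
    """Assumed form: R = sqrt( sum(s) / sum(s^2 / (s + b)) ), summed over nodes."""
    total_signal = sum(signal)
    # Nodes without signal contribute nothing to the sum in the denominator.
    information = sum(s * s / (s + b) for s, b in zip(signal, background) if s > 0)
    return math.sqrt(total_signal / information)


def no_separation(s, b):
    """Limit in which every node has the same b/s ratio."""
    return math.sqrt(1.0 + b / s)


def equivalent_b_over_s(r):
    """Background/signal ratio of an unseparated sample with the same R."""
    return r * r - 1.0


print(round(no_separation(1.33, 68.62), 2))   # no-separation value for s=1.33, b=68.62
print(round(equivalent_b_over_s(2.6), 1))     # b/s equivalent to a resolution of 2.6
```

The last function simply inverts R=sqrt(1+b/s), which is how a boosted resolution of about 2.6 translates into a background/signal ratio of about 5.8.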
Here is a comparison with the NN equivalent from the D0 Note.
Check the effect of training vs. testing samples on some of the 2-d trees generated for the D0 note:
Clearly there is an effect, but it is apparently not large, especially since in this case
Add a link to the doxygen documentation of the current version of the classifier package, which implements boosting.
During the course of this analysis, we have noticed that minor changes in the training data sets seem to produce changes in measured limits from DT or NN methods of several pb. To understand the sensitivity of all the limits to small changes in the filter characteristics, we have taken advantage of how easy it is to generate DT filters. Since we were already training on only half of the training data, we made a study of the variation with respect to filters which were trained on randomly selected halves of the training data.
The first step was to update the trainer to be able to select entries randomly, with 0.5 probability. This is to be compared with the normal procedure, which is to select the even entries. Then we chose a particular channel for the variability study, tqb muon, combining 1 and >1 tags, and created 99 independent pairs of decision trees to separate the tqb data from the background sources wbb and lepjets.
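The two selection modes can be sketched as follows. This is an illustration in Python, not the trainer code: the function names, the 0-based entry numbering, the seed handling, and the use of the complement as the testing half are all assumptions.

```python
import random


def select_even(n_entries):
    """Normal procedure: train on the even-numbered entries (0-based here)."""
    return [i for i in range(n_entries) if i % 2 == 0]


def select_random_half(n_entries, seed):
    """Variability study: keep each entry independently with probability 0.5."""
    rng = random.Random(seed)  # a fixed seed makes a given training reproducible
    return [i for i in range(n_entries) if rng.random() < 0.5]


def complement(n_entries, selected):
    """Entries not used for training (assumed here to form the testing half)."""
    chosen = set(selected)
    return [i for i in range(n_entries) if i not in chosen]


# 99 independent trainings would use 99 different seeds.
training_sets = [select_random_half(1000, seed) for seed in range(99)]
print(len(training_sets), min(len(t) for t in training_sets), max(len(t) for t in training_sets))
```

Note that a random selection gives a training sample whose size fluctuates around half of the entries, unlike the even-entry rule.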
Each tree is characterized by the efficiency plot: for a given cut on signal efficiency, what is the background contamination? The following plots show all 198 trees.

The full distribution is on the left, with a histogram of the background variation for a cut at 0.7 on the right. The discrete nature of our DT filters is apparent. As is known, the wbb separation is harder than for lepjets, and the fluctuations are apparently slightly larger.
Then the 99 pairs of trees were used to make 99 2-d likelihood histograms in the usual way, and each histogram analyzed to obtain expected (from the MC data only) and actual (from the actual data) limits. The distributions for expected, expected with systematics, and actual with systematics are shown next.

The numbers from the DT used in this note are respectively 8.4, 10.9, and 7.9 pb, each quite consistent with the range. The variation of the actual corresponds to our initial observations.
Here are results of the "1D" tree analysis. Use all 25 variables for each of the 4 cases.
| Table of Variables used with ratings | ||||
| name | muon-tb | electron-tb | muon-tqb | electron-tqb |
| BTaggedTopMass | 0.58 | 0.39 | 6.18 | 4.65 |
| BestTopMass | 1.03 | 1.68 | 0.62 | 1.10 |
| Cos_BTaggedJetAllJets_AllJets | 0.47 | 0.60 | 0.93 | 1.34 |
| Cos_LeptonQZ_BestTop | 1.26 | 0.46 | 1.08 | 0.60 |
| Cos_NotBestJetAllJets_AllJets | 0.27 | 0.30 | 0.55 | 0.28 |
| Cos_UntaggedJetLepton_BTaggedTop | 0.31 | 1.00 | 2.72 | 1.79 |
| DeltaRJet1Jet2 | 0.61 | 0.76 | 1.04 | 0.87 |
| HT_AllJets | 7.27 | 0.88 | 0.83 | 3.63 |
| HT_AllJets_MinusBTaggedJet | 0.54 | 0.42 | 0.91 | 0.39 |
| HT_AllJets_MinusBestJet | 0.42 | 0.37 | 1.54 | 3.16 |
| H_AllJets_MinusBTaggedJet | 0.56 | 0.46 | 1.03 | 1.10 |
| H_AllJets_MinusBestJet | 0.05 | 0.36 | 0.60 | 0.51 |
| InvariantMass_AllJets | 0.66 | 0.38 | 5.26 | 7.08 |
| InvariantMass_AllJets_MinusBTaggedJet | 1.35 | 7.18 | 0.58 | 1.24 |
| InvariantMass_AllJets_MinusBestJet | 1.12 | 1.57 | 0.60 | 1.39 |
| Jet1Pt_NotBest | 0.27 | 0.03 | 0.81 | 0.62 |
| Jet2Pt_NotBest | 0.56 | 0.61 | 1.13 | 0.41 |
| LeadingBTaggedJetPt | 2.19 | 8.16 | 2.31 | 1.34 |
| LeadingUntaggedJetPt | 1.65 | 2.15 | 0.36 | 0.50 |
| Pt_AllJets_MinusBTaggedJet | 1.29 | 2.22 | 0.71 | 1.10 |
| Pt_Jet1Jet2 | 1.49 | 1.35 | 1.30 | 0.84 |
| QTimesEta | 0.38 | 0.39 | 5.88 | 5.13 |
| SecondUntaggedJetPt | 8.24 | 1.65 | 0.62 | 0.68 |
| Shat | 1.36 | 0.43 | 0.84 | 0.54 |
| TransverseMass_Jet1Jet2 | 1.88 | 3.12 | 0.81 | 0.46 |
| Total | 35.80 | 36.91 | 39.25 | 40.75 |

Made this diagram to explain how decision trees work:

Caption:
A graphical representation of a portion of one of the eight trees generated for this analysis, the tb electron wbb tree. Six branch and three end nodes are shown.
Descriptive text.
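For readers who prefer code to a diagram, here is a minimal Python sketch of the same idea: branch nodes test one variable against a cut, and end nodes carry a purity. The class, the cut values, and the purities are invented for illustration and are not taken from the tb electron wbb tree or from the classifier package.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    # Branch nodes set variable, cut, below, above; end nodes set only purity.
    variable: Optional[str] = None
    cut: float = 0.0
    below: Optional["Node"] = None
    above: Optional["Node"] = None
    purity: float = 0.0

    def is_end_node(self) -> bool:
        return self.variable is None


def evaluate(node: Node, event: Dict[str, float]) -> float:
    """Follow the branches until an end node is reached; return its purity."""
    while not node.is_end_node():
        node = node.below if event[node.variable] < node.cut else node.above
    return node.purity


# Hypothetical two-branch tree with three end nodes.
tree = Node("HT_AllJets", 200.0,
            below=Node(purity=0.1),
            above=Node("BestTopMass", 175.0,
                       below=Node(purity=0.8),
                       above=Node(purity=0.3)))
print(evaluate(tree, {"HT_AllJets": 250.0, "BestTopMass": 170.0}))
```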
Update the classification management code to make a table of the variables used for each training.
| Table of Variables used with ratings | ||||||||
| name | mu-tb-wbb | eCC-tb-wbb | mu-tb-lepjets | eCC-tb-lepjets | mu-tqb-wbb | eCC-tqb-wbb | mu-tqb-lepjets | eCC-tqb-lepjets |
| LeadingBTaggedJetPt | 0.0 | 0.2 | 0.8 | 0.5 | 0.4 | 0.9 | - | - |
| LeadingUntaggedJetPt | - | - | - | - | 0.2 | 0.3 | 1.5 | 1.5 |
| SecondUntaggedJetPt | - | - | - | - | - | - | 0.2 | 0.4 |
| Jet1Pt_NotBest | 0.4 | 0.5 | 0.4 | 0.4 | - | - | - | - |
| Jet2Pt_NotBest | 0.5 | 0.2 | 3.1 | 1.8 | - | - | - | - |
| Pt_Jet1Jet2 | 2.0 | 1.9 | - | - | 2.5 | 1.1 | - | - |
| Pt_AllJets_MinusBTaggedJet | - | - | 0.1 | 0.3 | - | - | 0.1 | 0.4 |
| HT_AllJets_MinusBestJet | - | - | 2.8 | 3.9 | - | - | - | - |
| H_AllJets_MinusBestJet | - | - | 0.5 | 0.1 | - | - | - | - |
| H_AllJets_MinusBTaggedJet | - | - | 0.1 | 0.4 | - | - | 1.2 | 0.6 |
| HT_AllJets_MinusBTaggedJet | - | - | - | - | - | - | 33.3 | 31.4 |
| HT_AllJets | 11.8 | 16.0 | - | - | 0.7 | 2.6 | - | - |
| TransverseMass_Jet1Jet2 | 2.4 | 3.5 | - | - | - | - | - | - |
| InvariantMass_AllJets | 1.1 | 1.3 | 0.8 | 0.9 | 20.2 | 22.9 | 0.7 | 0.4 |
| InvariantMass_AllJets_MinusBestJet | - | - | 55.6 | 52.5 | - | - | - | - |
| InvariantMass_AllJets_MinusBTaggedJet | - | - | - | - | - | - | 8.0 | 9.5 |
| BestTopMass | 3.3 | 1.6 | - | - | - | - | - | - |
| BTaggedTopMass | 0.2 | 0.6 | 0.1 | 0.2 | 5.7 | 4.4 | 4.8 | 4.7 |
| Shat | 1.5 | 0.2 | - | - | 0.2 | 0.5 | 0.3 | 0.5 |
| DeltaRJet1Jet2 | 0.9 | 1.4 | - | - | 0.5 | 0.6 | - | - |
| QTimesEta | - | - | - | - | 1.9 | 0.8 | 5.6 | 5.2 |
| Cos_LeptonQZ_BestTop | 0.6 | 0.5 | - | - | - | - | - | - |
| Cos_UntaggedJetLepton_BTaggedTop | - | - | - | - | 2.3 | 1.8 | - | - |
| Cos_BTaggedJetAllJets_AllJets | - | - | - | - | 0.1 | 1.5 | 0.6 | 1.5 |
| Cos_NotBestJetAllJets_AllJets | - | - | 0.1 | 0.5 | - | - | - | - |
| totals | 24.7 | 27.7 | 63.6 | 60.9 | 34.3 | 36.5 | 56.3 | 56.1 |
Data: load all files without systematics, from Gordon's list:
The tree training output, with tree definitions, for all channels, and combined tags, is all here.
I used the same set of variables. A summary of the Gini improvement from each, for each of the 4 channels, is:
| Name | muons | | electron | |
| | tb | tqb | tb | tqb |
| InvariantMass_AllJets | 0.013 | 0.060 | 0.020 | 0.071 |
| BTaggedTopMass | 0.014 | 0.073 | 0.013 | 0.075 |
| Cos_UntaggedJetLepton_BTaggedTop | 0.016 | 0.035 | 0.012 | 0.028 |
| Pt_Jet1Jet2 | 0.022 | 0.022 | 0.017 | 0.015 |
| QTimesEta | 0.008 | 0.065 | 0.039 | 0.062 |
| Shat | 0.054 | 0.012 | 0.035 | 0.010 |
| LeadingBTaggedJetPt | 0.084 | 0.027 | 0.094 | 0.020 |
| LeadingUntaggedJetPt | 0.074 | 0.015 | 0.039 | 0.019 |
| HT_AllJets | 0.033 | 0.022 | 0.037 | 0.056 |
| DeltaRJet1Jet2 | 0.018 | 0.024 | 0.034 | 0.026 |
| Cos_BTaggedJetAllJets_AllJets | 0.010 | 0.011 | 0.006 | 0.013 |
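A minimal Python sketch of how a weighted Gini improvement for a single cut could be computed. The definition used here, G = 2sb/(s+b) for a node with signal weight s and background weight b, with the improvement taken as the parent value minus the two daughter values, is a common convention and an assumption; the normalization in the classifier package may differ, so these numbers will not necessarily match the table above.

```python
def gini(s, b):
    """Weighted Gini index of a node: 2*s*b/(s+b), zero for a pure or empty node."""
    total = s + b
    return 2.0 * s * b / total if total > 0 else 0.0


def gini_improvement(values, weights, is_signal, cut):
    """Parent Gini minus the summed Gini of the two daughters made by the cut."""
    sums = {"below": [0.0, 0.0], "above": [0.0, 0.0]}  # [signal, background]
    for value, weight, signal in zip(values, weights, is_signal):
        side = "below" if value < cut else "above"
        sums[side][0 if signal else 1] += weight
    s_total = sums["below"][0] + sums["above"][0]
    b_total = sums["below"][1] + sums["above"][1]
    return gini(s_total, b_total) - gini(*sums["below"]) - gini(*sums["above"])


# Toy sample: two signal events at low values, two background events at high values.
values = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 1.0]
is_signal = [True, True, False, False]
for cut in (0.5, 1.5, 2.5):
    print(cut, gini_improvement(values, weights, is_signal, cut))
```

Summing this improvement over every branch that uses a given variable gives one per-variable rating of the kind tabulated above.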
The efficiency table graph:

A separate application analyzes this, and uses top_statistics to estimate the limits:

The missing EqOneTag files were generated, start again with electron[4]. Not very different.

New splits for electron: call it electron[3]
| Training pair | files | events | weights |
| tb-lepjets | | | |
| tb-wbb | | | |
| tqb-lepjets | | | |
| tqb-wbb | | | |

Aran submits the "final" version of the muon files:
"Daekwang has produced the final skim files for muons:
/work/husky-clued0/aran/Daekwang_NN/"
Results are here.

Also in the above folder are postscript files and a root file containing all the histograms necessary for the 2-d likelihood analysis designed for NN. Since it may not be possible to load the root file over the Web, I've copied it to clued0: ~burnett/links/work/muon_classification.root. Note that rather than creating a different root file for each background component, they are in directories.
root [0] TFile f("muon_classification.root")
root [1] f.ls()
TFile** muon_classification.root
 TFile* muon_classification.root
  KEY: TDirectory s_channel;1 s_channel
  KEY: TDirectory t_channel;1 t_channel
root [2] f.cd("s_channel")
(Bool_t)1
root [3] f.ls()
TFile** muon_classification.root
 TFile* muon_classification.root
  TDirectory* s_channel s_channel
   KEY: TDirectory data;1 data
   KEY: TDirectory dilep;1 dilep
   KEY: TDirectory lepjets;1 lepjets
   KEY: TDirectory tb;1 tb
   KEY: TDirectory tqb;1 tqb
   KEY: TDirectory wbb;1 wbb
   KEY: TDirectory wjj;1 wjj
   KEY: TDirectory wwlnujj;1 wwlnujj
   KEY: TDirectory wzlnujj;1 wzlnujj
   KEY: TDirectory QCD;1 QCD
  KEY: TDirectory s_channel;1 s_channel
  KEY: TDirectory t_channel;1 t_channel
Thanks to Aran (see the 12:00 entry below), get the latest electron data set (which I call electron[2]). On his advice, I combine the 1TAG and 2TAG files. Results are here.

Grab the latest files from Aran, in /work/husky-clued0/aran/Daekwang_NN. I'm calling this set muon[3]. Results are here. Looks essentially identical.

Rewrite the training/testing code to automate it, and to create a single efficiency table, which is easy to plot all at once.
The parameters are now determined by files in a dedicated folder:
- title.txt: descriptive title
- files.txt: first line is a comma-delimited list of signal files, the next line the same for background
- variables.txt: list, one per line, of the variables to use. The first one is the weight (# in col 1 means the line is ignored)
The class TrainingInfo manages this, providing access to classification parameters for the class Trainer.
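As an illustration of the file conventions just described, here is a Python sketch of a reader for files.txt and variables.txt. It is not the TrainingInfo class; the function names and the example file and variable names are invented, and the handling of blank lines is an assumption.

```python
def parse_files(text):
    """First non-blank line: comma-delimited signal files; second: background files."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    signal = [name.strip() for name in lines[0].split(",")]
    background = [name.strip() for name in lines[1].split(",")]
    return signal, background


def parse_variables(text):
    """One variable per line; a '#' in column 1 means the line is ignored.

    Returns (weight_name, variable_names): the first active entry is the weight.
    """
    names = [line.strip() for line in text.splitlines()
             if line.strip() and not line.startswith("#")]
    return names[0], names[1:]


# Hypothetical contents, for illustration only.
files_txt = "tqb.root\nwbb.root, lepjets.root\n"
variables_txt = "Weight\nHT_AllJets\n#Shat\nQTimesEta\n"

print(parse_files(files_txt))
print(parse_variables(variables_txt))
```

Commenting a variable out with # rather than deleting it makes it easy to compare trainings with and without that variable.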
After training, files summarizing the results are written to the same folder:
- log.txt: output from the classes doing the training
- dtree.txt: definition of the Decision Tree, see DecisionTree in the classifier package
- test.txt: result of testing with the odd events from the training sample
An example of the results of this procedure, for the old data set, is in this folder. Documentation of the classes is here.
A performance file is generated, see here. It can be imported and plotted in Excel with a few clicks:
Get the files collected from Philip, with help from Aran, and run the same code, changing only the path to the data and the classification specification/output. Use Aran's variables for now (are they the same?). There is basically no wbb data (?), so show only the lepjets.
Note: Aran says to use the files at /rooms/cafe/SingleTop_SKIMS/Data/Electron_Jets/Philips_Stradivarius/MC_SKIMS/NN_SKIMS/
Take a look at tqb-lepjets, the only other separation that seems to be needed. As with the wbb study, start with the same variables as were used for the NN training, then train with even, and test with odd events. Combine the plot, as Aran does.
Now examine the tb (s-channel). Again use the same variables as Aran's NN analysis, and display the separation for the odd events, after training with the even ones.
Implement the possibility to choose all, even, or odd events from the training sample, apply to the tqb-wbb muon study, with the best 6 variables.
The plot is as expected: the separation is better with the training sample (even events) than the other half of the data.
Implement persistence of decision trees (the models created by classification). See documentation of the package here.
Using new set of files from Thomas:
| | s-channel | | t-channel | |
| file | records | weights | records | weights |
| top | 29448 | 2.62 | 35273 | 3.97 |
| dilep | 27814 | 8.62 | 22768 | 7.81 |
| lepjets | 45938 | 27.13 | 41937 | 26.94 |
| qcd | 222 | 19.10 | 222 | 19.10 |
| wbb | 24192 | 14.06 | 18130 | 12.61 |
| wjj | 52303 | 62.12 | 40165 | 61.56 |
| ww | 6053 | 0.70 | 4401 | 0.70 |
| wz | 5431 | 0.18 | 3993 | 0.18 |
The "t-channel" files are apparently experimental; stick with the s-channel guys.
Applying the tree to tqb-wbb separation, with the variables used by Aran's NN study,
| Variable summary | |
| Name | improvement |
| InvariantMass_AllJets | 0.19588 |
| BTaggedTopMass | 0.05332 |
| Cos_UntaggedJetLepton_BTaggedTop | 0.03604 |
| Pt_Jet1Jet2 | 0.02268 |
| QTimesEta | 0.02128 |
| Shat | 0.01929 |
| LeadingBTaggedJetPt | 0.00605 |
| LeadingUntaggedJetPt | 0.00599 |
| HT_AllJets | 0.00567 |
| DeltaRJet1Jet2 | 0.00521 |
| Cos_BTaggedJetAllJets_AllJets | 0.00339 |
I get the following plots:
The variables are sorted according to the Gini reduction. Using only the first six,

there is little difference.
Continue with Thomas' variable set.
I now have an analysis of the tree that ranks the nodes in order of purity, using Classifier::purityMap. The class BackgroundVsEfficiency will print a table in order of the node purity, with columns for the cumulative efficiency and background content. It is then easy to make plots of the amount of background vs. efficiency.
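The purity-ordered table can be sketched in a few lines of Python. This only illustrates the idea and is not the Classifier::purityMap or BackgroundVsEfficiency code; the function name and the toy node weights are invented, and each node is assumed to be given as a (signal weight, background weight) pair.

```python
def background_vs_efficiency(nodes):
    """Rows of (purity, cumulative signal efficiency, cumulative background).

    nodes: list of (signal_weight, background_weight) pairs, one per end node.
    Nodes are taken in order of decreasing purity s/(s+b).
    """
    occupied = [(s, b) for s, b in nodes if s + b > 0]
    ordered = sorted(occupied, key=lambda n: n[0] / (n[0] + n[1]), reverse=True)
    total_signal = sum(s for s, _ in ordered)
    rows = []
    cumulative_signal = 0.0
    cumulative_background = 0.0
    for s, b in ordered:
        cumulative_signal += s
        cumulative_background += b
        rows.append((s / (s + b), cumulative_signal / total_signal, cumulative_background))
    return rows


# Toy end nodes, for illustration only.
for purity, efficiency, background in background_vs_efficiency([(1.0, 3.0), (3.0, 1.0), (0.0, 4.0)]):
    print(f"{purity:6.2f} {efficiency:6.2f} {background:6.2f}")
```

Each row corresponds to one possible working point: accept all nodes at least as pure as the current one.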
Here are the variables used, and their Gini improvement ranking:
| Variable summary | |
| Name | improvement |
| _HT_AllJetsLeptonMET | 10.0109 |
| _WTransverseMassPrime | 6.49468 |
| _Jet3Pt | 4.40732 |
The following shows the result, for total signal vs. background (with total weights = 130), for various subsets of the above variables:

The "HT1cut" represents a single branch: the full "HT only" shows how subsequent branches in the same variable create new alternatives for signal vs. background. One sees also how the addition of variables for branching improves the performance.
A simple criterion to choose an efficiency is to maximize S/sqrt(B), appropriate for a small signal. The plot of this for the single HT cut and the full 3-variable tree follows:
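The S/sqrt(B) criterion can be sketched in Python as follows, assuming the working points are available as (efficiency, background) pairs. The function name and the toy numbers are invented for illustration.

```python
import math


def best_working_point(points, total_signal):
    """Return (significance, efficiency) maximizing S/sqrt(B).

    points: iterable of (signal efficiency, accepted background weight).
    S is efficiency * total_signal; points with no background are skipped,
    since S/sqrt(B) is undefined there.
    """
    best = None
    for efficiency, background in points:
        if background <= 0:
            continue
        significance = efficiency * total_signal / math.sqrt(background)
        if best is None or significance > best[0]:
            best = (significance, efficiency)
    return best


# Toy working points, for illustration only.
print(best_working_point([(0.25, 1.0), (0.75, 4.0), (1.0, 64.0)], total_signal=4.0))
```

Skipping zero-background points is a design choice of this sketch; with a small signal and discrete nodes, such points would otherwise dominate the scan.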

Sent this mail off to uw-top group:
Hi folks,
I’m building C++ classes to deal with the classification trees, playing with the latest data set generated by Thomas, and concentrating on the variable HT_AllJetsLeptonMET, since Insightful Miner seemed to prefer it for separating the unweighted data, with an initial cut at 270 GeV. To my surprise when I started applying the weights, the cut shifted significantly, and the function that is optimized to determine the cut, the Gini improvement, developed two peaks, with essentially zero in between. I show the three plots here:
where I’ve rescaled the background for comparison. The shapes of the signal and background are very different, and I can understand why it now wants to cut at 165 GeV, since there is very little signal there.
Since I’m still developing the tools (and I have a lot to do), I’m not yet concerned with the physics, but the dramatic difference in the threshold behavior of the signal and background is determined solely by the weighting procedure: see the following plot for the corresponding unweighted Gini and event counts:
--
So I’m a little suspicious that the weighting should so distinguish between signal and background, if my plots are correct. Thoughts?
A summary of the data is:
| file | records | weight sum |
| signal | | |
| schannel | 7065 | 2.40322 |
| tchannel | 7164 | 3.57781 |
| background | | |
| dilep | 5182 | 7.89733 |
| lepjets | 4124 | 24.8404 |
| qcd | 223 | 19.2116 |
| wjets | 3265 | 73.7715 |
New data set from Thomas: encapsulate with this script. Note now there is a qcd file, left off before?
# set up sym links
tpath=/rooms/cafe/SingleTop_SKIMS
tag1="Muon_Jets/p14Stradivarius/Tagged/p14Stradivarius_"
tag2="_TightIsolation_HighMissingEt_Tag/p14Stradivarius_"
tag3="_TightIsolation_HighMissingEt_Tag_RGS_SKIM.root"
rm -f *.root
ln -s $tpath/Data/${tag1}DATA${tag2}DATA$tag3 data.root
ln -s $tpath/Data/${tag1}WJETS${tag2}WJETS$tag3 wjets.root
ln -s $tpath/Data/${tag1}QCD${tag2}QCD$tag3 qcd.root
ln -s $tpath/MonteCarlo/${tag1}LEPJETS${tag2}LEPJETS$tag3 lepjets.root
ln -s $tpath/MonteCarlo/${tag1}DILEP${tag2}DILEP$tag3 dilep.root
ln -s $tpath/MonteCarlo/${tag1}SCHANNEL${tag2}SCHANNEL$tag3 schannel.root
ln -s $tpath/MonteCarlo/${tag1}TCHANNEL${tag2}TCHANNEL$tag3 tchannel.root
The root files are different: all the variables are in the TopTree.

Modify the extraction program as follows, since RGS_Variables is not a branch now. Generate the .txt tab-delimited files and read in to IM. Check the files
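| File | #events | <weight> | Wt sum |
| bkg. | | | |
| wjets | 3264 | 0.0226 | 73.8 |
| lepjets | 4123 | 0.006 | 24.7 |
| dilep | 5181 | 0.0015 | 7.8 |
| qcd | 222 | 0.086 | 19.1 |
| signal | | | |
| schannel | 7064 | 0.00034 | 2.4 |
| tchannel | 7163 | 0.0005 | 3.6 |
| data | 78 | 1 | 135 |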
Files used for tables:
# set up sym links
tpath=/rooms/cafe/SingleTop_SKIMS
tag=Muon_Jets/Preselection_SLV_TAG
ln -s $tpath/Data/$tag/DATA_Preselection_SLV_TAG/MUQCD_DQ_PRESELECTION_TIGHTMUON_SLV_TAG_SKIM.root data.root
ln -s $tpath/Data/$tag/WJETS_Preselection_SLV_TAG/WJETS_DQ_PRESELECTION_TIGHTHIGH_SLV_0TAG_TRF_SKIM.root wjets.root
ln -s $tpath/MonteCarlo/$tag/SCHANNEL_Preselection_SLV_TAG/MUNBB_MC_PRESELECTION_TIGHTHIGH_SLV_TRF_SKIM.root schannel.root
ln -s $tpath/MonteCarlo/$tag/TCHANNEL_Preselection_SLV_TAG/MUNBB_MC_PRESELECTION_TIGHTHIGH_SLV_TRF_SKIM.root tchannel.root
ln -s $tpath/MonteCarlo/$tag/DILEP_Preselection_SLV_TAG/TTBAR_DILEP_MC_PRESELECTION_TIGHTHIGH_SLV_TRF_SKIM.root dilep.root
ln -s $tpath/MonteCarlo/$tag/LEPJETS_Preselection_SLV_TAG/TTBAR_LEPJETS_MC_PRESELECTION_TIGHTHIGH_SLV_TRF_SKIM.root lepjets.root
First step: read root files, create tab-delimited text files from the RGS_Variables branch, simplify variable names, using this code
Next: read the 6 files into Insightful Miner 2, combine and tag signal and background, then run a classification tree; export the tree to predict the composition of the data.

This does not make much sense, since the background events have rather different weights:
| File | #events | <weight> | Wt sum |
| bkg. | | | |
| wjets | 2282 | 0.02242 | 51.2 |
| lepjets | 3606 | 0.00408 | 14.7 |
| dilep | 4275 | 0.00103 | 4.4 |
| signal | | | |
| schan | 6162 | 0.00023 | 1.4 |
| tchan | 6700 | 0.00033 | 2.2 |
| data | 78 | 1 | 78.0 |
Given the attempt anyway, the variables most useful for classification are:

And the cross-tab to show the separation is:
Note that it does not classify the background well!
for similar NN analysis.