ND Group comments on Top-10-001 Paper draft
We congratulate the authors on completing this very important analysis. We think the result should be published, and we give our suggestions for strengthening the paper below:
We think that the focus of this paper should be on establishing the presence of a top quark signal in the CMS data, rather than on measuring the top cross section. Therefore, we think the paper should focus more on the significance of the observed signal over the background and less on the cross section result.
In addition, we think there are parts of the paper, in particular the sections on background estimates and systematic uncertainties, that read like a summary of a much longer document. We found these parts of the paper very hard to read, and we suspect that they would be unintelligible to a reader who doesn't have access to the internal documentation. We suspect that PRL length limitations prevent the authors from providing a more understandable explanation of the details. If so, we propose that these sections focus instead on the main concepts and final results of the background estimates and the systematic uncertainty evaluation. The details of how these estimates were performed should be left to a longer follow-up paper in a journal that doesn't restrict paper length (e.g. PRD). We strongly endorse the preparation of such a paper as quickly as possible after this one.
Abstract: Consistent with our comments on the focus of the paper, we suggest including the event yield, the expected number of background events, and the significance of the signal in the abstract.
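To illustrate the kind of significance statement we have in mind, here is a minimal sketch that converts an observed event yield and an expected background into a one-sided Poisson p-value and an equivalent Gaussian significance. It is only an illustration: the function names and the input numbers are placeholders of ours, not values from the paper, and the sketch assumes the background expectation is known exactly. A real evaluation would have to fold in the uncertainty on the background.

```python
import math
from statistics import NormalDist


def poisson_tail(n_obs, b, n_terms=200):
    """P(N >= n_obs) for N ~ Poisson(b), summed directly from the tail."""
    # First term is the Poisson probability of exactly n_obs events.
    term = math.exp(-b) * b**n_obs / math.factorial(n_obs)
    total = 0.0
    for k in range(n_obs, n_obs + n_terms):
        total += term
        term *= b / (k + 1)  # step from P(k) to P(k + 1)
    return total


def significance(n_obs, b):
    """One-sided Gaussian-equivalent significance of n_obs over background b.

    Assumption: b has no uncertainty (not true in a real analysis).
    """
    p = poisson_tail(n_obs, b)
    if p >= 1.0:
        return float("-inf")  # no excess at all
    return -NormalDist().inv_cdf(p)


# Placeholder inputs, NOT the yields of the paper under review.
n_observed = 10
expected_background = 2.0
p_value = poisson_tail(n_observed, expected_background)
z_value = significance(n_observed, expected_background)
print(f"p-value = {p_value:.3g}, significance = {z_value:.2f} sigma")
```

Quoting the three inputs (observed yield, expected background, significance) side by side in the abstract would let a reader reproduce a rough version of this number directly.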
Lines 1-11: We think the introduction could be strengthened by focusing less on the general interest top quarks generate and more on the specific case of this measurement, which lays the foundation upon which the entire CMS top quark analysis endeavor will be built.
Lines 37-43: This section is confusing. First, it gives the impression that we take the trigger efficiency from MC, with no input from data. Second, we think it would be more natural to discuss the trigger, the trigger efficiency, and the trigger efficiency uncertainties in one location in the paper, rather than spreading them out over several sections.
Lines 44-48: Too detailed. Replace with a single sentence stating that only data taken with stable beam and detector conditions and containing a reconstructed primary vertex were used for this analysis.
Line 49: Could you simplify the description of muons by just stating that we reconstruct muons with an algorithm that requires a consistent set of tracker and muon chamber hits? The details of how we obtain the consistent set (tracker muon + global muon) are not relevant for a paper of this length.
Line 87: Change to "The neutrinos don't interact with the detector and escape"
Line 106: Remove the "~" on the jet energy scale uncertainty. Either we use 5% or we don't. We can't use ~5%.
Lines 109-127: There is too much detail about how the expected signal yield was calculated. We think it would be better to begin simply by stating the expected signal yield and noting that it is calculated using a particular ttbar cross section calculation (give reference) in conjunction with an acceptance estimated with ME+PS MC.
Lines 126-143: We do not think sufficient explanation has been given as to why certain backgrounds need to be estimated from the data while others can be safely taken from the MC. We think this sort of detail (i.e. the bigger picture) is more important than the details of how individual backgrounds are estimated. For example, it seems inconsistent to estimate Z->tau,tau from MC while insisting that Z->mu,mu and Z->e,e be taken from the data. It should be explained why that isn't inconsistent.
Lines 134-137: As written, this does not make sense. No justification is given for using two different MC models for the different invariant mass ranges. We think this level of detail is too much anyway, and it should be removed from the paper.
Table 1 and many places in text after line 141: We think that the authors need to introduce a standard, concise label for the backgrounds from QCD multi-jet and W+jets processes. In the table and various parts of the paper they are alternately referred to as "non-W/Z" or "non-genuine." Most importantly, a consistent label should be introduced explicitly and used throughout. We do not like the term "non-genuine." In many cases these are genuine leptons, just not from a W or Z decay. It seems that "non-W/Z" or even "QCD leptons" would be preferable.
Lines 144-185: We think this part of the paper needs to be revised to contain fewer specific details about how the backgrounds were estimated, and a clearer, more concise overview of the general approach for each background, plus a summary of the result and the important systematic uncertainties. A detailed description of the methods used to calculate the backgrounds should be deferred to a longer paper. We also think the paper would be more understandable if each background estimate and its uncertainty were presented together, instead of first discussing all background estimates and then treating all systematic uncertainties.
Figure 1: We think putting the background uncertainty bars on the stacked signal + background histogram hides the significance of the signal above the background. We would prefer to see the error bars placed on the histogram of just the background components.
Lines 191-217 and Figure 2: We support the idea of using the reconstructed mass distribution to bolster the evidence for our top quark signature. However, we think if this is done, we need to give more quantitative information about the agreement. We believe the following would strengthen the comparison: In the figure, separate the signal and the background shapes for the reconstructed mass. Also, evaluate quantitatively the agreement of the data with the expectation from ttbar. Finally, it would be very helpful to give some idea about the top quark mass range with which the data is consistent.
Lines 218-240: We think it would be better to discuss the systematic uncertainties on the signal acceptance in the same part of the paper where the signal acceptance is discussed. Furthermore, we think the level of detail here is excessive. For example, rather than breaking down every contribution to the signal acceptance uncertainty coming from varying details of the MC modeling, it should be acceptable to combine these modeling uncertainties into a single number and explain that this uncertainty includes a number of signal modeling effects.
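As an illustration of what combining the modeling uncertainties into a single number could look like, here is a minimal sketch that adds relative uncertainties in quadrature. This assumes the individual contributions are independent, which the authors would need to confirm. The source labels and values are placeholders of ours, not numbers from the paper.

```python
import math


def combine_in_quadrature(relative_uncertainties):
    """Quadrature sum of relative uncertainties.

    Assumption: the contributions are uncorrelated.
    """
    return math.sqrt(sum(u * u for u in relative_uncertainties.values()))


# Placeholder contributions, NOT the values quoted in the paper.
signal_modeling = {
    "modeling source A": 0.03,
    "modeling source B": 0.04,
}

total = combine_in_quadrature(signal_modeling)
print(f"combined signal modeling uncertainty: {100 * total:.1f}%")
```

The paper would then quote only the combined number, with one sentence listing which modeling effects it includes.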
Lines 241-247: The systematic uncertainty on the background estimates should be discussed in the same place the background estimates are discussed. Note: As written, the 50% uncertainties assigned to backgrounds estimated from MC seem arbitrary. Was this the intended impression?
Lines 257-258: If you're going to mention another result, the actual result should be quoted. Otherwise, just saying that there was another, unspecified result adds nothing to the paper.
The conclusion seems weak. It would be nice to have some discussion of the near- and long-term future of top physics at CMS, for example a statement of the expected size of signal samples in various channels with a larger dataset. Reducing the level of specific detail in the parts of the paper above, as suggested, should leave more room for a more powerful conclusion.