# High Energy Physics

Useful Statistics for the HEP Laboratory

There follows a quick summary of some useful statistics to bear in mind for your experiments. This is nowhere near an exhaustive description, so please read the references given at the end to get a full picture of the ideas presented here.

Please remember that you cannot judge how scattered your data points are until sensible scales are used for the graph and the error bars are also plotted.

Useful Quantities

• The mean. $\mathrm{Mean} = \frac{1}{n} \sum_i x_i$,
where n is the number of samples.
• The mode. The mode of a sample is the value that occurs most often.
• The variance. $V = \frac{1}{n} \sum_i (x_i - \mu)^2$, where
µ is the population mean and
n is the population size.
• The standard deviation. $\sigma = \sqrt{V}$. The standard deviation is a more convenient measure of the spread of the data than the variance because it has the same units as the data itself.
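The quantities above can be computed in a few lines. The following is a minimal sketch, not a prescribed lab method: the function name `summary` and the sample values are made up for illustration, and the sample mean is used as a stand-in for the population mean µ, which is usually unknown in practice.

```python
import math
from collections import Counter


def summary(samples):
    """Return (mean, mode, variance, standard deviation) of a sample."""
    n = len(samples)
    mean = sum(samples) / n
    # Mode: the most frequently occurring value (the first one met if tied).
    mode = Counter(samples).most_common(1)[0][0]
    # Variance with the 1/n normalisation, using the sample mean
    # as an assumed stand-in for the population mean mu.
    variance = sum((x - mean) ** 2 for x in samples) / n
    sigma = math.sqrt(variance)
    return mean, mode, variance, sigma


# Illustrative values only, not real measurements.
mean, mode, variance, sigma = summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, mode, variance, sigma)  # 5.0 4.0 4.0 2.0
```

Note that the variance here is 4.0 in squared units while the standard deviation is 2.0 in the units of the data, which is why the latter is the natural size for an error bar.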

Statistical Distributions

• The binomial distribution. The binomial distribution describes processes involving identical trials that have two possible outcomes, such as tossing a coin (heads or tails) or assessing the treatment of patients (kill or cure).
Example: Detector efficiencies.
You are trying to measure the tracks of cosmic ray particles using scintillation detectors, which are, say, 95% efficient. You make a sensible decision that at least three points are needed to define a track. How efficient at detecting tracks would a stack of three detectors be? Would using four or five detectors give a significant improvement?
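• Writing P(r; p, n) for the probability of exactly r hits from n detectors, each of efficiency p, the probability of three hits from three detectors is
$P(3; 0.95, 3) = (0.95)^3 = 0.857$,
so this would be 85.7% efficient.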
• For four detectors, the probability of three or four hits is
$P(3; 0.95, 4) + P(4; 0.95, 4) = 0.171 + 0.815 = 0.986$, or 98.6%.
• For five detectors, the probability of three or more hits is
$P(3; 0.95, 5) + P(4; 0.95, 5) + P(5; 0.95, 5) = 0.021 + 0.204 + 0.774 = 0.999$, or 99.9%.
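The numbers in this example can be checked with a short script. This is a sketch under the assumption that each detector fires independently with the same efficiency, so that the standard binomial formula $P(r; p, n) = \binom{n}{r} p^r (1-p)^{n-r}$ applies. The function names `binomial_pmf` and `track_efficiency` are chosen here for illustration.

```python
from math import comb


def binomial_pmf(r, p, n):
    """P(r; p, n): probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def track_efficiency(n_detectors, p=0.95, min_hits=3):
    """Probability that at least min_hits of n_detectors register a hit."""
    return sum(
        binomial_pmf(r, p, n_detectors)
        for r in range(min_hits, n_detectors + 1)
    )


for n in (3, 4, 5):
    print(n, round(track_efficiency(n), 3))
# 3 0.857
# 4 0.986
# 5 0.999
```

Going from three to four detectors recovers most of the lost tracks, while the fifth adds only about one percentage point.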
• The Poisson distribution. The binomial distribution describes situations where definite outcomes occur in a specific number of trials, but the number of trials is not always a well-defined quantity. The Poisson distribution deals with sharp events occurring in a continuum. For example, in a thunderstorm we know there will be a definite number of flashes of lightning, but it is meaningless to ask how often there was not a flash.
• The Gaussian distribution. The Gaussian distribution is the best known of the distributions. It describes many different sorts of data and is particularly useful in the field of measurement errors because of the central limit theorem.
• The chi-square distribution. The quantity chi-square is the sum of the squared differences between the observed values and their theoretical predictions, each scaled by the error on the measurement. If the function agrees well with the data (or vice versa) then chi-square will be small. If you have a large value of chi-square at the end of your experiment then something is amiss. A very small value of chi-square should also be unlikely, since the errors should make the measurements deviate from their ideal values to some extent.
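As an illustration of the chi-square idea, the sketch below uses the usual definition $\chi^2 = \sum_i \left( (x_i - f_i) / \sigma_i \right)^2$, where $x_i$ are the measurements, $f_i$ the predictions and $\sigma_i$ the measurement errors. These symbols, the function name `chi_square` and the data values are assumptions made for this example and do not come from any particular experiment.

```python
def chi_square(observed, predicted, errors):
    """Sum of squared residuals, each divided by its measurement error."""
    return sum(
        ((o - p) / e) ** 2
        for o, p, e in zip(observed, predicted, errors)
    )


# Hypothetical measurements, predictions and uncertainties.
observed = [1.1, 1.9, 3.2, 3.9]
predicted = [1.0, 2.0, 3.0, 4.0]
errors = [0.1, 0.1, 0.2, 0.1]

chi2 = chi_square(observed, predicted, errors)
# No parameters were fitted here, so the number of degrees of freedom
# is taken to be the number of data points.
ndof = len(observed)
print(chi2, chi2 / ndof)
```

Each point here sits about one error bar from its prediction, so chi-square per degree of freedom comes out close to one, which is roughly what is expected when the model and the quoted errors are both reasonable.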

References

• Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, by R.J. Barlow.
[519.2 BAR]
• Fundamental Formulas of Physics, Chapter 2: Statistics, edited by Donald H. Menzel.
[53(083) ME]
• Measurements and Their Uncertainties: A Practical Guide to Modern Error Analysis, by I.G. Hughes and T.P.A. Hase.
[511.43 HUG]