"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Beginners statistics: mean


Example, with R

When you have just 2 values, or a symmetrically distributed set, their mean is halfway between their extremes.
Thus the mean of these 6 numbers,
-10102   -201   -3   3   201   10102,
is 0.  

Or you could find the mean with R.
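A minimal sketch of that calculation, assuming the six values are stored in a vector called y (an illustrative name):

```r
# The six values from the example above; 'y' is an illustrative name.
y <- c(-10102, -201, -3, 3, 201, 10102)

# Built-in arithmetic mean
mean(y)

# The same result from the definition: sum of values / number of values
sum(y) / length(y)
```

Both calls return 0 for these values.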

Notice that the R code does not assume the values are distributed perfectly symmetrically.

Definition and Use

  1. The 'mean' is usually taken to mean the 'simple arithmetic mean' which is often defined as the sum of values, divided by the number of values (n) - or sum(y/n).
  2. The arithmetic mean may also be defined as the number that yields the smallest sum of squared differences from all the values - but it is seldom calculated that way.
  3. The mean is sometimes defined as sum(yf/n) or sum(yp). The latter may be preferred where there are infinitely many values but the number of possible values is restricted (discrete variables). Here y is a list of the distinct values, f is a list of their frequencies, and p is a list of proportions, where each proportion is the relative frequency of its corresponding value. If n is finite, p=f/n.
    Note, if y contains class-interval midpoints (rather than the individual values) the resulting estimate of the mean may be poor.
  4. The mean of a random sample will provide an unbiased estimate of the population mean. The accuracy of that estimate will depend on the sample size.
  5. Other means include detransformed means, weighted means, and trimmed means.
The mean is among the most heavily used and abused summary statistics. It is usually simple to calculate, seems easy to interpret, but ignores many things.
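The first three definitions above can be checked against each other numerically. The sketch below uses made-up values; the names raw, vals, f, p and sq_diff are illustrative assumptions, and optimize() is used only to demonstrate the least-squares property, not as a practical way to compute a mean.

```r
# Made-up sample: six individual values
raw <- c(1, 1, 2, 3, 3, 3)
n   <- length(raw)

# Definition 1: sum of values divided by the number of values
m1 <- sum(raw) / n

# Definition 3: distinct values, their frequencies and proportions
vals <- c(1, 2, 3)
f    <- c(2, 1, 3)   # how often each distinct value occurs
p    <- f / n        # relative frequencies
m2 <- sum(vals * f / n)
m3 <- sum(vals * p)

# Definition 2: the number minimising the sum of squared differences
sq_diff <- function(m) sum((raw - m)^2)
m4 <- optimize(sq_diff, interval = range(raw))$minimum

# All four agree (m4 only to within the optimiser's tolerance)
c(m1, m2, m3, m4)
```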

Simple formula

Assuming there are n numbers in y:

the mean of y is sum(y) / n

Or, if each value in y occurs f times, you 'weight' each value accordingly:

the mean of y is sum(y*f) / sum(f)
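As a small illustration of the weighted formula, the sketch below uses made-up values and frequencies (y and f are illustrative names). It compares the formula with R's weighted.mean() and with the plain mean of the expanded data.

```r
# Made-up example: value 1 occurs twice, values 2 and 3 once each
y <- c(1, 2, 3)
f <- c(2, 1, 1)

# Weighted formula from the text
by_formula <- sum(y * f) / sum(f)

# R's built-in equivalent, using the frequencies as weights
by_builtin <- weighted.mean(y, f)

# Expanding the frequencies back into individual values gives the same mean
by_expansion <- mean(rep(y, times = f))

c(by_formula, by_builtin, by_expansion)   # each is 1.75
```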

Tips and Notes

  • Simply because a mean can be calculated does not imply it is informative.
    For instance, if we arbitrarily code dead as 0, and alive as 1, and obese as 2, then taking the mean of 0,1,2,0,2,1,0 tells us little of use.
  • The mean of highly-skewed data may be rather misleading.
    For example, if we observe 110013 leaves with no spiders, 23 leaves with 2 spiders, 1 leaf with 2.5 spiders, and 1 leaf with 3024 spiders, their mean may be neither the best nor the most reliable measure of spider abundance.
  • Means are vulnerable to the occasional highly-unusual value, and tell you nothing of how the results may vary.
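The spider example can be worked through in R to see how much one unusual leaf matters. This is only a sketch: the names spiders, leaves and counts are illustrative, and comparing the mean with the median is one possible check among several, not a recommendation from the text.

```r
# Spider counts per leaf and the number of leaves with each count
spiders <- c(0, 2, 2.5, 3024)
leaves  <- c(110013, 23, 1, 1)

# Mean over all leaves (about 0.028 spiders per leaf)
mean_all <- sum(spiders * leaves) / sum(leaves)

# Mean after dropping the single leaf with 3024 spiders (about 0.0004)
keep <- spiders != 3024
mean_without <- sum(spiders[keep] * leaves[keep]) / sum(leaves[keep])

# The median is 0, because the vast majority of leaves have no spiders
counts <- rep(spiders, times = leaves)
median_all <- median(counts)

c(mean_all, mean_without, median_all)
```

Here a single leaf accounts for nearly all of the mean, while the mean itself says nothing about the spread of the counts.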

Test yourself

You may like to think about these too.

Useful references

Rozsa, L. et al. (2000). Quantifying parasites in samples of hosts. Journal of Parasitology 86 (2), 228-232.
One of the better contributions to how to quantify animal parasite numbers.

Stark, P.B. Measures of location and spread.
A thought-provoking account of measures of location and spread.

Waldinger, M.D. et al. (2007). Geometric mean IELT and premature ejaculation: Appropriate statistics to avoid overestimation of treatment efficacy. Journal of Sexual Medicine 5 (2), 492-499.
Argues that the geometric mean of the intravaginal ejaculation latency time is required to assess the effect of treatments for premature ejaculation.

Wikipedia: Mean.
Covers the various types of means including arithmetic, geometric and harmonic means, weighted and trimmed means, and the sample and population mean.