Example, with R
When you have just 2 values, or a symmetrically distributed set, their mean is halfway between their extremes.
- Thus the mean of these 6 numbers:
-10102 -201 -3 3 201 10102,
is 0.
Or you could find the mean with R
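A minimal sketch of that calculation, using base R's `mean()` function. The variable names `y` and `m` are chosen here for illustration only:

```r
# The six values from the example above
y <- c(-10102, -201, -3, 3, 201, 10102)

# Arithmetic mean: the sum of the values divided by how many there are
m <- mean(y)
m
```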
Notice that the R code does not assume the values are distributed perfectly symmetrically.
Definition and Use
- The 'mean' is usually taken to mean the 'simple arithmetic mean', which is often defined as the sum of the values divided by the number of values (n), or sum(y/n). Thus each of the n values contributes with equal weight, 1/n, to the mean.
- The arithmetic mean may also be defined as the number that minimizes the sum of squared differences from all the values, but a simple mean is normally only calculated that way in techniques such as regression analysis.
- The mean is sometimes defined as sum(yf/n) or sum(yp). The latter may be preferred where there are infinitely many values but the number of possible values is restricted (discrete variables). Here y is a list of values and p is a list of proportions, where each proportion is the relative frequency of its corresponding value. If n is finite, p=f/n.
- Note, if y contains class-interval midpoints (rather than the individual values) the resulting estimate of the mean may be poor.
- The mean of a random sample will provide an unbiased estimate of the population mean. The accuracy of that estimate will depend on the sample size.
- Other means include detransformed means, weighted means, and trimmed means.
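The definitions in the list above can be compared numerically. The sketch below uses made-up values in `y` (they are not from the text) and base R only. The least-squares version uses `optimize()`, which searches numerically, so it agrees with the others only approximately:

```r
# Hypothetical illustrative values (an assumption, not data from the text)
y <- c(2, 3, 3, 5, 7, 7, 7, 10)
n <- length(y)

# Definition 1: sum of the values divided by n, equivalently sum(y/n)
mean_sum  <- sum(y) / n
mean_each <- sum(y / n)

# Definition 2: the number m that minimizes the sum of squared differences
sse <- function(m) sum((y - m)^2)
mean_ls <- optimize(sse, interval = range(y))$minimum

# Definition 3: sum(y * p), where p is the relative frequency of each distinct value
tab  <- table(y)
vals <- as.numeric(names(tab))   # the distinct values
p    <- as.numeric(tab) / n      # their proportions, f/n
mean_p <- sum(vals * p)

c(mean_sum, mean_each, mean_ls, mean_p)
```

All four results should agree, the least-squares one to within the tolerance of the numerical search.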
The simple arithmetic mean is among the most heavily used and abused summary statistics. It is usually simple to calculate and seems easy to interpret, but it ignores many things.
Simple formula
Assuming there are n numbers in y:
the mean of y is sum(y) / n
Or, if each value in y occurs f times, you 'weight' each value accordingly:
the mean of y is sum(y*f) / sum(f)
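As a rough check that the two formulas agree, the sketch below expands a small frequency table into individual values and compares the results. The values in `y` and `f` are invented for illustration; `weighted.mean()` is the base R function for the weighted form:

```r
# Hypothetical distinct values and how often each occurs (assumed data)
y <- c(0, 1, 2, 5)
f <- c(4, 3, 2, 1)

# Simple formula, applied to every individual value
y_all <- rep(y, times = f)
simple_mean <- sum(y_all) / length(y_all)

# Frequency-weighted formula, applied to the distinct values
weighted_mean <- sum(y * f) / sum(f)

# The same weighted calculation using base R
builtin_mean <- weighted.mean(y, w = f)

c(simple_mean, weighted_mean, builtin_mean)
```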
Tips and Notes
- Simply because a mean can be calculated does not imply it is informative.
- For instance, if we arbitrarily code dead as 0, and alive as 1, and obese as 2, then taking the mean of 0,1,2,0,2,1,0 tells us little of use.
- The mean of highly skewed data may be rather misleading.
- For example, if we observe 110013 leaves with no spiders, 23 leaves with 2 spiders, 1 with 2.5 spiders, and 1 leaf with 3024 spiders, their mean may be neither the best nor the most reliable measure of spider abundance.
- Means are vulnerable to the occasional highly-unusual value, and tell you nothing of how the results may vary.
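To see how much one unusual value can matter, the spider counts above can be put into R. This is only a sketch: the variable names are assumptions, and the median is shown merely as one possible comparison, not as a recommended replacement for the mean:

```r
# Spider counts per leaf and the number of leaves with each count (from the example above)
spiders <- c(0, 2, 2.5, 3024)
leaves  <- c(110013, 23, 1, 1)

# One entry per leaf
counts <- rep(spiders, times = leaves)

mean_all   <- mean(counts)     # pulled upwards by the single leaf with 3024 spiders
median_all <- median(counts)   # most leaves have no spiders at all

# The mean after dropping that one leaf, to show its influence
mean_without_extreme <- mean(counts[counts < 3024])

c(mean_all, median_all, mean_without_extreme)
```

A single leaf carries almost all of the total, so the mean changes greatly when that leaf is dropped, while the median is unaffected.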
Useful references
- Rozsa, L. et al. (2000). Quantifying parasites in samples of hosts. Journal of Parasitology 86 (2), 228-232.
- One of the better contributions to how to quantify animal parasite numbers.
- Stark, P.B. Measures of location and spread.
- A thought-provoking account of measures of location and spread.
- Waldinger, M.D. et al. (2007). Geometric mean IELT and premature ejaculation: Appropriate statistics to avoid overestimation of treatment efficacy. Journal of Sexual Medicine 5 (2), 492-499.
- Argues that the geometric mean of the intravaginal ejaculation latency time is required to assess the effect of treatments for premature ejaculation.
- Wikipedia: Mean.
- Covers the various types of means including arithmetic, geometric and harmonic means, weighted and trimmed means, and the sample and population mean.