"It has long been an axiom of mine that the little things are infinitely the most important"
Measures of location: Use and misuse
(arithmetic mean, geometric mean, harmonic mean, weighted mean, median, mode)
Statistics courses, especially those for biologists, assume that knowing the formulae equals understanding them. They teach how to do statistics, but largely ignore what those procedures assume and how their results can mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
The simple arithmetic mean is by far the most widely used measure of location - or measure of central tendency - irrespective of a variable's scale of measurement or the shape of its distribution. For ordinal variables this results in widespread misuse of the arithmetic mean, simply because, for such a variable, we usually cannot say that the difference between scores 1 and 2 is equivalent to the difference between scores 2 and 3. Despite this, we have found mean scores used in all fields of applied biology, for example body condition scores in veterinary applications and infestation scales in agricultural entomology. Some (but not all) statisticians emphasize that this should never be done, and that the median should be used instead. One consequence of the overuse of arithmetic means is that they are quoted even when the statistical test is actually comparing medians. Visual analogue scales are a particular problem: they are almost invariably analysed using the arithmetic mean, yet they are still ordinal variables.
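To see why the coding of ordinal scores matters, here is a minimal Python sketch using invented body condition scores for two hypothetical groups of animals. Because the scores are only ranks, any order-preserving relabelling is equally legitimate; the recoding used here (which widens the gap between scores 3 and 4) is an arbitrary assumption chosen for illustration. The ranking of the two group means flips under the recoding, whereas the ranking of the medians does not.

```python
from statistics import mean, median

# Invented body condition scores (1 = poor ... 5 = excellent) for two
# hypothetical groups of animals; illustrative only.
group_a = [1, 1, 4, 4, 4]
group_b = [3, 3, 3, 3, 3]

# Ordinal scores carry order but not distance, so this order-preserving
# recoding (an arbitrary assumption) is just as defensible as 1..5.
recode = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}


def summarise(scores):
    """Return (mean, recoded mean, median, recoded median)."""
    recoded = [recode[s] for s in scores]
    return mean(scores), mean(recoded), median(scores), median(recoded)


for name, scores in [("A", group_a), ("B", group_b)]:
    m, m_re, md, md_re = summarise(scores)
    print(f"group {name}: mean {m} -> {m_re}, median {md} -> {md_re}")

# group A: mean 2.8 -> 6.4, median 4 -> 10
# group B: mean 3 -> 3, median 3 -> 3
```

Neither coding is "right". The point is that which group appears to have the higher mean score depends on an arbitrary choice of scoring, whereas the comparison of medians does not.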
In the analysis of measurement variables, the simple arithmetic mean is over-used, whilst other means, such as the geometric mean or a weighted mean, are under-used. Authors rarely display the distribution of their data (and journals print such figures even more rarely), so one can only guess which would have been the better choice. If one is dealing with overdispersed (strongly right-skewed) data, the geometric mean will often provide the most appropriate measure of location. However, the geometric mean should not be used when the mean is serving as a proxy for the total, as is often the case with cost data. In this situation it is often better to do two analyses - one using the geometric mean, and the other using the arithmetic mean or total. Sometimes the geometric mean is not appropriate either (for example if the distributions differ markedly in shape), and the median must be used instead. But the median is often used inadvisably, for example when the sample size is small, which leads to an unnecessary loss of information.
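As a minimal sketch (Python standard library, invented data), the code below contrasts the arithmetic mean, geometric mean and median of a small right-skewed sample, and then shows a weighted mean. The geometric mean is simply the back-transformed mean of the logged values, which is why every value must be positive. The plot means and plot sizes used for the weighted mean are hypothetical.

```python
import math
from statistics import geometric_mean, mean, median

# Invented right-skewed sample, e.g. insects per plant on ten plants.
sample = [2, 3, 4, 5, 6, 8, 10, 15, 40, 200]

arith = mean(sample)   # pulled upwards by the two largest values
med = median(sample)   # average of the two middle ordered values (n is even)
# Geometric mean = exp(mean of log values); all values must be > 0.
geo_manual = math.exp(mean([math.log(x) for x in sample]))
geo = geometric_mean(sample)  # standard-library equivalent (Python 3.8+)

print(f"arithmetic {arith:.1f}, geometric {geo:.1f}, median {med:.1f}")
# arithmetic 29.3, geometric 9.6, median 7.0

# Weighted mean: combine hypothetical plot means, weighting each by the
# number of plants sampled in that plot rather than treating plots equally.
plot_means = [12.0, 30.0, 18.0]
plot_sizes = [50, 5, 20]
weighted = sum(m * n for m, n in zip(plot_means, plot_sizes)) / sum(plot_sizes)
unweighted = mean(plot_means)

print(f"weighted {weighted:.1f}, unweighted {unweighted:.1f}")
# weighted 14.8, unweighted 20.0
```

The unweighted mean of plot means lets the small plot of 5 plants count as much as the plot of 50, which is rarely what is intended.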
What the statisticians say
Armitage & Berry (2002) summarize measures of location for medical researchers in Chapter 2. Sokal & Rohlf (1995) and Zar (1999) introduce measures of location for biologists in Chapters 4 and 3 respectively. Sokal & Rohlf stress the need to use the appropriate scale of measurement, whether linear or logarithmic. Stuart & Ord (1991) give a definitive, if somewhat mathematical, account of the various measures of location. Huff (1954) gives the topic a lighter treatment, emphasising the misuse of the arithmetic mean for heavily skewed data. Although very old, this book is still worth looking at!
Barber & Thompson (1998) look at the design and analysis of cost data in randomized controlled trials. Where total cost is required, the arithmetic mean must be used even if the distribution is skewed. Bland & Altman (1996) give a brief introduction to the geometric mean for medical researchers, focusing on its use for skewed data. Tallarida et al. (1988) make the somewhat surprising claim that the geometric mean is more widely used than the arithmetic mean, at least in pharmacological circles.
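The reason the arithmetic mean is required for total cost is plain arithmetic: only the arithmetic mean multiplied by the number of patients recovers the total. A minimal sketch with invented per-patient costs (including one very expensive patient) shows how far n times the geometric mean falls short of the real total.

```python
import math
from statistics import geometric_mean, mean

# Invented per-patient costs for one trial arm; one patient is very expensive.
costs = [100, 120, 150, 200, 250, 300, 480, 5000]
n = len(costs)
total = sum(costs)

# n x arithmetic mean reproduces the total (by definition of the mean).
from_arith = n * mean(costs)
# n x geometric mean does not: skewness pulls the geometric mean well below
# the arithmetic mean, so the implied total is far too small.
from_geo = n * geometric_mean(costs)

print(f"total {total}, n x arithmetic mean {from_arith}, n x geometric mean {from_geo:.0f}")
```

Note that the geometric mean also cannot cope with patients who incur zero cost, since the logarithm of zero is undefined.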
Veterinarians have to cope with a wonderful set of contradictory advice from Coles et al. (1992), Wood et al. (1995) and Smothers et al. (1999) on whether to use the geometric or arithmetic mean for quantifying parasite numbers. Recent contributions include Rozsa et al. (2000), Neuhauser & Poulin (2004), Montresor (2007), Waldinger et al. (2007) and, most recently, Dobson et al. (2009).
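Part of the difficulty with parasite counts is that they often include zeros, and the logarithm of zero is undefined. One commonly used workaround, shown here only as a sketch and not as the recommendation of any of the authors above, is to add a constant c before taking logs and subtract it again after back-transforming. The function name `log_shifted_mean` and the egg counts are invented for illustration. The result depends on the arbitrary choice of c, and both versions sit far below the arithmetic mean.

```python
import math
from statistics import mean

# Invented faecal egg counts (eggs per gram) from ten animals; three are zero.
counts = [0, 0, 0, 50, 50, 100, 150, 300, 800, 2500]


def log_shifted_mean(xs, c=1.0):
    """Hypothetical helper: exp(mean of log(x + c)) - c.

    c = 1 is a common but arbitrary choice; it only serves to avoid log(0).
    """
    return math.exp(mean([math.log(x + c) for x in xs])) - c


arith = mean(counts)
gm_c1 = log_shifted_mean(counts, c=1.0)
gm_c10 = log_shifted_mean(counts, c=10.0)

print(f"arithmetic {arith}, log(x+1) mean {gm_c1:.1f}, log(x+10) mean {gm_c10:.1f}")
```

Changing c from 1 to 10 roughly doubles the "geometric" mean here, which is one reason the choice between geometric and arithmetic means for parasite numbers remains contentious.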
Wikipedia (2008) provides sections on the arithmetic mean, the geometric mean, the harmonic mean, weighted means, the median, the mode and running means. Stark of the University of California provides a comprehensive and thought-provoking account of measures of location and spread. The NIST/SEMATECH e-Handbook of Statistical Methods examines several measures of location, including robust alternatives. Stockcharts.com provide an excellent review of the use of moving averages in stock market charts.