"It has long been an axiom of mine that the little things are infinitely the most important"
Normal distribution: Use & misuse
(normal distribution assumption, tests of normality, graphical methods)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
The normal distribution is often used in the literature in a purely descriptive way to describe the distribution of a set of data, and we give several examples of this. The commonest misuse here is to assume that the data must somehow approximate a normal distribution, when in fact non-normality is much more common. For example, if length is normally distributed, and weight is related to it by an allometric equation (a power function with an exponent other than one), then weight cannot be normally distributed. Often terms like 'approximates to' or 'essentially normal' are used for distributions that are clearly nothing like normal. In addition, many (all?) distributions of real data are heterogeneous and comprise various discrete groups - with different means and standard deviations. Sometimes it can be more instructive to separate distributions into their component parts than to argue for their normality.
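The allometric point can be checked with a small simulation. The sketch below is only illustrative: the coefficient, exponent, mean and standard deviation are hypothetical values chosen for the example, not taken from any data set. It draws normally distributed lengths, converts them to weights with a power function, and compares the sample skewness of the two variables.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical allometric relationship: weight = A * length ** B
A, B = 0.01, 3.0

# Lengths are normal by construction (mean 100, standard deviation 10)
lengths = [rng.gauss(100.0, 10.0) for _ in range(20000)]
weights = [A * length ** B for length in lengths]

skew_length = sample_skewness(lengths)
skew_weight = sample_skewness(weights)

print(f"skewness of length: {skew_length:.2f}")
print(f"skewness of weight: {skew_weight:.2f}")
```

Under these assumptions the lengths should show skewness close to zero, while the weights should be clearly right-skewed, even though each weight is an exact function of a normally distributed length.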
The normal distribution underlies much of statistical theory, and many statistical tests require that the errors, or the test statistic, follow a normal distribution. The test statistic's distribution cannot be assessed directly without resampling procedures, so the conventional approach has been to test the deviations from model predictions (the residuals). For correlation coefficients this is equivalent to testing how the raw data are distributed, but this is not true for most other models - including regression and ANOVA. Unfortunately many authors assume precisely that, and test their raw data instead.
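To see why testing the raw data is not the same as testing the deviations from model predictions, consider a one-way layout with two groups. The sketch below uses hypothetical group names, means and error standard deviation; the errors are normal by construction. Excess kurtosis is used here only as a convenient numerical summary of shape (0 for a normal distribution), not as a recommended test.

```python
import random
import statistics


def excess_kurtosis(values):
    """Moment-based excess kurtosis; about 0 for a normal distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical design: two groups with very different means, same normal error
group_means = {"control": 10.0, "treated": 30.0}
ERROR_SD = 2.0
N_PER_GROUP = 2000

observations = {
    name: [rng.gauss(mu, ERROR_SD) for _ in range(N_PER_GROUP)]
    for name, mu in group_means.items()
}

# Raw data pooled across groups: this is what many authors test
raw = [y for ys in observations.values() for y in ys]

# Residuals: each observation minus its own group mean (the model prediction)
residuals = []
for ys in observations.values():
    fitted = statistics.fmean(ys)
    residuals.extend(y - fitted for y in ys)

kurt_raw = excess_kurtosis(raw)
kurt_resid = excess_kurtosis(residuals)

print(f"excess kurtosis, pooled raw data: {kurt_raw:.2f}")
print(f"excess kurtosis, residuals:       {kurt_resid:.2f}")
```

In this setup the pooled raw data are strongly bimodal, so their excess kurtosis should be well below zero, whereas the residuals should look normal. A test applied to the raw data would therefore suggest a problem that the model's errors do not have.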
Another common mistake is to assume statistical tests of normality are tests for normality, and to interpret a 'significant' outcome accordingly. Such tests take normality as the null hypothesis, so a 'significant' result is evidence against normality, and a non-significant result does not demonstrate it. Owing to their limited power, tests of normality can be very misleading for small samples, and we give a few examples where authors have used more appropriate graphical methods to assess normality. Unfortunately though, we often find that insufficient detail of the methodology is given to enable a proper assessment of the results.
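As an illustration of the graphical approach, the coordinates of a normal quantile (rankit) plot can be computed as follows. This is a minimal sketch under stated assumptions: rankits are approximated with Blom's formula, (i - 0.375) / (n + 0.25), rather than computed exactly, and the function names and the two samples are hypothetical. The correlation between the ordered data and the rankits is reported only as a rough summary of how straight the plot is; it does not replace looking at the plot.

```python
import math
import random
from statistics import NormalDist, fmean


def approximate_rankits(n):
    """Approximate expected normal order statistics using Blom's formula."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def quantile_plot_points(sample):
    """Return (rankit, ordered value) pairs: the points of a normal quantile plot."""
    ordered = sorted(sample)
    return list(zip(approximate_rankits(len(ordered)), ordered))


def quantile_plot_correlation(sample):
    """Correlation between rankits and ordered data (near 1 if the plot is straight)."""
    rankits, ordered = zip(*quantile_plot_points(sample))
    return pearson_r(rankits, ordered)


rng = random.Random(3)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(50.0, 5.0) for _ in range(200)]
skewed_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

r_normal = quantile_plot_correlation(normal_sample)
r_skewed = quantile_plot_correlation(skewed_sample)

print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

Plotting the pairs returned by `quantile_plot_points` (rankits on one axis, ordered data on the other) gives a roughly straight line for the normal sample and a markedly curved one for the skewed sample.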
Perhaps because of the importance of the normal distribution in the historical development of statistics, there are some very strange ideas around about what a normal distribution should look like. We have included a couple of examples for your enjoyment...
What the statisticians say
Armitage & Berry (2002) give a rather brief coverage of probability distributions, including the normal distribution, for medical researchers in Chapter 3. Zar (1999) and Sokal & Rohlf (1995) each give conventional accounts of the normal distribution for biologists in Chapter 6. The latter is much better on graphical methods of testing for normality, and is one of the few texts to cover rankits. Snedecor & Cochran (1989) introduce the normal distribution in Chapter 2, with better coverage than other texts on the topics of skew and kurtosis. Thode (2002) is an advanced text covering all aspects of testing for normality.
Limpert et al. (2001) point out that the log-normal distribution often provides a better description of the distribution of measurements in biology than the normal distribution. This is because biological processes often act in a multiplicative rather than additive way. Altman & Bland (1995) give a useful introduction to the normal distribution for medical researchers. Potvin & Roff (1993) argue the case for non-normality being more prevalent in ecological data, and look at alternative non-parametric statistical methods. Micceri (1989) compares the prevalence of the normal distribution for psychometric measures to that of the unicorn and other improbable creatures.
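The link between multiplicative processes and the log-normal distribution can be illustrated by simulation. In the sketch below each individual's final size is the product of many independent positive factors. The number of factors and their range are hypothetical, chosen only to make the effect visible. The logarithm of a product is a sum of logarithms, so by the central limit theorem the log of the size tends towards normality, while the size itself is right-skewed.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(4)  # fixed seed so the run is repeatable

N_FACTORS = 30        # hypothetical number of multiplicative steps
N_INDIVIDUALS = 5000  # hypothetical sample size


def grow(rng):
    """Multiply a starting size of 1 by a series of random growth factors."""
    size = 1.0
    for _ in range(N_FACTORS):
        size *= rng.uniform(0.8, 1.25)  # each step shrinks or enlarges the size
    return size


sizes = [grow(rng) for _ in range(N_INDIVIDUALS)]
log_sizes = [math.log(s) for s in sizes]

skew_raw = sample_skewness(sizes)
skew_log = sample_skewness(log_sizes)

print(f"skewness of size:      {skew_raw:.2f}")
print(f"skewness of log(size): {skew_log:.2f}")
```

Under these assumptions the raw sizes should be strongly right-skewed, while the log-transformed sizes should have skewness close to zero, which is the pattern expected if the sizes are approximately log-normal.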
Wikipedia provides sections on continuous probability distributions, the normal distribution, central limit theorem, log-normal distribution, skew, kurtosis, Chauvenet's criterion, quantile-quantile plots (including the normal quantile plot), and P-P plots. There is also a review and extensive discussion of the controversial book by Herrnstein & Murray (1994) entitled 'The bell curve - intelligence and class structure in American life'. The NIST/SEMATECH e-Handbook of Statistical Methods gives details of the normal distribution and the lognormal distribution.