"It has long been an axiom of mine that the little things are infinitely the most important"
(analysis of variance, independence of replicates, pseudoreplication, homogeneity of variances, normal distribution of errors, Levene's test)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
Analysis of variance is widely used across many disciplines, although more so by applied ecologists (fisheries, crop protection etc.) than by medical researchers. Here we only cover one-way fixed effects parametric ANOVA, with the emphasis on whether the assumptions are met. Random effects ANOVA, multiple comparison tests and the Kruskal-Wallis test are considered elsewhere. We include both observational and experimental studies in the examples, but note that the strength of inference that can be drawn from the two types is quite different.
As with the t-test for comparing two means, the use of convenience sampling or non-random allocation limits any inference from an analysis of variance to the results at hand. Whilst on occasion this can be sufficient, researchers usually wish their results to have broader applicability! Non-independence of replicates (pseudoreplication) is still extremely common, especially in ecological work. For example, to show differences in catch or size of lobsters related to fishing intensity one would need replicated areas with different fishing levels. If one just wishes to compare two areas, and the lobsters are sampled using cluster sampling (traps), then the unit of analysis should be the trap - not the individual lobster. Repeated observations over time cannot be used as replicates as they are not independent.
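The trap-versus-lobster point can be illustrated with a minimal sketch. The carapace lengths, area labels and trap labels below are invented purely for illustration, and `trap_means_by_area` and `one_way_f` are hypothetical helper names, not functions from any library. The sketch collapses the lobsters to one mean per trap before computing the one-way F-ratio, and contrasts the error degrees of freedom with those obtained when individual lobsters are wrongly treated as replicates.

```python
from statistics import mean

# Hypothetical data: lobster carapace lengths (mm), keyed by (area, trap).
lobsters = {
    ("A", "trap1"): [80, 82, 84],
    ("A", "trap2"): [86, 88],
    ("A", "trap3"): [90, 92, 94, 96],
    ("B", "trap1"): [70, 72],
    ("B", "trap2"): [76, 78, 80],
    ("B", "trap3"): [82, 84],
}


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def trap_means_by_area(data):
    """Collapse individual lobsters to one mean per trap, grouped by area."""
    out = {}
    for (area, _trap), lengths in data.items():
        out.setdefault(area, []).append(mean(lengths))
    return out


# Correct unit of analysis: one value per trap.
by_trap = trap_means_by_area(lobsters)
f_trap, dfb_trap, dfw_trap = one_way_f(list(by_trap.values()))

# Pseudoreplicated analysis: every lobster treated as an independent replicate.
pooled = {}
for (area, _trap), lengths in lobsters.items():
    pooled.setdefault(area, []).extend(lengths)
f_pseudo, dfb_pseudo, dfw_pseudo = one_way_f(list(pooled.values()))

print(f"trap as unit:    F = {f_trap:.2f} on {dfb_trap} and {dfw_trap} df")
print(f"lobster as unit: F = {f_pseudo:.2f} on {dfb_pseudo} and {dfw_pseudo} df")
```

With these made-up numbers the trap-level analysis has 4 error degrees of freedom, whereas the lobster-level analysis claims 14. Even the trap-level comparison only describes these two areas; it says nothing about fishing intensity in general, for which replicated areas would be needed.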
A further form of pseudoreplication is to use a one-way ANOVA for paired comparisons. We include several examples of this, ranging from comparing the same horses before and after they are suffering from heaves to comparing gas concentrations in badger setts before and after they have been blocked. Considerable power may be lost by analyzing such data with a simple one-way ANOVA. The same applies if results from more complex designs, such as a Latin square design, are analyzed with one-way ANOVA. The analysis is only appropriate for a completely randomized design.
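The loss of power from ignoring pairing can be seen in a small sketch. The before and after values below are invented for illustration, and `one_way_f` and `paired_t` are hypothetical helper names. For two groups, the square of the paired t-statistic plays the same role as the F-ratio, so the two figures can be set side by side (although they are referred to different error degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data: the same five animals measured twice.
before = [10, 20, 30, 40, 50]
after = [12, 23, 33, 44, 53]


def one_way_f(groups):
    """F-ratio of a one-way ANOVA that ignores any pairing between groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def paired_t(x, y):
    """Paired t-statistic computed on the within-animal differences y - x."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


f_oneway = one_way_f([before, after])  # referred to F with 1 and 8 df
t_paired = paired_t(before, after)     # t**2 is referred to F with 1 and 4 df

print(f"one-way ANOVA F   = {f_oneway:.3f}")
print(f"paired analysis F = {t_paired ** 2:.1f}")
```

In this contrived example the large differences between animals swamp the consistent within-animal change, so the one-way F-ratio is well below 1 while the paired analysis gives a squared t of about 90.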
The commonly quoted assumptions of ANOVA are homogeneity of variances and a normal distribution of errors. Ordinal variables seldom meet these assumptions. Levene's test is often used to assess homogeneity of variances, and is often (but not always) adequate for the purpose. More often no test at all is carried out, even when figures are presented showing that variances are quite clearly different. Indeed, sometimes the change in dispersion resulting from a treatment is much more marked than the change in location - yet this is commonly ignored. Normality of distributions is seldom assessed, a practice sometimes justified on the basis that ANOVA is 'robust to non-normality'. When it is tested for, it is usually not specified what is tested - whether pooled observations (incorrect), pooled errors (sometimes the only possibility) or distribution of observations in each group (best). There is also a tendency to use tests rather than a more appropriate visual technique (such as QQ plots). Transformations are commonly and (often) correctly used to ensure that assumptions are met - but if this is done then detransformed means should be reported.
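Two of these points lend themselves to a short sketch: a Levene-type statistic, and a detransformed mean. The data are invented, and `levene_w` and `back_transformed_mean` are hypothetical helper names. The statistic below centres each group on its median (the Brown-Forsythe variant, chosen here as an assumption; the classical form centres on the group mean) and is simply a one-way F-ratio computed on absolute deviations. No P-value is computed; the statistic would be referred to an F distribution with k - 1 and N - k degrees of freedom.

```python
from math import exp, log
from statistics import mean, median


def levene_w(groups, center=median):
    """Levene-type statistic: a one-way F-ratio computed on the absolute
    deviations of each observation from its own group's centre."""
    z = [[abs(x - center(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def back_transformed_mean(values):
    """Mean taken on the log scale, then detransformed to the original scale
    (this is the geometric mean, not the arithmetic mean)."""
    return exp(mean([log(v) for v in values]))


# Hypothetical groups with identical means but very different spread:
# the treatment changes dispersion, not location.
control = [10, 11, 12, 13, 14]
treated = [4, 8, 12, 16, 20]
w = levene_w([control, treated])
print(f"group means: {mean(control)} and {mean(treated)}; Levene W = {w:.2f}")

# Hypothetical right-skewed sample that would be analysed on a log scale.
skewed = [1, 10, 100]
print(f"arithmetic mean = {mean(skewed)}, "
      f"detransformed mean = {back_transformed_mean(skewed):.1f}")
```

The first pair of groups would give an F-ratio of zero in an ordinary ANOVA of the means, yet the spread clearly differs. The second example shows why the detransformed mean has to be labelled as such: it can differ substantially from the arithmetic mean of the raw data.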
Another common problem is to use excessively small group sizes, such that there is inadequate power to reveal any but the largest treatment effects. This often results from the researcher trying to include too many treatment levels - it is better to have rather fewer treatments, if this enables one to increase the level of replication. Lastly we note that in some observational studies it is pointless to carry out an ANOVA if the groups are themselves defined, in part, by the response variable.
What the statisticians say
Underwood (1997) introduces one-way ANOVA in Chapter 7 along with an extensive discussion of the assumptions - especially that of independence of errors. Crawley (2005) provides a detailed account of how to do ANOVA using R in Chapter 9. Another recent text is Doncaster & Davey (2007). Sokal & Rohlf (1995), Zar (1999), Steel & Torrie (1960) and Winer et al. (1991) provide detailed accounts of ANOVA for a variety of experimental designs. The latter is especially good for tests of homogeneity of variance. Conover (1999), Sprent (1998) and Siegel (1956) cover the Kruskal-Wallis non-parametric ANOVA.
Bewick et al. (2004), Altman & Bland (1996) and Wallenstein et al. (1980) give an introduction to the use of ANOVA for medical researchers. Wilcox (1995) highlights the problems of conventional methods of analysis based on means whilst Keselman et al. (1998) focus on the failure of researchers to check that assumptions of ANOVA are met. Wilson (2007) puts the case against inappropriate data transformations in order to meet normality assumptions. Welch (1951) presents his unequal variance one-way analysis of variance.
Wikipedia provides sections on the analysis of variance, one-way ANOVA, Levene's test, Hartley's Fmax test and Bartlett's test. Julian Faraway provides a useful guide to practical regression and ANOVA using R. The NIST/SEMATECH e-Handbook of Statistics gives details of the one-way ANOVA model and assumptions, Levene's test and Bartlett's test. The Handbook of biological statistics introduces one-way ANOVA and homoscedasticity and Bartlett's test.