Fully replicated factorial ANOVA: Use & misuse
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
We define a factorial design as having fully replicated measures on two or more crossed factors. In a factorial design multiple independent effects are tested simultaneously. Each level of one factor is tested in combination with each level of the other(s), so the design is orthogonal. The analysis of variance aims to investigate both the independent and combined effect of each factor on the response variable. The combined effect is investigated by assessing whether there is a significant interaction between the factors. Factorial analysis of variance (ANOVA) is widely used in many disciplines, although less in the medical sciences than in others because (a) continuous response variables are relatively rare and (b) randomized trials commonly test only one treatment factor at a time (despite the fact that it would often make more sense to test two treatment factors together!).
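As a rough illustration of what "independent and combined effects" means in practice, the sketch below partitions the sums of squares for a balanced, fully replicated two-factor design with both factors fixed. The function name two_way_anova, the factor labels and the data are all hypothetical, and the formulae apply only when every cell has the same number of replicates.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced, fully replicated two-way ANOVA, both factors fixed.

    `cells` maps (level_a, level_b) -> list of replicate responses.
    Returns {source: (sum_of_squares, degrees_of_freedom, F_ratio)}.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    # These formulae require a fully crossed, balanced, replicated design.
    assert all((a, b) in cells and len(cells[(a, b)]) == n
               for a, b in product(levels_a, levels_b))
    assert n > 1, "replication is needed to estimate the error term"

    grand = mean(y for ys in cells.values() for y in ys)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    mean_a = {a: mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}

    na, nb = len(levels_a), len(levels_b)
    ss = {
        "A": n * nb * sum((mean_a[a] - grand) ** 2 for a in levels_a),
        "B": n * na * sum((mean_b[b] - grand) ** 2 for b in levels_b),
        # Interaction: what is left of each cell mean after removing
        # the additive contributions of the two main effects.
        "AxB": n * sum(
            (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
            for a, b in product(levels_a, levels_b)),
        "error": sum((y - cell_mean[k]) ** 2
                     for k, ys in cells.items() for y in ys),
    }
    df = {"A": na - 1, "B": nb - 1, "AxB": (na - 1) * (nb - 1),
          "error": na * nb * (n - 1)}
    ms_error = ss["error"] / df["error"]
    return {s: (ss[s], df[s],
                None if s == "error" else (ss[s] / df[s]) / ms_error)
            for s in ss}


# Hypothetical 2 x 2 experiment with three independent replicates per cell.
cells = {
    ("low", "control"): [10, 12, 11],
    ("low", "treated"): [14, 15, 13],
    ("high", "control"): [11, 10, 12],
    ("high", "treated"): [20, 22, 21],
}
table = two_way_anova(cells)
for source, (ss, df, f) in table.items():
    print(source, round(ss, 2), df, None if f is None else round(f, 2))
```

The F-ratios still need to be referred to an F distribution with the listed degrees of freedom. With unbalanced data the partition is no longer unique, which is where the choice between types of sums of squares arises.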
The commonest misuse of factorial ANOVA derives from non-independent replications of each treatment combination, either because multiple evaluation units are nested within treatment combinations or because treatment combinations are not independently replicated. We give one veterinary example on fish farming where individual fish were (wrongly) treated as the experimental unit rather than the pond. In another veterinary example there were only 16 experimental units (tubes), but by subsampling from each tube the authors obtained 45 degrees of freedom. In addition, the four treatment factors, each at two levels, were applied sequentially to groups of tubes rather than independently. In another veterinary example, bulk (pooled) milk samples were taken from each of four experimental groups of goats, in effect providing only one replicate for each treatment combination. In an ecological study large herbivores were excluded from one plot and not excluded from another, yet the authors generated numerous degrees of freedom for their factorial analysis of variance by subsampling within each of the two plots. In another ecological study one burnt area was compared with one unburnt area, with subsampling in the two areas providing the (pseudo)replicates for the factorial ANOVA.
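The inflation of degrees of freedom by subsampling can be made concrete with a small sketch. It assumes a hypothetical 2 x 2 pond experiment (two ponds per treatment combination, three fish per pond) with made-up weights; the helper names collapse_to_units and error_df are illustrative only. Analysing one mean per experimental unit is one simple remedy when subsampling is balanced; a nested or mixed model is another.

```python
from statistics import mean


def collapse_to_units(records):
    """Reduce subsamples to one mean per experimental unit.

    `records` is an iterable of (cell, unit_id, value) tuples, where
    `cell` identifies the treatment combination.
    Returns {cell: [unit mean, unit mean, ...]}.
    """
    by_unit = {}
    for cell, unit, value in records:
        by_unit.setdefault((cell, unit), []).append(value)
    cells = {}
    for (cell, _unit), values in by_unit.items():
        cells.setdefault(cell, []).append(mean(values))
    return cells


def error_df(cells):
    """Error degrees of freedom: sum over cells of (replicates - 1)."""
    return sum(len(values) - 1 for values in cells.values())


# Hypothetical data: 2 feeds x 2 densities, 2 ponds each, 3 fish per pond.
records = []
pond = 0
for feed in ("feed1", "feed2"):
    for density in ("low", "high"):
        for _ in range(2):
            pond += 1
            for fish in range(3):
                # Made-up weights, for counting purposes only.
                records.append(((feed, density), pond, 100 + pond + fish))

# Wrong: every fish treated as an independent replicate.
fish_as_replicates = {}
for cell, _pond, weight in records:
    fish_as_replicates.setdefault(cell, []).append(weight)
naive_df = error_df(fish_as_replicates)

# Better: the pond is the experimental unit.
honest_df = error_df(collapse_to_units(records))

# One plot per treatment leaves nothing with which to estimate error.
single_plot_df = error_df({"excluded": [3.2], "open": [4.1]})

print(naive_df, honest_df, single_plot_df)
```

Here the fish-level analysis claims 20 error degrees of freedom where the design supports only 4, and the one-plot-per-treatment layout supports none at all, however many subsamples are taken.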
A second misuse involves either misinterpreting or failing to test for interaction effects. In one medical example on the effect of an exercise programme on knee pain an (inconvenient) interaction was dismissed as being not 'real', despite being highly significant and remarkably credible. In other cases interactions are not even tested for, an approach apparently justified by use of the term 'main effects model'. Interaction plots seem to be rarely used, despite being the best way to investigate significant interactions. There seems little awareness that the test for interaction has much lower power than the tests for main effects. This is well demonstrated by a veterinary example on effects of space allocation and ractopamine treatment on pig growth rates and an ecological example on the effects of temperature and carbon dioxide concentrations on tree growth. In both cases the main interest was in whether there was interaction between treatment factors, but in neither case was there sufficient power available to demonstrate it.
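The lower power of the interaction test can be sketched for the simplest case, a balanced 2 x 2 design. The sketch assumes that the interaction is expressed as a difference of differences between cell means, on the same scale as a main effect expressed as a difference between marginal means. It also treats the residual standard deviation as known and uses a normal approximation, so the values of sigma, n and delta are hypothetical and the powers are indicative only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, se, alpha=0.05):
    """Two-sided power when the estimate is roughly N(effect, se**2)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


sigma = 1.0   # hypothetical residual standard deviation
n = 10        # hypothetical replicates per cell
delta = 1.0   # hypothetical true effect, same units for both contrasts

# Main effect: difference between two marginal means, each the average
# of 2n observations, so its variance is 2 * sigma^2 / (2n) = sigma^2 / n.
se_main = sigma * sqrt(1 / n)

# Interaction: (y22 - y21) - (y12 - y11), a sum of four cell means each
# with variance sigma^2 / n, so its variance is 4 * sigma^2 / n.
se_interaction = sigma * sqrt(4 / n)

power_main = approx_power(delta, se_main)
power_interaction = approx_power(delta, se_interaction)

print(round(se_interaction / se_main, 2))
print(round(power_main, 2), round(power_interaction, 2))
```

Under these assumptions the interaction contrast has twice the standard error of the main effect, so about four times as many replicates per cell would be needed to estimate it with the same precision.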
There is often no evidence that any of the assumptions of analysis of variance (in particular normality and homoscedasticity) were tested. This is especially important if ANOVA is employed on ordinal variables (such as pain scores) or on other variables that are unlikely to be normally distributed, such as survival times. Lastly, it would be good if more authors gave estimates of the effect size for a particular treatment, as the difference (or the standardized difference) between means together with its confidence interval.
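A minimal sketch of reporting an effect size follows, using two hypothetical groups and a pooled standard deviation. The function name mean_difference and the data are made up. The interval uses a normal quantile because the Python standard library has no t distribution; with samples this small a t quantile on the error degrees of freedom should be used instead, and in a factorial analysis the residual mean square would normally replace the two-group pooled variance.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference(treated, control, level=0.95):
    """Difference between means, approximate CI, standardized difference."""
    n1, n2 = len(treated), len(control)
    diff = mean(treated) - mean(control)
    # Pooled variance assumes the two groups share a common variance.
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Normal approximation: too narrow for small samples (see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "difference": diff,
        "ci": (diff - z * se, diff + z * se),
        "standardized": diff / sqrt(pooled_var),
    }


# Hypothetical responses for one treatment contrast.
treated = [14, 15, 13, 16, 12]
control = [10, 12, 11, 9, 13]
effect = mean_difference(treated, control)
print(effect["difference"],
      tuple(round(x, 2) for x in effect["ci"]),
      round(effect["standardized"], 2))
```

Reporting the difference with its interval shows both the size of the effect and how precisely it was estimated, which a P-value alone does not.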
What the statisticians say
Factorial analysis of variance using R is covered by Logan (2010) and Crawley (2005, 2007). Doncaster & Davey (2007) consider factorial ANOVA in Chapter 3. Hinkelmann & Kempthorne (2008) and Bailey (2008) also consider the factorial design. Howell (2002) covers factorial ANOVA for the behavioural sciences. Further texts for ecological researchers include Quinn & Keough (2002) in the second part of Chapter 9, Underwood (1997) in Chapter 10 and Sokal & Rohlf (1995) in Chapter 11.
Montgomery (2003) and Green et al. (2002) look at the design and analysis of factorial randomized controlled clinical trials. Ottenbacher (1991) focuses on interpretation of interaction in the factorial analysis of variance design. Caudle & Williams (1993) note that analysis of variance should not be used to detect synergy in combination drug studies because assumptions for the analysis are generally not met.
Hector et al. (2010) provide an excellent update for ecologists doing factorial analysis of variance with unbalanced data. Langsrud (2003) argues that one should use Type II instead of Type III sums of squares for unbalanced data. Shaw & Mitchell-Olds (1993) put the case for Type III sums of squares, whilst Stewart-Oaten (1995) puts the alternative view of model selection. Lee & Nelder (2003) discuss the issue of false parsimony in linear models. Hines (1996) discusses the issue of pooling in ANOVA tables. Anderson & ter Braak (2003) describe the use of permutation tests for multi-factorial analysis of variance. Hewitt et al. (2001) discuss the use of nested two three factor analysis of variance to analyze data from BACI designs.
McKone & Lively (1993) argue that, for one-factor experiments (with replications on each factor) replicated at multiple sites, it is better to use nested analysis of variance within each site rather than mixed model factorial ANOVA. This view is opposed by Greenwood (1994), who argues that the factorial analysis is preferable because it allows one to examine whether effects differ between sites. This is in turn refuted by Lively & McKone (1995). Shen (1995) argues that it is not a factorial design, as treatments are not identical and randomization is within site. Bennington & Thayne (1994) highlight the difficulties ecologists have in making the distinction between fixed and random effects. Wilk & Kempthorne (1955) note that the F-ratio for testing main effect A over the AB interaction in a mixed model is only approximately F-distributed.
Wikipedia provides sections on the analysis of variance and on parsimony (Occam's razor). Julian Faraway has a short section on factorial analysis of variance using R. The NIST/SEMATECH e-Handbook of Statistics gives details of the two-way ANOVA model and its assumptions. The Handbook of Biological Statistics also covers two-way analysis of variance with replication.