"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



One-way random effects ANOVA: Use & misuse

(analysis of variance, intraclass correlation coefficient, repeatability, measurement error)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

One-way random effects ANOVA is used less often than one-way fixed effects ANOVA, although it frequently forms a component of mixed model ANOVAs. Its main use as a one-way ANOVA is to estimate the intraclass correlation coefficient, which serves as a measure of repeatability. Evolutionary ecologists and geneticists use repeatability to describe the proportion of variance in a character (usually behavioural) that occurs between rather than within individuals. Scientists in many disciplines use repeatability as a (relative) measure of measurement error. Lastly, it is used to estimate the intracluster (= intraclass) correlation coefficient for surveys, both to indicate by how much sample size needs to be increased over that required for simple random sampling, and to correct Pearson's chi-square tests carried out on frequency data derived from cluster sampling.
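A minimal sketch of these calculations in Python, assuming the readings are supplied as one list per group (individual or cluster). It uses the usual variance-components estimator, with the effective group size n0 to cope with unequal numbers of readings per group. The function names and data are hypothetical, and the chi-square adjustment shown (dividing the statistic by the design effect) is only the simplest approximation, assuming clusters of equal size.

```python
from statistics import mean


def one_way_anova(groups):
    """Mean squares for a one-way ANOVA; groups is a list of lists of readings."""
    a = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [mean(g) for g in groups]
    grand = mean([x for g in groups for x in g])
    ss_among = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_among, df_within = a - 1, n_total - a
    # n0: effective readings per group; equals n when every group has n readings
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / df_among
    return ss_among / df_among, ss_within / df_within, n0


def repeatability(groups):
    """Intraclass correlation r = s2_A / (s2_within + s2_A)."""
    ms_among, ms_within, n0 = one_way_anova(groups)
    s2_added = (ms_among - ms_within) / n0  # added variance component
    return s2_added / (ms_within + s2_added)


def design_effect(rho, cluster_size):
    """Factor by which a simple-random-sampling sample size must be inflated."""
    return 1 + (cluster_size - 1) * rho


# Three individuals, two readings each (made-up numbers for illustration)
readings = [[10, 12], [20, 22], [30, 32]]
print(round(repeatability(readings), 3))  # 0.98
deff = design_effect(0.05, 10)
print(round(deff, 2))                     # 1.45
# Crude adjustment of a hypothetical Pearson chi-square of 12.0 from cluster-sampled counts
print(round(12.0 / deff, 2))
```

Note that a negative added variance component (MS_among smaller than MS_within) gives a negative repeatability; this sketch does not truncate it at zero.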

Given its limited use, and the fact that the initial calculations are identical to those of fixed effects ANOVA, it is perhaps not surprising that there is some confusion over when to use random effects ANOVA. We give one example where the random effects model was used simply on the grounds that there were many groups (in that case countries), rather than on the grounds that the chosen countries were selected (more or less) at random. In another example the added variance component and repeatability were estimated as part of a fixed effects ANOVA; this inevitably gives a very high repeatability. Many authors do not seem to be aware of the need to select groups (usually individuals in the case of measurement error) at random. Whilst a genuine random sample might not be possible, it would at least help if authors indicated that every effort had been made to avoid bias in the selection of individuals.

If the ANOVA is only being carried out to estimate repeatability, the normal errors assumption of ANOVA does not have to be met. Indeed, ANOVA can be used on binary data to obtain the repeatability estimate, although normal errors are assumed if the confidence interval for the coefficient is estimated in the usual way. However, the assumption of homogeneity of variances does still have to be met. We find a number of examples where this was not checked, and some cases where variances were clearly not homogeneous. Historically there have been many problems in actually estimating repeatability. Some researchers have used the between-groups mean square (MS_between) to estimate it, rather than the added variance component. Because MS_between estimates the within-group variance plus n0 times the added variance component (n0 being the number of readings per group), this greatly inflates the estimate if one wants a measure of repeatability for a single reading per individual. To minimize such errors in the literature, it has been proposed that authors include the F-ratio and degrees of freedom in their paper so that values can be checked.
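Reporting F and its degrees of freedom lets a reader recompute repeatability, because r = s2_A / (s2_W + s2_A), with s2_A = (MS_among - MS_within) / n0, rearranges to (F - 1) / (F - 1 + n0). The sketch below (hypothetical function names; a balanced design is assumed when recovering n0 from the degrees of freedom) shows that check, the inflated value produced by one plausible form of the mistake (MS_among substituted for the added variance component), and Hartley's F_max (largest over smallest group variance) as one crude screen for heterogeneous variances.

```python
from statistics import variance


def repeatability_from_f(f_ratio, n0):
    """Recompute r from a reported F-ratio: (F - 1) / (F - 1 + n0)."""
    return (f_ratio - 1) / (f_ratio - 1 + n0)


def n0_balanced(df_among, df_within):
    """Readings per group, assuming a balanced design: N / a."""
    return (df_among + df_within + 1) / (df_among + 1)


def inflated_estimate(ms_among, ms_within):
    """One form of the error: MS_among used in place of the added variance component."""
    return ms_among / (ms_among + ms_within)


def hartley_fmax(groups):
    """Largest group variance divided by the smallest (needs >= 2 readings per group)."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical published values: F = 3.0 on 2 and 3 df, so n0 = 2
n0 = n0_balanced(2, 3)
print(repeatability_from_f(3.0, n0))  # 0.5
print(inflated_estimate(6.0, 2.0))    # 0.75 (MS_among = 6, MS_within = 2 gives F = 3)
print(hartley_fmax([[10, 12], [20, 23], [30, 31]]))
```

With only two readings per group each variance is very imprecise, so a large F_max from such data is a warning sign rather than a formal test.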

Interpreting the estimate of repeatability is also fraught. The essentially relative nature of repeatability is seldom appreciated. For example, when assessing measurement error, its value depends on both the variability between sampling units and the variability between repeated readings on the same sampling unit: if a very variable group of sampling units is selected, the repeatability of any measurement will be higher than if a homogeneous group is chosen. Nevertheless, it does have value for the particular situation chosen, and the aim should be to maximize repeatability; merely demonstrating 'significant repeatability' is not useful. We give examples where reproducibility is confused with validity. Any measurement may be reproducibly wrong! Lastly, if one is looking at behavioural repeatability, one also needs to check measurement error!
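To make the relative nature of repeatability concrete, here is a minimal sketch (hypothetical function name and variance values) of the population repeatability for one instrument whose error variance is fixed, applied first to a heterogeneous and then to a homogeneous set of sampling units:

```python
def expected_repeatability(var_between, var_error):
    """Population repeatability: share of total variance lying between sampling units."""
    return var_between / (var_between + var_error)


# Same instrument (error variance 1.0) used on two hypothetical groups of units
heterogeneous = expected_repeatability(var_between=9.0, var_error=1.0)
homogeneous = expected_repeatability(var_between=1.0, var_error=1.0)
print(heterogeneous, homogeneous)  # 0.9 0.5
```

The within-unit error variance is 1.0 in both cases, so the drop from 0.9 to 0.5 reflects the units sampled, not any change in the measurement itself.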


What the statisticians say

Underwood (1997) introduces random effects ANOVA for the ecologist in Chapter 8. Other texts include Doncaster & Davey (2007), Crawley (2005), and Sokal & Rohlf (1995). Krebs (1999) provides a useful summary for ecologists of how to calculate the intraclass correlation coefficient along with its confidence interval in Chapter 15.

Bland & Altman (1996) provide an introduction to the intraclass correlation coefficient as a measure of repeatability in the medical sciences. A more in-depth approach is taken by Shrout & Fleiss (1979), Müller & Büttner (1994) and McGraw & Wong (1996). Lessells & Boag (1987) is the key paper on the intraclass correlation coefficient for ecologists. Adolph & Hardin (2007) note that the Pearson correlation coefficient between two measures can be corrected for bias in the presence of measurement error using the intraclass correlation coefficient of each measure. Kerry & Bland (1998) and Killip et al. (2004) describe the intraclass correlation coefficient between clusters, also known as the intracluster correlation coefficient.
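We assume the correction referred to for Adolph & Hardin (2007) takes the form of the classical Spearman correction for attenuation, in which the observed Pearson correlation is divided by the square root of the product of the two measures' repeatabilities. A minimal sketch with hypothetical values:

```python
from math import sqrt


def disattenuated_r(r_observed, icc_x, icc_y):
    """Spearman's correction for attenuation using each measure's repeatability."""
    if not (0 < icc_x <= 1 and 0 < icc_y <= 1):
        raise ValueError("repeatabilities must lie in (0, 1]")
    return r_observed / sqrt(icc_x * icc_y)


# Observed r = 0.40; repeatabilities 0.80 and 0.50 (made-up values)
print(round(disattenuated_r(0.40, 0.80, 0.50), 3))  # 0.632
```

Because the repeatabilities are themselves estimates, the corrected value can exceed 1 in small samples.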

The Handbook of Biological Statistics covers Model I versus Model II ANOVA and the partitioning of variance components. Paul Barrett provides a useful summary of methods for assessing the reliability of rating data. David Howell & Robert Yaffee (1998) give information on estimating intraclass correlation coefficients using SPSS. Wikipedia covers random effects ANOVA in its articles on analysis of variance and random effects models.