"It has long been an axiom of mine that the little things are infinitely the most important"
Pseudoreplication: Use & misuse
(simple, temporal and sacrificial pseudoreplication, independence of replicates, cluster randomized trials, pooling error terms)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
Our heading 'use and misuse' is a bit of a misnomer here because pseudoreplication is always a misuse! Pseudoreplication has been defined as the use of inferential statistics to test for treatment effects with data from experiments or observational studies where either treatments are not replicated (though samples may be) or replicates are not statistically independent. Whilst exceptions can be found, this still provides the best definition. Epidemiologists recognise the same problem in terms of using the correct unit of analysis. Simple pseudoreplication is where there is only a single replicate per treatment, but several subsamples are taken from that replicate (for example, from each area). Temporal pseudoreplication is also where there is only a single replicate per treatment, but multiple samples are taken from it over time. Sacrificial pseudoreplication is where treatments have been genuinely replicated, but either data for replicates are pooled before analysis, or the analysis incorrectly treats subsamples or repeated samples as replicates.
The different types of pseudoreplication are still widespread in the literature despite some awareness of the problem. That awareness is greatest for (manipulative) experiments, and least when considering observational studies (sometimes termed mensurative experiments). Part of the problem is a tendency for journals to accept that you only have to discuss the issue in a paper (using terms such as a 'degree of pseudoreplication' or 'technically pseudoreplicated'), yet not actually do anything about it!
In observational studies simple pseudoreplication is common. It often results from taking multiple samples from each of two areas, and then using those multiple samples to test for some 'treatment' difference between the two areas. We give one example on the weight gain of animals where a different form of feed supplementation is given on each farm, and another where the density of dead trees is compared between burnt and unburnt areas. Temporal pseudoreplication is exemplified by a study on the effect of roof type on egg production by chickens. Sacrificial pseudoreplication is probably even more common and is especially unfortunate because the data often could have been analysed correctly - albeit with far fewer replicates than used in the authors' analyses. In one example (a comparison of public and private antenatal care providers) the study is analysed correctly for one variable (with providers as the unit of analysis) yet incorrectly for another variable (with patients as the unit of analysis).
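As a rough illustration of the two-farm situation described above, the following Python sketch counts how many independent units actually received each treatment, however many subsamples were taken. The record layout, the function names and the weights are all hypothetical, invented for this sketch only.

```python
from collections import defaultdict


def count_replicates(records):
    """Count distinct units (e.g. farms) per treatment.

    records: iterable of (treatment, unit_id, value) tuples, one per
    measurement. This layout is an assumption made for the sketch.
    """
    units = defaultdict(set)
    for treatment, unit_id, _value in records:
        units[treatment].add(unit_id)
    return {treatment: len(ids) for treatment, ids in units.items()}


def is_unreplicated(records):
    """True if any treatment was applied to fewer than two units."""
    return any(n < 2 for n in count_replicates(records).values())


# Invented data: ten animal weights from each of two farms,
# with one feed supplement per farm.
records = (
    [("supplement_A", "farm_1", 50.0 + i) for i in range(10)]
    + [("supplement_B", "farm_2", 55.0 + i) for i in range(10)]
)

print(count_replicates(records))  # {'supplement_A': 1, 'supplement_B': 1}
print(is_unreplicated(records))   # True
```

Twenty measurements were taken, but each treatment has only one replicate. A test that treats the twenty animals as replicates therefore cannot separate the effect of the supplement from any other difference between the two farms.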
In experimental studies cluster randomized trials provide a rich source of pseudoreplication errors, with sacrificial pseudoreplication again the most common. For example, the effectiveness of a community-based education programme can only be tested by randomly allocating the programme to communities. Yet we find that individuals - not communities - are used as the unit of analysis. The same issue arises in veterinary studies where treatment is randomly allocated to either a litter of young animals or to the mother, yet the analysis assumes treatment was allocated to the individual offspring. Allocating treatment at farm level but comparing bird clutch survival rates on a nest-day basis is a fairly extreme case of pseudoreplication. Other practices, such as ignoring blocking or pairing in the design, may give quite erroneous results in the analysis. Pseudoreplication also rears its head in meta-analyses. Here the number of replicates is the number of studies, not the number of individuals, and just pooling all results quite simply gives the wrong answer!
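The unit-of-analysis point can be sketched in Python using only the standard library. The example below reduces each community to its mean before computing an ordinary two-sample t statistic, and contrasts this with the incorrect individual-level calculation. The community labels, the scores and the helper names are invented for illustration. The equal-variance t statistic is just one simple choice, not a method prescribed by any of the studies discussed.

```python
import math
from statistics import mean, variance


def cluster_means(clusters):
    """Reduce each cluster (dict of id -> list of outcomes) to its mean."""
    return [mean(values) for values in clusters.values()]


def pooled_t(a, b):
    """Student's two-sample t statistic and its degrees of freedom,
    assuming equal variances in the two groups."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Invented scores: three communities per arm, four individuals each.
programme = {"c1": [12, 14, 13, 15], "c2": [10, 11, 12, 11], "c3": [15, 16, 14, 15]}
control = {"c4": [10, 9, 11, 10], "c5": [13, 12, 14, 13], "c6": [9, 10, 8, 9]}

# Analysis at the level of allocation: communities are the replicates.
t_cluster, df_cluster = pooled_t(cluster_means(programme), cluster_means(control))

# Pseudoreplicated analysis: individuals wrongly treated as replicates.
individuals_p = [x for values in programme.values() for x in values]
individuals_c = [x for values in control.values() for x in values]
t_naive, df_naive = pooled_t(individuals_p, individuals_c)

print(f"cluster level:    t = {t_cluster:.2f} on {df_cluster} df")
print(f"individual level: t = {t_naive:.2f} on {df_naive} df")
```

With these invented numbers the individual-level analysis claims 22 degrees of freedom and a much larger t statistic, whereas the community-level analysis has only 4 degrees of freedom. The apparent extra precision comes entirely from counting individuals within the same community as if they were independent.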
What the statisticians say
Crawley (2005, 2007) gives a brief account of pseudoreplication. Scheiner & Gurevitch (2001) discuss many aspects of experimental design including pseudoreplication and unreplicated large scale experiments. Bart et al. (1998) provide a wide-ranging discussion of the topic in Chapter 6, although re-orientating the debate towards the issue of external validity seems to rather miss the point of Hurlbert's argument. Underwood (1997) discusses the problems of pseudoreplication and independence - although he is himself guilty of pseudoreplication when he recommends pooling error terms in a nested ANOVA (see also Hurlbert (1997)). Mead (1988) provides one of the best accounts of the problem of pseudoreplication in Chapter 6, albeit without using the term.
Donner & Klar (2004) and Bland (2004) look at the issue of selecting the right unit of analysis in cluster randomized trials. Bennett et al. (2002) look at methods for the analysis of incidence rates in cluster randomized trials, focusing on the use of the t-test with a log transformation if required. Altman & Bland (1997) look at pseudoreplication in medical research in relation to correctly identifying the unit of analysis.
Hurlbert (2009) returns to the topic of 'the ancient black art of pseudoreplication' in response to a vigorous attack on the concept by Schank & Koehnle (2009). Kozlov & Hurlbert (2006) respond to comments on an earlier paper by Kozlov (2003) on the high prevalence of pseudoreplication in Russian scientific publications. Millar & Anderson (2004) consider ways to avoid pseudoreplication in the analysis of fisheries data, whilst Bennett & Adams (2004) consider the same issues in forestry. Hurlbert & Meikle (2003) review the analysis of an experiment on control methodologies for the migratory locust and conclude that none of the paper's conclusions has any statistical support. Hurlbert (2004) responds to the views of Oksanen (2001) who criticizes Hurlbert's original paper on pseudoreplication. Rafaelli & Moller (2000) look at weaknesses of design of field experiments. Morrison & Morris (2000) and Ramirez et al. (2000) look at pseudoreplication in laboratory experiments.
Garcia-Berthou & Hurlbert (1999), Lombardi & Hurlbert (1996) and Hurlbert & White (1993) look at pseudoreplication and other statistical errors in marine biology. McArdle (1996) deals with pseudoreplication as two separate issues - the purpose of replication, and independence of the sampling units. Heffner et al. (1996) review the incidence of pseudoreplication in the literature. Stewart-Oaten et al. (1986) look at the use of 'before and after control impact' (BACI) studies. Hurlbert (1984) produced the classic and highly influential paper on pseudoreplication.