"It has long been an axiom of mine that the little things are infinitely the most important"
Randomized experiments: Use & misuse
(manipulation, random allocation, independent replication, multiple treatment levels)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
In an experimental study, the experimenter manipulates or controls the level of the explanatory variable(s). This is in contrast to observational studies, where the level of the explanatory variable is either self-selected by the unit concerned or has been imposed haphazardly. In an experiment, the two (or more) levels of the explanatory variable(s) are randomly allocated as treatments, usually to a number of independent experimental units. This is again in contrast to an observational study, where there is no random allocation of treatments to the (sampling) units.
We have noted above that the replicates in an experiment should be independent. The principle of independent replication is extremely important and applies to both observational designs and randomized experiments. Unfortunately pseudoreplication, where replicates are insufficient or not independent, is common in all types of experiments. We found a medical example where the two treatment groups were composed non-randomly, and then one of the groups was assigned randomly to treatment. This approach gives only one replicate per treatment! In another trial individuals in buildings were treated as replicates despite the fact that they were emphatically not independent. We also saw it in veterinary trials where cows were allocated to treatment, but disease incidence was assessed in calves. Repeated measures Latin squares have special problems if all the replication is done over time. Given the prevalence of pseudoreplication, we devote an additional page to this topic, and concentrate here on the issues of randomization and stratification.
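To make the point about the unit of replication concrete, here is a minimal sketch in Python of collapsing sub-sample measurements to one value per experimental unit before any comparison is made. The record layout, the function name `collapse_to_units` and the numbers are all hypothetical and chosen only for illustration. Averaging within units is one simple option; it is not the only defensible analysis.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_units(records):
    """Average sub-sample measurements within each experimental unit.

    records: iterable of (unit_id, treatment, value) tuples, where unit_id
    names the unit that was actually allocated to treatment (hypothetical
    layout). Returns {treatment: [one mean per unit, ...]}.
    """
    per_unit = defaultdict(list)
    treatment_of = {}
    for unit_id, treatment, value in records:
        per_unit[unit_id].append(value)
        treatment_of[unit_id] = treatment

    by_treatment = defaultdict(list)
    for unit_id, values in per_unit.items():
        by_treatment[treatment_of[unit_id]].append(mean(values))
    return dict(by_treatment)


# Made-up data: four buildings were allocated to treatment and three
# individuals were measured in each building.
records = [
    ("building1", "control", 10.0), ("building1", "control", 11.0),
    ("building1", "control", 12.0),
    ("building2", "control", 13.0), ("building2", "control", 14.0),
    ("building2", "control", 15.0),
    ("building3", "treated", 20.0), ("building3", "treated", 21.0),
    ("building3", "treated", 22.0),
    ("building4", "treated", 17.0), ("building4", "treated", 18.0),
    ("building4", "treated", 19.0),
]

unit_means = collapse_to_units(records)
# The number of replicates is the number of buildings, not of individuals.
n_replicates = {t: len(values) for t, values in unit_means.items()}
```

Here twelve measurements yield only two replicates per treatment, because only four units were allocated.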
The term experiment is sometimes used where there is manipulation but no random allocation. However, random allocation is extremely important in biological experiments because (provided treatment group sizes are sufficiently large) it tends to balance the groups with respect to potentially confounding variables. It also helps to eliminate bias (whether conscious or unconscious) in deciding which treatment is allocated to which unit, and serves to validate the subsequent statistical comparison of the treatment groups. This is why we can have more confidence that a strong relationship demonstrated in an experiment indicates a causal link - in other words, the inference for causality is fairly strong. We therefore reserve the term experiment for studies with manipulation, randomization and replication. The term quasi-experiment can be used where there is manipulation but no random allocation.
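The proviso about group size can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a general result: it draws one standard normal covariate (think of body weight on a standardized scale), allocates units at random to two equal groups, and records how far apart the group means of that covariate end up. The function name `mean_abs_imbalance`, the seed and the group sizes are hypothetical.

```python
import random


def mean_abs_imbalance(n_units, n_trials, rng):
    """Average absolute gap between two randomly allocated groups in the
    mean of a single covariate, over n_trials simulated experiments."""
    half = n_units // 2
    gaps = []
    for _ in range(n_trials):
        # Units listed in order of the covariate, as in a sorted herd list.
        covariate = sorted(rng.gauss(0.0, 1.0) for _ in range(n_units))
        # Random allocation: shuffle, then split into two equal groups.
        rng.shuffle(covariate)
        group_a, group_b = covariate[:half], covariate[half:]
        gaps.append(abs(sum(group_a) / half - sum(group_b) / len(group_b)))
    return sum(gaps) / n_trials


rng = random.Random(1)  # fixed seed so the run is repeatable
small = mean_abs_imbalance(n_units=10, n_trials=200, rng=rng)
large = mean_abs_imbalance(n_units=1000, n_trials=200, rng=rng)
print(f"mean imbalance with 10 units:   {small:.3f}")
print(f"mean imbalance with 1000 units: {large:.3f}")
```

With only ten units, randomization still removes allocation bias, but sizeable chance imbalance in the covariate remains quite likely; with a thousand units the groups are typically close.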
The process by which randomization is achieved should always be described, as should the methods used for allocation concealment. This is now generally done in human clinical trials, but very seldom in other studies. Although a coin toss is statistically valid, it is very risky because there is no allocation list drawn up in advance. In addition, little consideration is given to allocation concealment, and this neglect can lead to bias in treatment allocation. Many experiments would be improved by having both a positive and a negative control group, and by having multiple treatment levels, which allow a dose-response effect to be demonstrated.
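One way to obtain an advance list is to generate the whole allocation from a recorded seed before any unit is enrolled. The following is a minimal sketch under stated assumptions: the function `allocation_list`, the unit labels, the four arm names and the seed are hypothetical, and the scheme shown is randomization with fixed, equal group sizes, not a coin toss per unit.

```python
import random


def allocation_list(unit_ids, treatments, seed):
    """Return {unit_id: treatment} with equal group sizes.

    The list is fully determined by the seed, so it can be drawn up in
    advance and regenerated later for checking.
    """
    if len(unit_ids) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    labels = list(treatments) * (len(unit_ids) // len(treatments))
    random.Random(seed).shuffle(labels)
    return dict(zip(unit_ids, labels))


# Hypothetical design: negative and positive controls plus two dose levels.
arms = ["negative control", "positive control", "low dose", "high dose"]
units = [f"unit{i:02d}" for i in range(1, 13)]
plan = allocation_list(units, arms, seed=2024)

for unit_id, arm in plan.items():
    print(unit_id, arm)
```

Such a list only supports concealment if it is held by someone who does not decide which units enter the trial. The code itself cannot guarantee that.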
If only a small number of experimental units are available (for example, plots of land in agricultural trials), then it can no longer be assumed that treatment groups will be balanced as regards potentially confounding variables. In this situation it is important that there is adequate interspersion of treatments through the process of stratification - in other words that treatments are assigned randomly within particular strata or blocks of relatively homogeneous units. For clinical trials the question of whether to stratify or not becomes especially important in cluster randomized trials. Pairing is not uncommon - although it is often debatable whether it is justified, and there is the problem of contamination effects between units. In veterinary trials stratified randomization (by weight) tends to be the rule, although this stratification may then (wrongly) be ignored in the analysis.
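Random assignment within blocks can be sketched as follows. This is an illustration only: the function `randomize_within_blocks`, the block and plot names, the treatment labels and the seed are hypothetical, and each block is assumed to contain exactly one unit per treatment (a randomized complete block layout).

```python
import random


def randomize_within_blocks(blocks, treatments, seed):
    """Assign every treatment once, in random order, within each block.

    blocks: {block_id: [unit_id, ...]} with one unit per treatment.
    Returns {unit_id: (block_id, treatment)}; the block is kept alongside
    the treatment so that it is available when the data are analysed.
    """
    rng = random.Random(seed)
    plan = {}
    for block_id, unit_ids in blocks.items():
        if len(unit_ids) != len(treatments):
            raise ValueError(f"block {block_id!r} needs {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # a fresh random order for each block
        for unit_id, treatment in zip(unit_ids, order):
            plan[unit_id] = (block_id, treatment)
    return plan


# Hypothetical field trial: three blocks of relatively homogeneous plots.
treatments = ["control", "low", "high"]
blocks = {
    "block A": ["A1", "A2", "A3"],
    "block B": ["B1", "B2", "B3"],
    "block C": ["C1", "C2", "C3"],
}
plan = randomize_within_blocks(blocks, treatments, seed=11)
```

Every block receives every treatment, which guarantees interspersion. Storing the block label with each allocation is a small design choice that makes it harder to forget the blocking factor at the analysis stage.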
Two further issues remain. Early termination of an experiment based on the preliminary results can result in a high level of bias. Sometimes this is unavoidable on ethical grounds as in circumcision trials. But in other instances, such as the trial to assess the impact of badger culling on bovine tuberculosis, the early termination was very suspect. Then there is the question of external validity. Experimental units are never representative of the source population, so it is foolhardy to set policy based on the results of just one experiment.
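One consequence of stopping on preliminary results can be shown by simulation. The sketch below assumes two groups with no true difference, unit variance known in advance, and an experimenter who tests at several interim sample sizes and stops at the first nominally significant result. It measures the inflated false-positive rate, which is related to, but not the same thing as, bias in the estimated effect. The function names, the schedule of looks and the seed are hypothetical.

```python
import random
from math import sqrt


def z_statistic(a, b):
    """Two-sample z statistic for equal group sizes and known unit variance."""
    n = len(a)
    return (sum(a) / n - sum(b) / n) / sqrt(2.0 / n)


def false_positive_rates(n_trials, looks, seed):
    """Simulate null experiments; return (rate when testing only at the
    final look, rate when stopping at the first significant look)."""
    rng = random.Random(seed)
    final_only = 0
    any_look = 0
    for _ in range(n_trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(looks[-1])]
        b = [rng.gauss(0.0, 1.0) for _ in range(looks[-1])]
        # Test the data accumulated so far at each look (nominal 5% level).
        significant = [abs(z_statistic(a[:n], b[:n])) > 1.96 for n in looks]
        final_only += significant[-1]
        any_look += any(significant)
    return final_only / n_trials, any_look / n_trials


fixed, peeking = false_positive_rates(2000, looks=[20, 40, 60, 80, 100], seed=7)
print(f"test once at the end:        {fixed:.3f}")
print(f"stop at first 'significant': {peeking:.3f}")
```

Because the final look is one of the looks, the stopping rule can never reject less often than the single final test, and in practice it rejects considerably more often.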
What the statisticians say
Bailey (2008) provides an excellent multidisciplinary guide to experimental design. Another recent text is Hinkelmann & Kempthorne (2008). Jones & Kenward (2003) provide a detailed account of the design and analysis of crossover trials. Classic general texts on design include Cochran & Cox (1992), Winer (1991) and Fisher (1935).
Armitage & Berry (2002) cover experimental design for medical researchers in Chapter 9, with a special chapter on clinical trials in Chapter 18. Donner & Klar (2000) provide a unified treatment of cluster randomization trials. Meinert (1986) is often regarded as the definitive text on design and analysis of clinical trials. Pocock (1983) also covers the design and implementation of clinical trials. Thrusfield (2005) looks at the design and conduct of randomized controlled veterinary trials in Chapter 16.
Gotelli & Ellison (2004), Scheiner & Gurevitch (2001), Krebs (1999) and Underwood (1997) provide extensive coverage of the design and analysis of ecological experiments. Mead (1988) and Petersen (1985) are two older texts that have stood the test of time and provide excellent accounts of 'conventional' experimental design for ecologists and agriculturalists.
Vandenbroucke (2008) gives a fascinating account of how those involved in observational research and randomized trials have completely different mindsets about research. Lathyris et al. (2007) look at how the evidence from crossover trials is handled in meta-analyses of randomized trials. Green (2002) provides a general review of the design of randomized trials. Benson & Hartz (2000) compare the treatment effects estimated by observational studies and randomized, controlled trials - contrary to 'conventional wisdom', there was no evidence that observational studies overestimated the effect size.
Eldridge et al. (2006), Murray (2004), Donner & Klar (2004), Chuang (2002), Klar & Donner (1997), Bland & Kerry (1997) and Martin et al. (1993) consider issues of sample size, matching and stratification in cluster randomized trials. Moerbeek et al. (2003) compare analytical methods for the analysis of multicenter and cluster randomized intervention studies. Kang et al. (2008), Altman & Bland (2005), and Altman & Bland (1999) cover randomization procedures including minimization. Sackett (2007) cautions that we should pay more attention to bias-generating consequences, whatever their cause, rather than focusing too much on blindness. Hróbjartsson et al. (2007), Boutron et al. (2007), Hewitt et al. (2005), Lewis & Warlow (2004), Schulz & Grimes (2002a), (2002b), Altman & Schulz (2001) and Day & Altman (2000) focus on allocation concealment and blinding in medical clinical trials.
Baigent (1997) emphasized the need for large-scale randomized evidence from clinical trials. Some older key papers on the design of randomized trials include Simon (1991) who describes recent progress in statistical methodology for clinical trials and Simon (1979) who provides an excellent review of restricted randomization designs in clinical trials. Meier (1975) shows how developments in the fields of polio, coronary surgery, diabetes and breast cancer fully justify the need for randomized clinical trials.
Kilkenny et al. (2009) review the quality of experimental design, statistical analysis and reporting of research using animals. Araujo (2008) looks at design aspects of aquaculture experiments. Festing (2003) stresses the need for better experimental design in animal experiments. Johnson & Besselsen (2002) look at practical aspects of experimental design in animal research. Elbers & Schukken (1995) look at design issues in veterinary clinical trials including randomization, choice of experimental unit, blocking and sample size calculations. Wood et al. (1995) discuss designs for evaluating the efficacy of anthelmintics in ruminants.
For the design of ecological experiments, Cottingham (2005) argues in favour of experiments with many treatment levels and few replicates per level, analyzed with regression. Inouye (2005) also promotes the use of regression designs. Deutschman (2001) and Allison (1999) discuss experimental design in relation to biodiversity experiments. Rafaelli & Molle (2000) look at weaknesses in the design of field experiments. Czitrom (1999) stresses the advantages of factorial experiments over single factor designs. Levine et al. (2008), Vanclay (2006), Inouye (2001) and Gibson et al. (1999) look at various experimental designs for examining the effects of interspecific competition. Hurlbert (1984) is a classic on the design of ecological field experiments.
Hewitt et al. (2007) point out that assigning causality is not the exclusive domain of manipulative experiments. Jewett (2005) argues for strong-inference-plus, where an exploratory phase and a pilot phase precede the hypothesis-testing phase. O'Donohue & Buchanan (2001) criticize the concept of strong inference on a number of grounds. The greater strength of inference from manipulative experiments is stressed by McArdle (1996). Quinn & Dunham (1983) caution against a rigid application of hypothetico-deductive methodology in ecology. Platt (1964) introduces the concept of strong inference in his classic Science paper.
Wikipedia provides sections on design of experiments, random assignment, randomized block design, factorial experiment, crossover design, fractional factorial design, and response surface methodology. The NIST/SEMATECH e-Handbook of Statistical Methods describes how to choose an experimental design. Sid Sytsma provides a quick guide to the basics of experimental design. Western Michigan University describes regression designs.