Statistics courses, especially for biologists, assume formulae = understanding and teach how to do
statistics, but largely ignore what those procedures assume,
and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
Medical epidemiologists lump together multiple group designs and multiple time period designs as 'ecologic studies'. These two comparative group designs do have one important feature in common - namely that the main explanatory variables (and sometimes the response variable) are measured at the group rather than individual level.
In a multiple group design, data are usually collected at one point in time from several groups or 'populations'. The groups are usually defined by the area in which they live and are selected to represent different levels of the explanatory variable. Some ecologists use the term 'mensurative experiment' for an observational multiple group design.
In a multiple time period design, or before-and-after design (or time-trend design), data are collected from just one group or population. The different time periods are selected to represent different levels of the explanatory variable. Ecologists sometimes use the term 'quasi-experiment' for an observational multiple time period study where there is manipulation but no random allocation or replication.
Both of these observational designs are widely used. They are held in low regard by medical epidemiologists because of the 'ecological fallacy', but are still common in environmental health studies. They have a better reputation in veterinary studies, where risk factors for many diseases are studied at the herd or farm level rather than at the individual level. Ecologists commonly use these designs with aggregate outcome measures, such as density or species diversity, where the ecological fallacy no longer applies.
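To see why the ecological fallacy matters for individual-level outcomes, consider a small simulation (the areas, numbers and effect sizes are entirely made up): within every area the outcome falls as individual exposure rises, yet a group-level confounder makes the area means rise together, so a regression on the area means alone gives a slope of the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five hypothetical areas. Within each area the outcome FALLS with
# individual exposure (slope -1), but a group-level confounder pushes
# the area means up together.
n_per = 200
groups = []
for g in range(5):
    x = rng.normal(loc=g, scale=0.5, size=n_per)        # individual exposure
    y = 3.0 * g - (x - g) + rng.normal(0, 0.5, n_per)   # individual outcome
    groups.append((x, y))

# 'Ecologic' slope: regression on the five (mean exposure, mean outcome) pairs
gx = np.array([x.mean() for x, _ in groups])
gy = np.array([y.mean() for _, y in groups])
ecologic_slope = np.polyfit(gx, gy, 1)[0]

# Individual-level slope, averaged across within-group regressions
within_slope = np.mean([np.polyfit(x, y, 1)[0] for x, y in groups])

print(f"ecologic slope:     {ecologic_slope:+.2f}")  # positive
print(f"within-group slope: {within_slope:+.2f}")    # negative
```

The group-level analysis recovers the confounder's effect, not the individual-level relationship - which is precisely why the fallacy disappears when the outcome of interest is itself an aggregate property of the group.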
With multiple group designs there is often a problem with the recording of outcome measures. Although records may be available for the incidence of notifiable diseases at the population level, it is often far from clear that all cases are being recorded. Where samples have to be taken, the use of non-probability sampling is commonplace and invariably biased. As for the explanatory variables, they are often quite unnecessarily reduced to binary variables, resulting in a loss of information and power.
Studies would nearly always be improved by the use of multiple categories, e.g. four different levels of pollution. Another problem is pseudoreplication. In terms of the explanatory variable, what matters is the number of replicate groups (clusters), not the number of samples taken in each area. Pseudoreplication is especially common in wildlife studies, and we give examples of single-replicate comparisons of exploited/unexploited areas (for crabs) and protected/unprotected areas (for small mammals). Even where replicated groups are used, they are rarely (if ever) selected randomly, leaving open the risk of researcher bias.
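The cost of pseudoreplication can be put in back-of-envelope terms with the standard design effect from cluster sampling, 1 + (m - 1) * icc, where m is the number of samples per area and icc the intraclass correlation of samples within an area. The figures below are illustrative only:

```python
# Design effect: the factor by which treating m correlated samples from the
# same area as independent understates the true sampling variance.
def design_effect(m, icc):
    return 1 + (m - 1) * icc

# Effective sample size of m samples taken from a single area
def effective_n(m, icc):
    return m / design_effect(m, icc)

# 50 samples from one exploited area, with modest within-area correlation:
print(design_effect(50, 0.1))   # 5.9
print(effective_n(50, 0.1))     # about 8.5 - nowhere near 50
```

In other words, heavy sampling within a single area rapidly hits a ceiling: it is replicate areas, not extra samples per area, that buy statistical power for the group-level comparison.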
With multiple time period designs the commonest problem is a complete lack of replication. Simply comparing the outcome variable in one time period before an intervention with its value in one time period after is not a viable design when there is pronounced seasonal change, not to mention a host of other confounding factors. We give some examples where the basic design has been improved by the use of single or multiple control areas (although these are seldom selected randomly) to give the so-called BACI design. We also give examples of multiple sampling periods before and after intervention, and of multiple levels of intervention, where the trapping intensity of an insect pest was varied over time.
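The logic of the BACI estimator can be sketched with hypothetical densities (all values below are invented for illustration): the before-after difference at the impact site confounds the intervention with the seasonal change, whereas subtracting the control site's change removes the shared temporal component.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical mean densities over 6 sampling occasions per period.
# A seasonal drop of about 2 units affects BOTH sites; the intervention
# removes a further 3 units at the impact site only.
control_before = rng.normal(10.0, 0.5, 6)
control_after  = rng.normal(8.0, 0.5, 6)
impact_before  = rng.normal(10.0, 0.5, 6)
impact_after   = rng.normal(5.0, 0.5, 6)

# Naive before-after comparison at the impact site: confounds the
# intervention with the seasonal change.
naive = impact_after.mean() - impact_before.mean()

# BACI estimate: difference of differences removes the shared temporal change.
baci = (impact_after.mean() - impact_before.mean()) - \
       (control_after.mean() - control_before.mean())

print(f"naive before-after: {naive:+.1f}")   # about -5 (season + intervention)
print(f"BACI estimate:      {baci:+.1f}")    # about -3 (intervention only)
```

Replicated sampling occasions within each period (as proposed by Stewart-Oaten et al.) are what allow a standard error to be attached to the BACI difference of differences.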
Both designs suffer from many confounding factors, yet often such factors are not even considered - for example, in before-and-after studies where other quite separate interventions are taking place. Such confounding factors should either be dealt with in the design (by matching) or allowed for in the analysis. Lastly, there is a tendency for medical epidemiologists to assume that ecologic studies are always the most biased when compared to 'individual' designs such as case-control and cohort designs. This is not the case - it depends critically on whether all individuals within a group do indeed experience the same level of the explanatory variable.
What the statisticians say
Rosenbaum (2002, 2010) (summarized in Rosenbaum (2005)) is now the standard text on observational designs.
Shadish et al. (2002) and Cook & Campbell (1979) cover before-after (quasi-experimental) designs.
Baker (2000) provides a useful review of study designs that can be used to evaluate the impact of development projects on poverty.
Rothman & Greenland (1998) give an overview of ecologic designs for medical epidemiologists, which is followed up in more depth by Morgenstern (1998).
Rasmussen et al. (1991) look at the use of time series analysis for unreplicated large-scale experiments.
Thrusfield (2005) and Dohoo et al. (2003) look at ecologic designs for veterinary epidemiologists.
Morrison et al. (2008) cover multiple group and time period designs as mensurative experiments and quasi-experiments.
Manly (2008) looks at impact assessment in Chapter 6. Krebs (1999) has a short section on multiple time period designs in Chapter 10 under environmental impact studies, but other observational designs are scarcely mentioned.
Elliott & Savitz (2008) review the advantages of semi-ecologic designs in small-area studies of environment and health. Wakefield & Haneuse (2008) consider how ecologic bias can be overcome using a two-phase study design. Vandenbroucke (2004) looks at what needs to be done to make medical observational studies as credible as randomized trials. Björk & Strömberg (2002), Webster (2002), Greenland (2001) and Brenner (1991) look at the sources of bias in ecologic designs. Morgenstern (1995), Morgenstern & Thomas (1993), Walter (1991) and Morgenstern (1982) review the uses of ecologic analysis in epidemiologic research.
Fergusson et al. (2008), Harris et al. (2004, 2006) and Speroff & O'Connor (2004) cover the use and interpretation of quasi-experimental (before-after intervention) studies of public health interventions. Meyer (2006) looks at natural and quasi-experiments in economics, whilst Heckman & Smith (1995) conclude that the case for the experimental approach to social program evaluation is overstated. Gillings et al. (1981) discuss the use of an interrupted time series design to evaluate the impact of a public health intervention.
Ferraro & Pattanayak (2006) make an impassioned plea for more meaningful evaluation of biodiversity conservation investments using quasi-experimental designs, following a similar call by Block et al. (2001). Hole (2005) provides a critique of the methodology used in multiple group studies of the effect on biodiversity of organic versus conventional farming systems. Blackburn (2004) argues that the scales of interest in macroecology are simply too large for the traditional ecological approach of experimental manipulation to be possible or ethical. Bennett & Adams (2004) and Monserud (1995) describe the deficiencies of study design in forestry. McGarigal & Cushman (2002) compare observational and experimental approaches to the study of habitat fragmentation effects.
Murtaugh (2000, 2002) criticizes the methods used to analyze BACI designs, which is responded to by Stewart-Oaten (2003). Smith (1993, 2002) describes the various BACI designs and their analysis. McDonald et al. (2000) examine the use of generalized linear models to analyze count data from BACI designs. Underwood (1992) and Eberhardt & Thomas (1991) further develop ideas on BACI designs, proposing the use of multiple controls randomly chosen from a set of possible sites and of paired control/impact stations respectively. Stewart-Oaten et al. (1986) propose the use of replicated sampling over time to improve the BACI design.
Wikipedia provides sections on observational studies, the ecological fallacy and matching. Gene Glass describes use of the interrupted time series quasi-experiment in psychological research. Thomas Songer describes the use of before-after studies in injury research, whilst Joshua Angrist looks at quasi-experiments versus randomized trials in educational research.