



Case-control designs: Use & misuse

(cases incident or prevalent, fixed cohort or dynamic population, control selection, matching)

Statistics courses, especially for biologists, assume that formulae equal understanding, and teach how to do statistics but largely ignore what those procedures assume and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The case-control design is an observational design (not an experimental design) in which study groups are defined by the response variable rather than by the explanatory variable. The response variable is usually binary: an individual either has a particular condition (a case) or does not have that condition (a control). Having defined the two groups, the study then looks backwards in time: past exposure of cases to possible risk factors is compared with that experienced by the controls. The aim is to find out which risk factors were most closely associated with an individual becoming a case.
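Because the groups are fixed by case status, the natural comparison is the odds of past exposure among cases versus among controls. A minimal sketch of that comparison follows, assuming a simple unmatched 2x2 table; the counts are hypothetical and Woolf's log-based interval is just one common choice of confidence interval.

```python
import math

def exposure_odds_ratio(a, b, c, d, z=1.96):
    """Exposure odds ratio from an unmatched case-control 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    Returns (OR, lower, upper) using Woolf's log-based interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined without a correction")
    odds_cases = a / b        # odds of past exposure among cases
    odds_controls = c / d     # odds of past exposure among controls
    or_hat = odds_cases / odds_controls
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Hypothetical counts, for illustration only
or_hat, lower, upper = exposure_odds_ratio(a=40, b=60, c=20, d=80)
print(f"OR = {or_hat:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The exposure odds ratio equals the disease odds ratio (the cross-product ad/bc is the same either way round), which is why it can be estimated even though the investigator, not nature, fixed the numbers of cases and controls.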

Case-control designs are heavily used in medical and veterinary epidemiological research, where the condition of interest is usually a disease. Wildlife biologists also use the same design (albeit seldom under that name) to study factors affecting site selection, whether for nesting, roosting or killing prey. The only difference is that current conditions are usually used as proxies for the conditions when the animal selected the site. Whilst wildlife biologists use the design without identifying it as such, it is not unusual for a medical researcher to describe a study as a case-control study when in fact it is not. Just because one has cases and controls does not make it a case-control study! For that, one must be looking back in time to ascertain why an individual became a case; if one is simply comparing various outcome variables for 'cases' and 'controls', then it is a cohort study.

In recent years case-control designs in medical research have gained rather a bad reputation. This is partly because some risk factors identified using case-control designs have not been confirmed by randomized trials. Some would argue, however, that the poor reputation of case-control designs stems more from instances of poor conduct and overinterpretation of results than from any inherent weakness in the approach. Bias in the selection of controls is common, and the term 'population based' is often (mis)applied to studies that use convenience sampling. Too few studies use more than one type of control to guard against bias. There is sometimes also selection bias in the cases: if it cannot be assumed that all cases have been detected, then a random sample is essential.
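How badly a poorly chosen control group can distort the result can be sketched with expected values. The model below is an assumption made for illustration: each exposed non-case is k times as likely as an unexposed non-case to end up in the control group (for example, hypothetical hospital controls admitted for conditions themselves related to the exposure). The prevalences and k are hypothetical, and the function name is illustrative only.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)

def expected_or_with_selected_controls(p_cases, p_source, k):
    """Expected exposure OR when exposed non-cases are k times as likely as
    unexposed non-cases to be selected as controls (k = 1: unbiased selection).

    p_cases  -- exposure prevalence among cases
    p_source -- exposure prevalence among non-cases in the source population
    """
    true_or = odds(p_cases) / odds(p_source)
    # Differential selection multiplies the exposure odds in the control group by k
    observed_or = odds(p_cases) / (k * odds(p_source))
    return true_or, observed_or

# Two hypothetical control groups drawn for the same set of cases
true_or, or_population = expected_or_with_selected_controls(0.4, 0.2, k=1.0)
_, or_hospital = expected_or_with_selected_controls(0.4, 0.2, k=2.0)
print(f"true OR {true_or:.2f}; population controls {or_population:.2f}; "
      f"hospital controls {or_hospital:.2f}")
```

Under this model the biased control group divides the odds ratio by k. Nothing in a single control group reveals that this has happened, whereas a marked disagreement between two differently selected control groups is a warning sign.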

Measurement error may be a major weakness, resulting in misclassification of case status, for example when farmers are asked to diagnose livestock diseases. Exposure is also subject to measurement error, including measuring only part of the exposure, for example measuring only the fluoride in drinking water rather than assessing all sources of fluoride. The best indication of a real relationship is a clear dose-response relationship over a range of exposures. If one is not using recorded information, then recall bias is nearly always present, but its impact on the results is often not adequately assessed. Expecting farmers to honestly report cases of a notifiable disease to a researcher, when they have not done so to government, is unlikely to provide reliable data. Last, but not least, small sample sizes combined with vast numbers of potential risk factors will inevitably produce some 'significant' factors, but will not add much to our knowledge or understanding.
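The effect of misclassifying a binary exposure can be sketched with expected prevalences. The sketch assumes exposure is recorded with a given sensitivity and specificity; all prevalences, sensitivities and specificities below are hypothetical. When the same error rates apply to cases and controls (non-differential error, with sensitivity plus specificity above 1), the odds ratio is pulled towards 1. When cases recall exposure differently from controls (differential error, as with recall bias), it can be pushed away from 1.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)

def observed_prevalence(p_true, sens, spec):
    """Recorded exposure prevalence: true positives plus false positives."""
    return sens * p_true + (1 - spec) * (1 - p_true)

def observed_or(p_cases, p_controls, sens_cases, spec_cases, sens_controls, spec_controls):
    """Expected exposure OR after misclassification in each group."""
    pc = observed_prevalence(p_cases, sens_cases, spec_cases)
    pk = observed_prevalence(p_controls, sens_controls, spec_controls)
    return odds(pc) / odds(pk)

true_or = odds(0.4) / odds(0.2)
# Non-differential: same sensitivity and specificity in cases and controls
nondiff = observed_or(0.4, 0.2, 0.8, 0.9, 0.8, 0.9)
# Differential (recall bias): cases recall exposure more completely and also over-report it
diff = observed_or(0.4, 0.2, 0.95, 0.8, 0.8, 0.9)
print(f"true {true_or:.2f}, non-differential {nondiff:.2f}, differential {diff:.2f}")
```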


What the statisticians say

There are only a few texts solely devoted to the case-control design. Armenian (2009) provides a contemporary look at the case-control design. Schlesselman (1982) is now outdated but provides a good introduction to the classical case-control design. Rothman et al. (2008) provide a detailed account of the case-control design, with a strong emphasis on density case-control studies. Woodward (2004) covers case-control designs in Chapter 6, whilst Streiner & Norman (1998) provide a rapid overview of the design at an elementary level. Thrusfield (2005) compares several observational designs for veterinary research, including the case-control design. Ramsey et al. (1994) look at ecological case-control studies.

Van Stralen et al. (2010) review the advantages and disadvantages of the case-control design. Knol et al. (2008) is a key paper on how to interpret the odds ratio for different types of case-control design. Biesheuvel et al. (2008) demonstrate the advantages of the nested case-control design in diagnostic research. Marshall (2004) comments on the extensive misuse of the term 'case-control study' to describe what are actually prospective follow-up studies. Petitti (2004) is essential reading on the lessons learnt from the hormone replacement therapy fiasco! Pearce (2004) explains how, given certain assumptions, the prevalence odds ratio obtained from a case-control study using prevalent cases may estimate the incidence rate ratio. Pearce (1993) and Rodriguez & Kirkwood (1990) both provide readable accounts of the different variants of the case-control design.
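Why the meaning of the odds ratio depends on how controls are sampled can be seen with expected counts from a hypothetical fixed cohort. The sketch compares two schemes: controls drawn from the whole cohort at baseline (case-cohort sampling) and controls drawn from those still disease-free at the end of follow-up (cumulative or case-noncase sampling). Density (risk-set) sampling, which targets the rate ratio in a dynamic population, is not modelled here. Cohort sizes, risks and the sampling fraction f are assumptions chosen for illustration.

```python
def expected_ors(n_exposed, n_unexposed, risk_exposed, risk_unexposed, f=0.1):
    """Expected odds ratios in a hypothetical fixed cohort under two control-sampling
    schemes. f is the control sampling fraction; it cancels out of both odds ratios."""
    cases_e = n_exposed * risk_exposed
    cases_u = n_unexposed * risk_unexposed
    case_odds = cases_e / cases_u  # exposure odds among all cases
    # Case-cohort sampling: controls are a random sample of the whole cohort at baseline
    base_e, base_u = f * n_exposed, f * n_unexposed
    # Cumulative sampling: controls come from those still disease-free at the end
    free_e = f * n_exposed * (1 - risk_exposed)
    free_u = f * n_unexposed * (1 - risk_unexposed)
    risk_ratio = risk_exposed / risk_unexposed
    or_case_cohort = case_odds / (base_e / base_u)
    or_cumulative = case_odds / (free_e / free_u)
    return risk_ratio, or_case_cohort, or_cumulative

rare = expected_ors(1000, 1000, 0.02, 0.01)
common = expected_ors(1000, 1000, 0.40, 0.20)
print("rare outcome:   RR %.2f, case-cohort OR %.2f, cumulative OR %.2f" % rare)
print("common outcome: RR %.2f, case-cohort OR %.2f, cumulative OR %.2f" % common)
```

With case-cohort sampling the odds ratio reproduces the risk ratio. With cumulative sampling it estimates the risk odds ratio, which is close to the risk ratio only when the outcome is rare.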

Kuehni et al. (2006) highlight the risks of information bias in case-control studies. Mezei & Kheifets (2006) warn of selection bias and its implications for case-control studies. Grimes & Schulz (2005) and Schulz & Grimes (2002) provide excellent advice on the choice of controls. Agudo & Gonzalez (1999) also discuss the problem of selecting controls. Kaplan et al. (1998) look at possible biases arising from the selection of friends as controls. Palmer (1989) gives a lively account of the problems of selection and measurement bias in case-control studies. Bloom et al. (2007) and Garey (2004) look at the use and misuse of matching in epidemiologic studies. Marsh et al. (2002) provide an excellent example of the dangers of over-matching in a case-control study of radiation dose-response effects. Costanza (1995), Sorensen & Gillman (1995), Bland & Altman (1994), Karon & Kupper (1982) and Kupper (1981) discuss the pros and cons of matching in case-control designs.
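Matching has consequences for the analysis as well as the design. For 1:1 matched pairs, the conditional (matched-pair) odds ratio uses only the discordant pairs: pairs where the case was exposed and the control was not, divided by pairs where the control was exposed and the case was not. The sketch below assumes binary exposure and hypothetical pair counts, and contrasts that estimate with the crude odds ratio obtained by ignoring the matching.

```python
import math

def matched_pair_or(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched case-control data.

    pairs: list of (case_exposed, control_exposed) booleans, one per matched pair.
    Concordant pairs carry no information about the odds ratio.
    """
    b = sum(1 for case, ctrl in pairs if case and not ctrl)  # case exposed only
    c = sum(1 for case, ctrl in pairs if ctrl and not case)  # control exposed only
    if b == 0 or c == 0:
        raise ValueError("need at least one discordant pair of each type")
    or_hat = b / c
    se_log = math.sqrt(1 / b + 1 / c)
    return or_hat, math.exp(math.log(or_hat) - z * se_log), math.exp(math.log(or_hat) + z * se_log)

# Hypothetical matched pairs: (case exposed?, control exposed?)
pairs = ([(True, True)] * 30 + [(True, False)] * 25
         + [(False, True)] * 10 + [(False, False)] * 35)
matched_or, lo, hi = matched_pair_or(pairs)

# Crude OR from the same data with the pairing ignored
exp_cases = sum(case for case, _ in pairs)
exp_controls = sum(ctrl for _, ctrl in pairs)
n = len(pairs)
crude_or = (exp_cases * (n - exp_controls)) / ((n - exp_cases) * exp_controls)
print(f"matched OR {matched_or:.2f} ({lo:.2f} to {hi:.2f}); crude OR {crude_or:.2f}")
```

Here breaking the pairs gives a smaller estimate than the matched analysis, which is one reason a matched design should normally be analysed as matched.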

Aarts et al. (2008) describe the use of logistic mixed effects models to analyze wildlife telemetry data and simulated observations under a case-control design. Boyce (2006) and Pearce & Boyce (2006) consider the use of the case-control design to evaluate resource selection functions. Keating & Cherry (2004) discuss the use of logistic regression to analyze wildlife studies employing case-control sampling designs. Rabinowitz et al. (1999) call for wider use of cohort and case-control designs in studies of wildlife as sentinels for human health hazards.
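When logistic regression is applied to case-control data (or to used versus available sites), the exponentiated slope estimates an odds ratio, but the intercept mainly reflects the chosen ratio of cases to controls. The bare-bones Newton-Raphson fit below is written out only to make this visible; in practice a statistics package would be used. The binary covariate (say, a hypothetical 'near water' indicator) and all counts are assumptions for illustration.

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (one covariate, no penalty)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add the inverse 2x2 information matrix times the score vector
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Used sites (y = 1): 40 near water, 60 not. Available sites (y = 0): 20 near, 80 not.
x_a = [1] * 40 + [0] * 60 + [1] * 20 + [0] * 80
y_a = [1] * 100 + [0] * 100
# Same proportions, but twice as many available sites sampled
x_b = [1] * 40 + [0] * 60 + [1] * 40 + [0] * 160
y_b = [1] * 100 + [0] * 200

b0_a, b1_a = fit_logistic(x_a, y_a)
b0_b, b1_b = fit_logistic(x_b, y_b)
print(f"exp(slope): {math.exp(b1_a):.3f} vs {math.exp(b1_b):.3f}")
print(f"intercept:  {b0_a:.3f} vs {b0_b:.3f}")
```

Doubling the number of available sites leaves exp(slope) at the 2x2 cross-product odds ratio (40 x 80) / (60 x 20), but shifts the intercept by log(2), so the intercept cannot be read as an absolute probability of use.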

Wikipedia provides sections on observational studies, case-control studies, nested case-control studies and matching. Meirik outlines the principles of cohort and case-control studies.