"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




The cohort design is a prospective or (less commonly) retrospective observational design in which the groups (cohorts) are determined by the level of the explanatory variable. Individuals within cohorts are followed up over time, usually to determine the incidence of the condition under study. The design is one of the most important observational designs in epidemiology, but it is also used in other disciplines (although not always under that name).

Formation of groups

    There are several ways in which cohorts can be formed. In the classical cohort study two or more groups of individuals are selected. All individuals are initially free of the condition under study (say a disease), but which group they are in is determined by the level of some risk factor (say level of pollution). Most commonly there are only two levels of the risk factor - absent (the control group) or present (the factor group). Ideally the individuals in the two groups should be similar in all other respects, although this is seldom the case in practice. All individuals should be at risk of developing the condition. The separate independent cohorts, defined by different levels of the risk factor, are then followed up over time.
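    As a rough illustration of how two such cohorts might be compared at the end of follow-up, the sketch below computes the risk (cumulative incidence) in each group and their ratio, with an approximate 95% confidence interval from the usual log-based standard error. The function name and all counts are hypothetical and are not taken from any study described here.

```python
import math

def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Risk ratio for two independent cohorts, with an approximate CI.

    Assumes every individual was free of the condition at the start and
    was followed for the same period (a fixed cohort).
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed
        + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 30 of 200 exposed and 15 of 300 unexposed
# individuals develop the condition during follow-up.
rr, lower, upper = risk_ratio(30, 200, 15, 300)
print(f"risk ratio = {rr:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

    A ratio above one only suggests an association. Because the groups are not randomized, it says nothing by itself about whether they were similar in all other respects.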

    Every individual in a cohort may start at the same time and be followed up for a similar period of time. This is known as a fixed cohort. Alternatively, individuals may be recruited to or leave the cohort at different times. Hence the set of individuals at risk changes over time for reasons other than the condition under study. This is known as a dynamic cohort.
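    The distinction matters for how incidence is summarized. A minimal sketch, assuming made-up follow-up data: in a fixed cohort the cumulative incidence can be taken as new cases divided by the number at risk at the start, whereas in a dynamic cohort each individual contributes a different amount of time at risk, so an incidence rate per unit of person-time is the more natural measure. The function names and data are illustrative only.

```python
def cumulative_incidence(new_cases, n_at_risk_at_start):
    """Proportion of an initially condition-free fixed cohort that
    develops the condition during the follow-up period."""
    return new_cases / n_at_risk_at_start

def incidence_rate(follow_up):
    """Cases per unit of person-time.

    follow_up is a list of (years_at_risk, developed_condition) pairs,
    one per individual; years_at_risk stops at the event, at loss to
    follow-up, or at the end of the study.
    """
    cases = sum(1 for _, developed in follow_up if developed)
    person_years = sum(years for years, _ in follow_up)
    return cases / person_years

# Hypothetical fixed cohort: 12 new cases among 100 individuals.
risk = cumulative_incidence(12, 100)

# Hypothetical dynamic cohort: individuals enter and leave at different times.
dynamic_cohort = [
    (5.0, False), (2.5, True), (1.0, False),
    (4.0, True), (3.5, False), (4.0, False),
]
rate = incidence_rate(dynamic_cohort)
print(f"cumulative incidence = {risk:.2f}")
print(f"incidence rate = {rate:.2f} cases per person-year")
```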

    There are many variants on the classical design. Sometimes there is no 'control' group (risk factor absent), although it is debatable whether such a design should be called a cohort study - it is better described as a descriptive study over time. Sometimes the general population is used as the control group and results from the factor cohort are compared with routine population statistics. More commonly, only one cohort is established initially, comprising individuals who all share something in common (say people in the same profession, or insects at the same developmental stage). They are then classified into subgroups by some risk factor (say whether they smoke or not), and the incidence of the condition is assessed prospectively for each subgroup.

Prospective or retrospective

    Most cohort studies are prospective cohort studies - in other words the cohorts are followed forward over time to record incidence of the condition. For example, an investigator wishing to study the subsequent adverse effects of modern warfare might select a cohort of army personnel exposed to depleted uranium in a conflict situation, a cohort of similar personnel not exposed to that element, and follow up both groups to record subsequent disease incidence.

    The alternative is the retrospective cohort study. Twenty years post-conflict, an investigator may use army records to identify personnel who were and were not exposed, and then follow up the medical history of those individuals to the present day. Sometimes the two approaches are combined - cohorts are identified retrospectively, disease incidence up to the present time is obtained from records, and then groups are followed up prospectively for a further period. Whichever the case, individuals are tracked forward in time from exposure (to the explanatory variable) to outcome.

To match or not to match

    The individuals in the unexposed (comparison) group may be matched by some characteristic (for example age) to those in the exposed group. This will prevent confounding by the matched characteristic at the outset of the study. Despite this advantage, matched cohort studies are relatively rare, mainly because individual matching of subjects is a very time-consuming process. Also, although exposed and comparison groups can be matched at the outset of the study, this balance may not extend to the data available for analysis at the end of the study. Hence the matching characteristic must still be controlled for in the analysis.
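    One common way to control for a characteristic in the analysis is to stratify by it and pool the stratum-specific results. The sketch below uses the Mantel-Haenszel pooled risk ratio as an example of such an adjustment. This is a swapped-in illustration rather than a method prescribed above, and the age strata and counts are invented to show what can happen when drop-out leaves the groups unbalanced at the end of the study.

```python
def crude_risk_ratio(strata):
    """Risk ratio ignoring the stratifying characteristic."""
    cases_exp = sum(a for a, n1, c, n0 in strata)
    total_exp = sum(n1 for a, n1, c, n0 in strata)
    cases_unexp = sum(c for a, n1, c, n0 in strata)
    total_unexp = sum(n0 for a, n1, c, n0 in strata)
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)

def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata.

    Each stratum is (exposed cases, exposed total,
                     unexposed cases, unexposed total).
    """
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator

# Hypothetical data available for analysis, by age group. The groups were
# matched on age at the outset but are no longer balanced after drop-out.
strata = [
    (8, 80, 5, 100),    # younger: risk 0.10 exposed vs 0.05 unexposed
    (30, 100, 9, 60),   # older:   risk 0.30 exposed vs 0.15 unexposed
]
crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"crude risk ratio = {crude:.2f}, age-adjusted risk ratio = {adjusted:.2f}")
```

    In this invented example the risk ratio is the same within each age group, yet the crude ratio is larger because the remaining exposed individuals are older on average. The stratified estimate recovers the within-stratum value.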



Pros and cons of cohort studies


    Advantages

    • The incidence of the condition can be determined, whether the cumulative incidence or incidence rate.
    • Because all individuals are free of the condition at the start of the study, we can usually be certain that exposure to the explanatory variable precedes the individual getting the condition. Unlike the case-control study the temporal sequence is clear and unambiguous.
    • They can sometimes be used for studying rare conditions by selecting cohorts (for example mineworkers) with an unusually high proportion of individuals exposed to the risk factors.
    • Several response variables can be measured. This is because you are defining your groups with reference to the explanatory variable rather than the response variable(s).
    • There is no risk of incidence-prevalence bias.

    Disadvantages

    • Selection bias is nearly always present, with a high risk of confounding. For example if you compare groups that do and do not smoke, those who smoke may differ in all sorts of other ways from those who do not (for example they may be exposed to higher stress levels, or they may drink more alcohol, or be less likely to exercise, or be less educated, or older).
    • They are often unsuitable for studying rare conditions. For example setting up a cohort for studying CJD would be doomed to failure even if you restricted it to hamburger eaters (you would probably get no cases); similarly a longitudinal study of all possible nest sites for a bird species would be very unproductive as few if any sites would be utilized. Such studies are better done using case-control designs.
    • Exposure status can change over time, but at least you can monitor it in much more detail than with other designs.
    • Adequate follow-up can be difficult to sustain, leading to high drop-out rates.
    • Retrospective follow-up is only possible if records are available - so this design is mostly used for medical studies. Misclassification bias is a serious threat to validity for retrospective cohort studies.
    • The required duration of the study may be very long, depending on the period of follow up. The sample size also has to be much larger than for other designs, especially for rare conditions. These two features mean that cohort studies can be very expensive to run.
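    To give a feel for why rare conditions push the sample size up so sharply, the sketch below applies a standard normal-approximation formula for comparing two proportions (two-sided test, equal group sizes). The formula choice, the function name, and the incidence figures are assumptions made for illustration, and the calculation ignores drop-out and confounding, both of which would increase the numbers further.

```python
import math
from statistics import NormalDist

def cohort_size_per_group(p_unexposed, p_exposed, alpha=0.05, power=0.80):
    """Approximate number of individuals needed in each cohort to detect
    a difference between two cumulative incidences."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_unexposed + p_exposed) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(
            p_unexposed * (1 - p_unexposed) + p_exposed * (1 - p_exposed)
        )
    ) ** 2
    return math.ceil(numerator / (p_exposed - p_unexposed) ** 2)

# Same risk ratio (2.0) in both hypothetical scenarios; only the baseline
# incidence of the condition differs.
common = cohort_size_per_group(0.10, 0.20)
rare = cohort_size_per_group(0.001, 0.002)
print(f"common condition: about {common} individuals per cohort")
print(f"rare condition:   about {rare} individuals per cohort")
```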
