"It has long been an axiom of mine that the little things are infinitely the most important"
If the response variable is binary (for example infected/uninfected), a clear definition of the condition under study is essential, since otherwise there is a high risk of selection bias. This is because all the individuals in the cohorts should be at risk of developing the condition. For example, cohorts of cattle for the study of mastitis should only include female animals above a certain age. This process of restricting inclusion in a study to those with specific characteristics can be taken further as a specific strategy to reduce the risk of selection bias. Restriction should not, however, be taken too far, as the number of individuals available for the cohort will be decreased and the generalizability of the findings reduced. If the response variable is a measurement variable (for example milk yield or faecal egg count), then every effort should be made to minimize measurement error.
The next step is to define the levels of the explanatory variable. In the simplest case there are only two levels - exposed and non-exposed - and the non-exposed group can be termed controls. More often the groups differ by degree - for example, not many people consume absolutely no alcohol, so comparing cohorts based on presence/absence alone would be unproductive. Instead one may compare individuals consuming no alcohol with light, moderate and heavy drinkers. It is important to minimize measurement error for all explanatory variables. In addition, for self-reported variables, such as consumption of alcohol or cigarettes, some form of validation of reported levels is highly desirable.
Individuals are then selected that are free of the condition under study but differ in the level of the explanatory variable. They should be similar in all other respects. The control (unexposed) group is best selected from internal sources - for example, if one is looking at disease incidence in a cohort of soldiers exposed to (say) fumes from burning oil wells, then the best control would be soldiers from the same unit who were not exposed to fumes. As one moves to external sources - say soldiers back home, or (worse) civilians back home - the risks of selection bias and confounding grow. Worse still is to use the wider population rates of the condition as controls, instead of having a separate group. In that situation there is no way one can argue that the controls are similar to the exposed groups in all respects other than the exposure.
Monitoring over time
Individuals must be followed up over time for as long as possible - ideally until their final demise. The effects of many environmental exposures (for example asbestos) may not show up for many years, so showing no effect after a short time can be very misleading. The key explanatory variable should be monitored in case an individual's exposure status changes (for example by stopping smoking). In addition, information should be gathered for each individual on all potential confounding factors so that these can be controlled for at the analysis stage.
Methods for analyzing survival data allow one to deal with censored data. Censoring occurs when incomplete information is available about the survival time of some individuals. This may occur because the individual has been lost to follow-up (because the individual has left the area or has died from a cause unrelated to the factor of interest) or because the individual has survived beyond the duration of the study. Both of these are described as right censoring because the 'event' happens at some unknown time after the last observation. Note that for individuals lost to follow-up most analytical methods assume that such observations are 'missing at random' (which may well not be the case). Hence every effort should be made to minimize losses to follow-up.
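To make the handling of right-censored observations concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator, one common survival method (the text above does not name a specific one). The function name `kaplan_meier` and the follow-up data are hypothetical, chosen only for illustration. A censored individual counts as 'at risk' up to its last observation but never counts as an event.

```python
from collections import Counter

def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct event time.

    times    -- follow-up time for each individual
    observed -- True if the event was seen, False if right-censored
    """
    events = Counter(t for t, seen in zip(times, observed) if seen)
    exits = Counter(times)   # events plus censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ended at t leaves the risk set
        at_risk -= exits[t]
    return curve

# Hypothetical follow-up times (years); False marks a censored individual
times = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(times, observed)
for t, s in curve:
    print(f"t = {t}: estimated survival = {s:.3f}")
```

The sketch shares the assumption noted above: censoring is treated as unrelated to the risk of the event. If losses to follow-up are not 'missing at random', the estimate will be biased.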
Follow-up can be especially difficult in ecological and wildlife cohort studies where one is dependent either on radio telemetry or on monitoring (supposedly immobile) marked individuals. In either case disappearance may be wrongly taken to indicate death, when instead the animal has simply left the study area. The same applies in radio telemetry studies when the radio transmitter ceases to function. We looked at the problem of missing observations in telemetry studies in two examples in
The response variable must be measured the same way in each cohort. Whoever assesses the response variable should be blinded to the exposure status of the individual under assessment, especially where that assessment is to some extent subjective. This is one drawback of retrospective cohorts since one has no control over that process.
Since there has been no random allocation, it should first be demonstrated that exposed and unexposed groups were indeed similar at baseline in all important characteristics (in other words, possible confounding factors). For measurement variables the difference between the means of two groups can be tested using the two-sample t-test or the Wilcoxon-Mann-Whitney test.
For an ordinal or measurement response variable, comparisons at outcome time can be made using the same tests as above. If comparisons over multiple time periods are required, then one can either use a t-test with an appropriate summary measure or a repeated measures analysis of variance.
For a binary response variable, the cumulative incidence risk ratio and the attributable risk can be calculated.
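As a rough illustration of these two effect measures, the sketch below computes them from the counts of a simple two-group cohort. It assumes that 'attributable risk' means the risk difference (risk in exposed minus risk in unexposed), which is one common definition. The function name `cohort_measures` and the counts are hypothetical.

```python
def cohort_measures(exposed_cases, exposed_total,
                    unexposed_cases, unexposed_total):
    """Cumulative incidence measures for a two-group cohort."""
    # Cumulative incidence (risk) = new cases / individuals at risk at baseline
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        # Relative effect: how many times more likely the condition is
        "risk_ratio": risk_exposed / risk_unexposed,
        # Absolute effect, taken here as the risk difference (an assumption)
        "attributable_risk": risk_exposed - risk_unexposed,
    }

# Hypothetical cohort: 30 of 200 exposed and 10 of 200 unexposed develop the condition
measures = cohort_measures(30, 200, 10, 200)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The sketch gives point estimates only and does not handle the case where no unexposed individual develops the condition. A real analysis would also report confidence intervals and adjust for confounders.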
If one is considering mortality, there is a problem with using the risk ratio as an effect measure. Because every individual must die eventually, the cumulative risk in both groups approaches one, so the ratio must tend to unity over time. An alternative approach is survival analysis. Survivorship can be displayed by plotting cumulative survival at equal time intervals to give survivorship curves.
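The drift of the mortality risk ratio towards unity can be seen in a small numerical sketch. It assumes, purely for illustration, a constant death rate (hazard) in each group, so that cumulative mortality by time t is 1 - exp(-hazard × t). The hazards and follow-up times are hypothetical.

```python
import math

def cumulative_risk(hazard, t):
    """Probability of dying by time t under a constant hazard (an assumption)."""
    return 1 - math.exp(-hazard * t)

# Hypothetical hazards: the exposed group dies at twice the unexposed rate
hazard_exposed = 0.04
hazard_unexposed = 0.02

follow_up_years = [1, 10, 50, 200]
ratios = [
    cumulative_risk(hazard_exposed, t) / cumulative_risk(hazard_unexposed, t)
    for t in follow_up_years
]
for t, rr in zip(follow_up_years, ratios):
    print(f"after {t:>3} years: risk ratio = {rr:.3f}")
```

With short follow-up the risk ratio is close to the ratio of the two hazards (here 2), but it shrinks towards 1 as follow-up lengthens, even though the underlying difference in death rates never changes.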