"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

Analytical surveys: Use & misuse

(cross-sectional studies, individual or group level, bias, odds ratio versus risk ratio)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

Analytical and descriptive surveys have the same requirement for (unbiased) probability sampling, although in the literature such practice seems to be the exception rather than the rule. We found many examples of convenience, haphazard and quota sampling, even in respected medical journals, despite the inevitability of selection bias. It is true that probability sampling cannot always exclude selection bias, because of response bias - but the likely extent of this can sometimes be quantified. Some studies (especially wildlife studies) adopt a census approach, where efforts are made to sample every unit. In practice, however, some units are invariably missed, again leading to selection bias. (To make matters worse, many workers use 'census' to describe what is actually a sample.)
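The distinction between the two selection mechanisms is easy to demonstrate. A minimal sketch (the sampling frame, sample size and 'farms nearest the road' scenario below are all invented for illustration):

```python
import random

# Hypothetical sampling frame: 1000 farms, numbered 0-999.
frame = list(range(1000))

# Probability sampling: every unit has a known, non-zero chance
# of selection, so selection bias is excluded by design.
random.seed(1)
probability_sample = random.sample(frame, 50)

# 'Convenience' sampling: say, the 50 farms nearest the road.
# Selection probabilities are unknown, so bias cannot be ruled out.
convenience_sample = frame[:50]

print(len(probability_sample), len(convenience_sample))  # 50 50
```

Both samples are the same size - the difference lies entirely in whether each unit's probability of selection is known, which is what allows (or prevents) unbiased estimation.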

Another major issue is measurement error, especially in the ubiquitous questionnaire studies that seem to have replaced (genuine) fieldwork in so many disciplines. Misclassification bias is inevitable if you ask farmers about notifiable diseases that they have not previously reported, even if they are able to accurately diagnose the disease. The moral of the story: a lot more ground-truthing is needed to make most questionnaire studies worth reading. Measurement error is also an issue for some explanatory variables, such as meteorological variables in ecological studies or wealth status in epidemiological research.
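The effect of such misclassification on an estimated prevalence is readily sketched with hypothetical figures (the sensitivity and specificity below are assumptions for illustration, not data from any study):

```python
# Hypothetical figures: true prevalence is 10%, but farmers report
# only 40% of genuine cases (sensitivity 0.40) and wrongly report
# disease on 5% of unaffected farms (specificity 0.95).
true_prev = 0.10
sensitivity = 0.40    # P(reported | disease present)
specificity = 0.95    # P(not reported | disease absent)

# Apparent prevalence estimated from the questionnaire:
apparent_prev = true_prev * sensitivity + (1 - true_prev) * (1 - specificity)
print(round(apparent_prev, 3))  # 0.085 - well below the true 0.10
```

Without ground-truthing (here, knowing sensitivity and specificity from validation against clinical diagnosis) there is no way to recover the true prevalence from the apparent one.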

As regards confounding factors, they can usually be corrected for at the analysis stage - but that of course assumes they have been measured in the first place. In some examples we found that individual sampling units were arbitrarily excluded on the basis of a confounding factor (in effect pretending the factor does not exist), rather than incorporating that factor in the analysis. Where the response variable is binary, the great majority of researchers still summarize the relationship using an odds ratio. This may change to the prevalence risk ratio as software becomes available to adjust such ratios for confounding factors. Whichever ratio is used, the choice of reference category is important - choosing the smallest is unwise!

An error which used to be very common is random selection of clusters (for example households or plots) followed by analysis of the association using individuals - an example of pseudoreplication. This is not quite so ubiquitous these days, but still occurs. Lastly we come to interpretation of results. An analytical survey provides only a very weak level of inference for causation - but this does not stop researchers trying to use one for that purpose. Conversely, the lack of a significant association cannot prove there is no association - many studies have insufficient power to detect the associations they seek.
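The difference between the two summary measures is easily shown with a hypothetical 2 x 2 table (the counts below are invented, and the outcome is deliberately common):

```python
# Hypothetical cross-sectional 2x2 table:
#                diseased   healthy
#   exposed         40         60     (prevalence 0.40)
#   unexposed       20         80     (prevalence 0.20)
a, b = 40, 60    # exposed: diseased, healthy
c, d = 20, 80    # unexposed: diseased, healthy

prevalence_ratio = (a / (a + b)) / (c / (c + d))
odds_ratio = (a * d) / (b * c)

print(prevalence_ratio)       # 2.0
print(round(odds_ratio, 2))   # 2.67
```

With a common outcome the odds ratio (2.67) noticeably overstates the ratio of prevalences (2.0); the two only converge when the outcome is rare.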


What the statisticians say

Kestenbaum (2009) has a useful chapter on analytical surveys. Grimes & Schulz (2002) give a brief account of medical analytical surveys, using the terms cross-sectional studies (where the sampling unit is the individual) and ecological correlational studies (where the sampling unit is a group). Rothman & Greenland (1998) and Streiner & Norman (1998) also have short sections on cross-sectional designs. For veterinary epidemiologists, Thrusfield (2005) and Dohoo et al. (2003) cover analytical surveys as cross-sectional studies. Cochran (1968) makes the key distinction between an analytical survey (where the general population is surveyed) and an observational study (where contrasting groups are selected).

The issue of whether to use the odds ratio or the prevalence ratio as the effect measure in a cross-sectional study has preoccupied statisticians for years! Pearce (2004) argues in favour of the odds ratio, partly on the basis that it estimates the incidence rate ratio with fewer assumptions than does the prevalence ratio. Osborn & Cattaruzza (1995) consider that either the odds ratio or the risk ratio can be used, providing it is recognised that they measure different things. Barros & Hirakata (2003), Skov et al. (1998), Lee (1994) and Axelson et al. (1994) all argue in favour of the prevalence risk ratio rather than the prevalence odds ratio for a cross-sectional study. Coutinho et al. (2008) and Thompson et al. (1998) evaluate different models for the multivariate estimation of the prevalence risk ratio. McDermott et al. (1994) give details of how to correct statistical tests of association for cluster sampling. Wikipedia has articles on cross-sectional studies, the ecological fallacy, the hierarchy of evidence and information bias.
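The correction for cluster sampling is needed because individuals within a cluster tend to resemble one another, inflating the variance of any estimate. The standard approximation (the 'design effect') can be sketched as follows - the cluster size and intra-cluster correlation below are invented values, not from McDermott et al.:

```python
# Design effect for cluster sampling: deff = 1 + (m - 1) * rho,
# where m is the (average) cluster size and rho is the
# intra-cluster correlation. Values here are hypothetical.
m = 10       # e.g. 10 animals sampled per herd
rho = 0.2    # intra-cluster correlation

deff = 1 + (m - 1) * rho

n = 400                  # individuals actually sampled
n_effective = n / deff   # what they are 'worth' if (wrongly)
                         # analysed as independent observations

print(round(deff, 2))        # 2.8
print(round(n_effective))    # 143
```

In other words, analysing these 400 clustered individuals as if they were independently sampled - the pseudoreplication error discussed above - treats roughly 143 observations' worth of information as if it were 400.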