"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

Analytical surveys



Analytical surveys (also termed cross-sectional studies) are carried out to investigate how a response variable (such as disease prevalence) is related to one or more explanatory variables. There is sometimes a specific hypothesis being tested about which explanatory variables are likely to be the most important. Sampling units are selected from the target population using probability sampling or by census, without regard to their status with respect to either the response or explanatory variables. Use of non-probability sampling (especially convenience sampling) is widespread, but findings from such samples cannot validly be extrapolated to the target population.

There is no selection of units to represent particular levels of the explanatory variable. The design is (usually) cross-sectional although the process of data gathering may last for an extended period.
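The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, its field names (`exposed`, `diseased`) and the helper names `draw_survey` and `cross_tabulate` are all hypothetical, and simple random sampling is used as one example of probability sampling.

```python
import random
from collections import Counter

def draw_survey(frame, n, seed=0):
    """Draw a simple random sample of n units from the sampling frame.

    Selection ignores both the explanatory variable and the response,
    so no level of either variable is deliberately over-represented.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

def cross_tabulate(sample):
    """Count sampled units in each (exposed, diseased) cell of the 2x2 table."""
    return Counter((unit["exposed"], unit["diseased"]) for unit in sample)

# Hypothetical sampling frame of 1000 units (invented values, illustration only).
frame = [
    {"id": i, "exposed": i % 4 == 0, "diseased": i % 10 == 0}
    for i in range(1000)
]

sample = draw_survey(frame, 200)
table = cross_tabulate(sample)

for (exposed, diseased), count in sorted(table.items()):
    print(f"exposed={exposed!s:5} diseased={diseased!s:5} n={count}")
```

Status on each variable is recorded only after the units have been chosen, which is why the cell counts in `table` are an outcome of the survey rather than something the researcher fixes in advance.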

    These characteristics mean that the analytical survey has a weak level of inference for causation - although if the study is being done to investigate a specific hypothesis, one could argue that the level of inference is somewhat stronger than in a descriptive study. However, the dividing line between a descriptive survey and an analytical survey is not well defined. One design tends to merge into the other depending on how much interest there is in investigating relationships with explanatory variables. Also, the term 'analytical survey' is only used for cross-sectional surveys - whereas data from longitudinal studies are often analyzed in great detail to investigate the effects of (for example) weather factors.

It is important to define whether the sampling is at the individual level or at the group level. Many researchers use the terms 'analytical survey' or 'cross-sectional survey' irrespective of whether the sampling unit is an individual, a group of individuals (say a herd or a farm), or a plot of land. We also use the term in this sense - in other words for all studies where sampling units, whether they be individuals or groups, are selected using probability sampling. However, medical researchers usually restrict the term to where the sampling unit is an individual, and assign all studies on groups the uninformative catch-all label of ecologic study or ecological correlational study. Since these terms are imprecise (and rather meaningless to ecologists) we shall avoid them as far as possible.

    If the sampling unit is a group, there are no problems with inferences if the response variable is a true group condition (such as density or species richness), or if the factor concerned will invariably affect all individuals in the group (such as weather factors). However, one should be very cautious about extrapolating associations shown for groups (say heart disease fatality rates are higher in poor areas) to individuals (do poorer individuals have a higher risk of dying from heart disease?). Such extrapolations may be completely unjustified - for example heart disease rates may be higher in poor areas simply because medical facilities are much worse. Making such an unjustified extrapolation is known as the ecological fallacy.
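A small numerical sketch may make the fallacy easier to see. All counts below are invented for illustration and do not come from any study. They are chosen so that the fatality rate is higher in the poor area, yet within each area poor and non-poor individuals have exactly the same rate.

```python
# (area, individual_is_poor): (deaths, persons) -- invented counts, illustration only
counts = {
    ("poor area", True): (8, 80),
    ("poor area", False): (2, 20),
    ("affluent area", True): (1, 20),
    ("affluent area", False): (4, 80),
}

def rate(cells):
    """Fatality rate pooled over a list of (deaths, persons) cells."""
    deaths = sum(d for d, _ in cells)
    persons = sum(n for _, n in cells)
    return deaths / persons

# Group-level view: one rate per area, which is all a group-level survey records.
area_rate = {
    area: rate([v for (a, _), v in counts.items() if a == area])
    for area in ("poor area", "affluent area")
}

# Individual-level view: rate by personal poverty status within each area.
within_area_rate = {key: d / n for key, (d, n) in counts.items()}

for area, r in area_rate.items():
    print(f"{area}: overall fatality rate {r:.2f}")
for (area, is_poor), r in within_area_rate.items():
    label = "poor" if is_poor else "non-poor"
    print(f"{area}, {label} individuals: fatality rate {r:.2f}")
```

With these numbers the area-level rates differ (0.10 against 0.05), but an individual's own poverty status makes no difference once the area is known. The group-level association here reflects something about the areas, such as the quality of medical facilities, not the individuals.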

Exposure to the explanatory variable must be clearly defined. If the explanatory variable describes a permanent factor (such as sex of an animal or size of a pond), then current observations are valid. But if it describes a transient factor (such as whether the person smokes, or height of vegetation), then information on past exposure must be obtained from past records or (for humans) by questionnaires. This is especially so if the risk factor is slow-acting or its action is delayed - as is common among carcinogens.



Pros and cons of analytical surveys


    • They are relatively simple and inexpensive to implement.
    • It may be possible to use routine records, and no follow-up is required.
    • The proportions of the population currently with the condition and exposed to the risk factors can be determined.
    • Since all selected sampling units have to be classified or measured, there is no risk of excluding some units because they don't fit artificial criteria. This is a potential problem with other observational study designs.
    • Risk and odds ratios can be adjusted at the analysis stage to take account of confounding factors - provided such factors are known about and have been measured.
    • No individual is exposed to the risk factor as a direct result of the study. However, there is a danger that treatment may be delayed in order to avoid interference with the 'research'.
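The quantities mentioned in the list above (prevalence, risk and odds ratios, and adjustment for a measured confounder) can be sketched as follows. The counts are invented, the function names are hypothetical, and the Mantel-Haenszel stratified odds ratio is used here as one common way of adjusting for a confounder - the text does not say which method is intended. Note that in a cross-sectional survey the 'risk ratio' is strictly a ratio of prevalences.

```python
def prevalence(cases, total):
    """Proportion of sampled units that currently have the condition."""
    return cases / total

def risk_ratio(a, b, c, d):
    """Ratio of prevalence in exposed to prevalence in unexposed.

    a = exposed with condition,   b = exposed without,
    c = unexposed with condition, d = unexposed without.
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product (odds) ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata of a measured confounder.

    Each stratum is an (a, b, c, d) tuple laid out as in risk_ratio.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Invented counts: two strata of a confounder (illustration only).
strata = [(8, 72, 2, 18), (1, 19, 4, 76)]

# Crude table obtained by ignoring the confounder and summing over strata.
crude = tuple(sum(cell) for cell in zip(*strata))  # (9, 91, 6, 94)

print("prevalence of condition:", prevalence(crude[0] + crude[2], sum(crude)))
print("crude risk ratio:", round(risk_ratio(*crude), 3))
print("crude odds ratio:", round(odds_ratio(*crude), 3))
print("adjusted (Mantel-Haenszel) odds ratio:", round(mantel_haenszel_or(strata), 3))
```

In this constructed example the crude odds ratio is about 1.55, but the adjusted odds ratio is 1.0, because the apparent association arises entirely from the confounder. The adjustment is only possible because the confounder was recorded for every sampling unit.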


    • Causality can only be inferred if the risk factor is permanent, for example breed or gender. Interpretation is difficult or impossible for transient risk factors.
    • Because such studies are cross-sectional, there is a high risk of incidence-prevalence bias. For example, in the case of disease, some individuals with the condition may have been lost from the population. Similarly, some individuals may have been exposed to a risk factor which is no longer present.
    • The cross-sectional nature of the study can also lead to recall bias. The only way to obtain information on past experiences and events is to use a mix of recall (in medical research) and past records. The accuracy and extent of such information may differ between (for example) cases and non-cases, leading to bias. Cases may have better recall of certain things if they believe these are linked to their disease.
    • For binary variables, the condition of interest must be defined clearly and narrowly enough to cover only one specific condition. This should be done by specifying inclusion and exclusion criteria. Failure to do this may result in misclassification bias. For measurement variables, measurements must be made to a sufficient level of accuracy.
    • Because units are not allocated to groups by the researcher, confounding factors are unlikely to be equally distributed amongst groups. Whilst known confounding factors can be corrected for, unknown ones cannot.
    • The sample size in each group cannot be set by the researcher, but will be determined by the relative frequency of each group in the population. The researcher can only set the total sample size.
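The last point in the list above has a practical consequence for planning, which the sketch below illustrates. The proportions and function names are hypothetical. The calculation gives only expected group sizes: the sizes actually obtained will vary around them by chance.

```python
import math
from fractions import Fraction

def expected_group_sizes(total_n, proportions):
    """Expected number of sampled units in each group for a given total sample size."""
    return {group: total_n * p for group, p in proportions.items()}

def total_n_for_minimum(min_group_size, proportions):
    """Smallest total sample size whose expected count in the rarest group
    reaches min_group_size."""
    rarest = min(proportions.values())
    return math.ceil(min_group_size / rarest)

# Hypothetical population: 5% of units are exposed to the risk factor.
# Fractions are used so that the arithmetic is exact.
proportions = {"exposed": Fraction(5, 100), "unexposed": Fraction(95, 100)}

sizes = expected_group_sizes(400, proportions)
for group, n in sizes.items():
    print(f"{group}: expected {float(n):.0f} of 400")

print("total needed to expect 50 exposed units:",
      total_n_for_minimum(50, proportions))
```

When the exposure or condition is rare, the total sample size needed to obtain a workable number in the smaller group can be much larger than that group itself.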