Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
Descriptive surveys (as the name implies) are used to describe the world as it is. The survey may extend over a period of time (longitudinal survey or monitoring) and/or over space (distribution survey or mapping). In a descriptive study, usually no specific hypotheses are being tested about factors which may affect the variable of interest (the response variable). However, associations over time and space between the response variable and one or more explanatory variables may be investigated post-hoc. These characteristics are what distinguish a descriptive survey from an analytical survey (where usually a random cross-sectional sample is taken to investigate a hypothesized association between response and explanatory variables) and from an observational study (where specific groups are compared to test a hypothesis). Descriptive surveys tend to be the 'poor relative' of other designs because of the relatively low strength of inference possible when investigating causal relationships. Nevertheless, such surveys are still heavily used for exploratory investigations of causal relationships in all disciplines - despite the drawbacks. They are especially valuable in situations where a manipulative experiment is simply not possible, and where even an observational design would be difficult - for example the effect of climate and weather factors on the distribution of organisms. Moreover, a well-designed descriptive survey has a high strength of inference when it comes to estimating population parameters. We focus on the latter topic when considering sampling methods, so here we concentrate on monitoring and distribution studies.
We look at a number of long term monitoring studies, ranging from malaria incidence to catches of crop pests and bird numbers. Most weaknesses relate to the sampling techniques used to obtain the data. When data are obtained by passive surveillance, the number of cases of disease reported is often largely dependent on the reporting rate. Sometimes it is simply not feasible to use probability sampling - for example with BSE in cattle - but one must then be very cautious about apparent trends. In ecological studies various analytical methods have been developed to try to deal with the special problems of long term 'messy' data. However, more effort is probably needed on validation of population indices, and less on analysis of the data. It is also important to use the right measure of abundance - use of geometric mean catch is often appropriate whilst maximum catch over a period is very questionable. In wildlife studies there is great interest in participatory monitoring - but it still has to give the right answers. Linear relationships should not be fitted to curvilinear trends, and trajectory plots should be used to investigate delayed density dependence.
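To see why the choice of abundance measure matters, the following is a minimal sketch comparing arithmetic mean, geometric mean and maximum catch. The trap catches are invented for illustration, and the log(x + 1) transform is an assumption made here so that zero catches can be handled; it is one common convention, not necessarily the one used in the studies discussed.

```python
import math

def geometric_mean_catch(catches):
    """Geometric mean of counts, using a log(x + 1) transform so zeros are allowed.

    The +1 offset is an assumption of this sketch; it is subtracted again
    after back-transforming.
    """
    if not catches:
        raise ValueError("need at least one catch")
    logs = [math.log(c + 1) for c in catches]
    return math.exp(sum(logs) / len(logs)) - 1

# Hypothetical nightly trap catches: one exceptional night dominates the series.
catches = [0, 2, 3, 1, 4, 250, 2, 3]

arithmetic = sum(catches) / len(catches)
geometric = geometric_mean_catch(catches)
maximum = max(catches)

print(f"arithmetic mean catch: {arithmetic:.2f}")
print(f"geometric mean catch:  {geometric:.2f}")
print(f"maximum catch:         {maximum}")
```

In this made-up series the maximum reflects a single night only, and the arithmetic mean is pulled strongly towards it, whereas the geometric mean stays close to the typical catch.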
For spatial distribution studies the availability of geographical information system (GIS) software means that producing maps is now more common. However, computer-produced maps still need a scale, an indication of north, and a legend to indicate what the symbols mean. We give examples of the use of proportional circles for indicating abundance on a map, although these are rather poor for distinguishing classes. They can also be disastrous if lots of circles are close together, since it is impossible to see to which area they refer. Isoplethic maps are preferable, but these require sampling stations to be distributed fairly evenly over the area. As with monitoring studies, the value of a distribution study always depends on the quality of the data - if there are any 'weak points' they will tend to prejudice the whole exercise. With GIS maps and predictions one should always check the accuracy of the model by dividing the data set into training and evaluation points, and also test the robustness of the model.
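Dividing a data set into training and evaluation points can be sketched as follows. Everything here is hypothetical: the simulated presence/absence records, the single covariate, and the deliberately crude threshold 'model' stand in for whatever distribution model is actually being mapped. Repeating the split with different random seeds is used as one simple (assumed) way of checking robustness.

```python
import random

def split_points(points, eval_fraction=0.3, seed=0):
    """Randomly divide sampling points into (training, evaluation) lists."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_eval = max(1, int(round(len(shuffled) * eval_fraction)))
    return shuffled[n_eval:], shuffled[:n_eval]

def accuracy(threshold, points):
    """Share of points where 'covariate >= threshold' matches observed presence."""
    hits = sum((x >= threshold) == present for x, present in points)
    return hits / len(points)

def fit_threshold(training):
    """Choose the covariate threshold with the best accuracy on the training points."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: accuracy(t, training))

# Hypothetical data: (covariate value, species present?) at 200 sampling points.
data_rng = random.Random(42)
points = []
for _ in range(200):
    x = data_rng.uniform(0, 30)
    present = data_rng.random() < (0.9 if x >= 18 else 0.1)
    points.append((x, present))

# Fit on the training points only, then judge the model on points it has not seen.
training, evaluation = split_points(points, eval_fraction=0.3, seed=1)
threshold = fit_threshold(training)
train_acc = accuracy(threshold, training)
eval_acc = accuracy(threshold, evaluation)
print(f"training accuracy:   {train_acc:.2f}")
print(f"evaluation accuracy: {eval_acc:.2f}")

# Crude robustness check: repeat the whole exercise with different random splits.
eval_accs = []
for seed in range(10):
    tr, ev = split_points(points, seed=seed)
    eval_accs.append(accuracy(fit_threshold(tr), ev))
print(f"evaluation accuracy over 10 splits: {min(eval_accs):.2f} to {max(eval_accs):.2f}")
```

The figure to report is the accuracy on the evaluation points, since accuracy on the training points will generally flatter the model. Note that a random split does not allow for spatial autocorrelation between neighbouring points, which this sketch ignores.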
One should not be surprised to find that probably the commonest misuse of descriptive studies over time and space is to infer causation when none exists. We look at examples ranging from whether pollution or war affects the proportion of male babies in a population, to the (possible) link between schizophrenia and bites from Borrelia-infected ticks, and the (possible) effects of fog day frequency on malaria incidence with a 7 month lag. In each case we have to consider possible confounding factors and be very cautious in concluding that one factor causes another. We also look at examples where erroneous conclusions have resulted from looking at too short a time series, or where simple correlation analysis has been used which does not take account of serial correlation. The lack of any causal link is why predictions of the future based on correlations in the past are often less than accurate.
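The problem with simple correlation analysis of serially correlated data can be illustrated by simulation. The sketch below generates pairs of completely independent random walks (an assumed, extreme form of serial correlation) and counts how often they show a 'strong' correlation, both as raw series and after taking first differences. The series length, number of pairs and the |r| > 0.5 cut-off are arbitrary choices made for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_walk(n, rng):
    """Series in which each value is the previous value plus random noise."""
    level, series = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        series.append(level)
    return series

def first_differences(series):
    """Year-to-year changes, which removes the serial correlation of a random walk."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(2024)
n_pairs, n_years = 500, 50
big_levels = big_diffs = 0
for _ in range(n_pairs):
    a, b = random_walk(n_years, rng), random_walk(n_years, rng)
    if abs(pearson_r(a, b)) > 0.5:
        big_levels += 1
    if abs(pearson_r(first_differences(a), first_differences(b))) > 0.5:
        big_diffs += 1

print(f"pairs with |r| > 0.5, raw series:        {big_levels} of {n_pairs}")
print(f"pairs with |r| > 0.5, first differences: {big_diffs} of {n_pairs}")
```

Since the two series in each pair are unrelated by construction, any strong correlation between the raw series is spurious. Differencing is only one of several ways of allowing for serial correlation, and is used here because it needs no extra machinery.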
What the statisticians say
Human disease surveillance is covered by M'ikanatha et al. (eds) (2007). Salman (2003) looks at animal disease surveillance and survey systems. Thrusfield (2005) and Dohoo et al. (2003) also cover animal disease surveillance both spatially and temporally. Morrison et al. (2008), Spellerberg (2005) and Wiersma (2004) cover ecological and environmental monitoring. Sutherland (2006), Elzinga et al. (2001) and Krebs (1999) include sections on monitoring and mapping studies for the ecologist. Goldsmith (1991) provides an earlier text on monitoring for conservation and ecology.
Delaney & Van Niel (2007) and Bernhardsen (2002) are good introductory texts on GIS, whilst Longley et al. (2005) provide a more in-depth overview of the field. Burrough & McDonnell (1998) present a strong theoretical basis for GIS which is often lacking in other texts. Pfeiffer et al. (2008) and Hay et al. (2000) look at the application of remote sensing and geographical information systems in medical and veterinary epidemiology.
Stärk et al. (2006) look at risk-based surveillance in the field of veterinary medicine. Grimes & Schultz (2002) provide a general account of what descriptive studies can and cannot do in medical research. As well as surveillance, they also look at case reports, case series reports and analytical surveys.
James et al. (2004) look at mapping mortality rates at the county level across the USA. Brody et al. (2000) look at how a myth has been built up around the identification of the source of the London cholera epidemic in 1854 - essential reading for anyone using a geographic information system today! Clarke (1996) reviews the development and uses of GIS. Gesler (1986) gives a review of the use of spatial analysis, including use of a geographic information system, in medical geography.
Kalluri et al. (2007) review the status of remote sensing studies of arthropod vector-borne diseases over the last 25 years. Herbreteau et al. (2006) note that progress has been limited by use of cheap images and pre-processed data. Rogers et al. (2002) review the use of satellite imagery in the study and forecast of malaria. Bergquist (2001) and Beck et al. (2000) review the use of remotely sensed data for monitoring, surveillance and risk mapping of vector-borne diseases. Hugh-Jones (1991) edits a collection of papers on application of remote sensing and geographic information systems in veterinary epidemiology and parasitology.
Teder et al. (2007) and Buckland et al. (2005) consider improved ways to monitor biodiversity in the light of comments by Yoccoz et al. (2001) on the weaknesses of past programmes. Legg & Nagy (2006) explain why most conservation monitoring is (but need not be) a waste of time.
Danielsen et al. (2003, 2000) argue for the use of participatory monitoring in conservation ecology in developing countries. Pollock et al. (2002) stress the importance of calibrating relative estimates of population size against absolute estimates.
Phillips et al. (2006) describe maximum entropy modelling of species geographic distributions. Jessup (2003) argues that data obtained from opportunistic research and non-probability sampling can be useful providing likely biases are taken into account. Kerr & Ostrovsky (2003) look at how ecologists and conservation biologists are using remote sensing data, whilst Dominy & Duncan (2001) provide an excellent account of the practical application of GPS and GIS methods for wildlife conservation in the African rain forest.
Wikipedia provides sections on disease surveillance, environmental monitoring, cartography, geographic information systems, GIS and aquatic science, remote sensing, ground truthing and the global positioning system.
The US Geological Survey has an excellent guide to GIS. The Collegial Centre for Educational Materials Development provides a kit of instructional materials for learning how to use GIS software.