"It has long been an axiom of mine that the little things are infinitely the most important"
Types of variables: Use and misuse
(Scale of measurement, nominal, ordinal, interval, ratio, collapsing variables)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
Deciding on the best response variable to use is one of the most important steps in study design, but all too often it is merely an afterthought. Binary variables are generally overused, especially in medical research, with ordinal (and sometimes measurement) variables being collapsed unnecessarily to the binary scale. This results in a substantial loss of information. Collapsing a measurement variable (whether interval or ratio) to discrete classes for display as a histogram also results in an unnecessary loss of information. An extreme case of this is making precise counts up to a certain number, and then putting all higher counts in a 'more than' category. This effectively collapses the measurement variable to the ordinal scale.
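The loss of information from collapsing a measurement variable to the binary scale can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated (normally distributed, not taken from any study), the cut-point is the sample median, and the names `pearson`, `x_binary`, `r_full` and `r_binary` are ours, chosen purely for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Assumption: simulated data only. x is a measurement variable,
# y is a response that depends linearly on x plus random noise.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [xi + rng.gauss(0, 1) for xi in x]

# Collapse x to the binary scale by splitting at its median.
cut = statistics.median(x)
x_binary = [1 if xi > cut else 0 for xi in x]

r_full = pearson(x, y)           # uses every measured value of x
r_binary = pearson(x_binary, y)  # uses only 'high' versus 'low'

print(f"correlation with measured x:     {r_full:.3f}")
print(f"correlation with dichotomized x: {r_binary:.3f}")
```

Under these assumptions the dichotomized variable shows a noticeably weaker association with the response than the measured variable does. For a normally distributed variable split at the median, theory suggests the correlation shrinks to roughly 80% of its original value.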
Derived variables can be as difficult to deal with as ordinal variables. They tend to have odd distributions, and can be difficult to interpret. This is because a change in a derived variable (such as case fatality or sex ratio) can result from an increase or a decrease in either of its two component variables. One can also get meaningless derived variables when the wrong total is used to work out a percentage. Great care should be taken when adding up ordinal scores to give a composite index, especially if individual components are not verified. Proxy variables are widely used for variables that are difficult to quantify or measure, but all too often the fact that one is using a (not very accurate) proxy measure is forgotten in the heat of the moment! Transformation of variables has its own pitfalls which we deal with elsewhere.
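The ambiguity of a derived variable can be shown with simple arithmetic. The sketch below uses made-up numbers (not data from any study) and a hypothetical helper, `case_fatality`, defined here as deaths divided by cases. Two quite different scenarios produce exactly the same change in the ratio.

```python
def case_fatality(deaths, cases):
    """Case fatality as a proportion: deaths divided by cases."""
    return deaths / cases


# Assumption: illustrative numbers only.
baseline = case_fatality(deaths=10, cases=100)

# Scenario 1: deaths double while the number of cases is unchanged.
more_deaths = case_fatality(deaths=20, cases=100)

# Scenario 2: deaths are unchanged, but only half as many cases are detected.
fewer_cases = case_fatality(deaths=10, cases=50)

print(baseline, more_deaths, fewer_cases)
print("same ratio from different causes:", more_deaths == fewer_cases)
```

The ratio alone cannot distinguish the two scenarios, so it is worth inspecting the numerator and denominator separately before interpreting a change in the derived variable.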
What the statisticians say
Thrusfield (2005) gives a concise account of the various scales of measurement, including the visual analogue scale, from the veterinary viewpoint. Armitage & Berry (2002), Woodward (2004) and Bowers (1996) give similar accounts from the medical viewpoint. Bowers stresses that one should not use the arithmetic mean for ordinal variables. Sokal & Rohlf (1995) and Zar (1999) describe the various scales of measurement used by biologists. Sokal & Rohlf also give a lively (and rare) warning of the pitfalls in using derived variables. Siegel (1956) still provides one of the best accounts of scales of measurement. He makes an emphatic appeal not to use descriptive statistics like the mean, which are suited to measurement variables, on ordinal variables. Essential reading if you analyze score data!
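One way to see why the mean is a doubtful summary for ordinal scores is that the numbers attached to ordinal categories are arbitrary: any coding that preserves their order is equally valid. The sketch below uses invented severity scores and two hypothetical codings (`coding_1` and `coding_2`, both our own assumptions) to show that the ranking of two groups by their means can reverse under recoding, while the ranking by their medians does not.

```python
import statistics

# Assumption: invented ordinal severity scores for two groups.
group_a = ["moderate"] * 5
group_b = ["mild", "mild", "mild", "very severe", "very severe"]

# Two codings that both preserve the order of the categories.
coding_1 = {"mild": 1, "moderate": 2, "severe": 3, "very severe": 4}
coding_2 = {"mild": 1, "moderate": 5, "severe": 6, "very severe": 7}


def summarize(group, coding):
    """Return (mean, median) of a group's scores under a given coding."""
    values = [coding[label] for label in group]
    return statistics.mean(values), statistics.median(values)


mean_a1, median_a1 = summarize(group_a, coding_1)
mean_b1, median_b1 = summarize(group_b, coding_1)
mean_a2, median_a2 = summarize(group_a, coding_2)
mean_b2, median_b2 = summarize(group_b, coding_2)

# Under coding_1 group B has the higher mean; under coding_2 group A does.
mean_order_flips = (mean_a1 < mean_b1) and (mean_a2 > mean_b2)

# The median ranks group A above group B under both codings.
median_order_stable = (median_a1 > median_b1) and (median_a2 > median_b2)

print("mean ranking reverses under recoding:", mean_order_flips)
print("median ranking is unchanged:", median_order_stable)
```

In this constructed example, a conclusion based on means depends on the arbitrary choice of coding, whereas one based on medians depends only on the order of the categories.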
Altman & Royston (2006) emphasize the cost of dichotomizing continuous variables. Yamamura & Nomoto (2003) look at how to construct a meaningful scale of infestation for a plant pest or disease, focusing on whether to use an arithmetic or logarithmic scale and how wide the scale intervals should be. Bartfay & Donner (2000) illustrate the advantages of preserving multinomial data in their original scale rather than collapsing them to a binary scale. This is a rather mathematical paper, but the conclusions are of great value. Michell (2000) criticizes psychometrics for quantifying attributes that have not been shown to be quantifiable. This is a more philosophical approach, but well worth looking at. Velleman & Wilkinson (1993) and Bergman (1996) discuss the inadequacies of classifying variables as being on the nominal, ordinal, interval or ratio scale. Forrest & Andersen (1986) note the widespread use of inappropriate parametric analyses for ordinal variables in the medical research literature.
Knapp (1990) argues that the concept of meaningfulness should predominate, and calls on Townsend and Gaito to sit down and talk things over, rather than writing disagreeable articles about each other. Townsend & Ashby (1984) and Gaito (1980) argue respectively (if not respectfully) for and against the view that parametric techniques are only appropriate for measurement variables. Atchley et al. (1976) and Atchley & Anderson (1978) highlight the adverse statistical consequences resulting from compounding variables into ratios. Stevens (1946) proposed the four scales (or levels) of measurement to describe the nature of information within the variable.
Wikipedia describes the four main levels of measurement, along with a brief account of the debate on the classification scheme and whether one should use the mean as a summary measure for an ordinal measure.