"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Errors-in-variable regression: Use & misuse

(measurement error, equation error, method of moments, orthogonal regression, major axis regression, allometry)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

In traditional asymmetric regression, the value of Y is assumed to depend on the value of X, and the scientist is interested in the value of Y for a given value of X. Ordinary least squares regression assumes that X (the independent variable) is measured without error, and that all error is on the Y variable. There are two sources of error: measurement error (d) and intrinsic or equation error (e). These error terms are usually assumed to be random with a mean of zero (in other words, no bias). By definition, all equation error in asymmetric regression is assumed to be on the Y-variable, since one is interested in the value of Y for a given value of X.

There may, however, be substantial measurement error on the X variable. This does not matter if values of X are fixed by the experimenter, as is commonly the case in an experiment - the estimate of the slope is then still unbiased. But if values of X are random and X is measured with error, the estimated slope of the regression relationship is attenuated, that is, closer to zero than it should be. One type of errors-in-variables regression (the method of moments) enables one to correct the slope of an asymmetric regression for measurement error on the X-variable.

The other type of regression is symmetrical or orthogonal regression. Here there is no question of a dependent or independent variable (hence Y1 and Y2 are sometimes used to denote the variables, rather than X and Y). We simply want to model the relationship between two (random) variables, each of which may be subject to both measurement and equation error.
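The symmetric case just described can be sketched numerically. The snippet below (illustrative code, not from the source; the function and variable names are our own) computes the Deming regression slope, assuming the ratio `lam` of the Y-error variance to the X-error variance is known or estimated from replicates; `lam = 1` gives orthogonal (major axis) regression.

```python
import numpy as np

def deming_slope(x, y, lam=1.0):
    """Deming regression slope for two error-prone variables.
    lam is the (assumed known) ratio of the Y measurement-error
    variance to the X measurement-error variance; lam = 1 gives
    orthogonal (major axis) regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    d = syy - lam * sxx
    return (d + np.sqrt(d * d + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)
```

With error-free data lying exactly on y = 2x the slope 2 is recovered exactly, and fitting the variables the other way round gives the reciprocal slope - unlike OLS, the fit is symmetric in the two variables. With real data the choice of `lam` can matter a great deal.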

Errors-in-variables regression is used much less than ordinary least squares regression, apart from in certain specialized areas such as comparison of methods studies and allometry/isometry assessments. Its lack of use may seem surprising given how seldom the X-variable is measured without error. Nevertheless, in situations where one is using regression purely for descriptive purposes or for prediction, ordinary least squares regression is still the best option, provided one accepts that the slope is probably attenuated. We give two examples where ordinary least squares (OLS) linear regression could have been used rather than the more complex errors-in-variables regression.

The wrong type of errors-in-variables regression is often used when dealing with an asymmetric relationship - in other words where there is a clear independent and dependent variable. In this situation, orthogonal regression (including major axis and reduced major axis) is inappropriate. Instead, if there is substantial measurement error on the X-axis, the slope of the OLS regression should be corrected for attenuation using the method of moments. It is hard to find examples of the use of this method in the literature, but we do give several examples (such as relating survival time to HIV load, and relating phytophage species richness to tree abundance) where it should have been used rather than orthogonal regression.
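The correction just described can be sketched as follows (illustrative code, not from the source): the OLS slope is divided by the reliability ratio, the proportion of the variance of observed X that is not measurement error. The error variance `sd2` would in practice be estimated separately, for example from repeated measurements of the same sampling units.

```python
import numpy as np

def mom_corrected_slope(x, y, sd2):
    """OLS slope of y on x corrected for attenuation by the method
    of moments: divide by the reliability ratio, i.e. the share of
    var(x) that is not measurement error (sd2, estimated separately)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    vx = np.var(x, ddof=1)
    b_ols = np.cov(x, y, ddof=1)[0, 1] / vx
    return b_ols / ((vx - sd2) / vx)

# Simulated check: true slope 2; measurement error on X (variance 1)
# attenuates the OLS slope towards 2 * 4/(4+1) = 1.6.
rng = np.random.default_rng(1)
true_x = rng.normal(10.0, 2.0, 5000)
x_obs = true_x + rng.normal(0.0, 1.0, 5000)  # X observed with error
y = 2.0 * true_x + rng.normal(0.0, 1.0, 5000)
```

Here var(true X) = 4 and the error variance is 1, so the reliability ratio is 0.8; dividing the attenuated OLS slope by it recovers an estimate close to the true slope of 2.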

We give four examples of where orthogonal regression (and its variants) is used in comparison of methods studies - for example a comparison of several techniques for estimating cardiac output, and a comparison of two methods for making egg counts of nematodes. However, all these examples of symmetrical regression make simplifying assumptions about the errors by using major axis or reduced major axis regression. Provided measurement error really is the only factor causing points to deviate from a 1:1 relationship, a better approach would be to assess measurement error for each method by repeatedly measuring the same sampling unit. This is especially the case when one method is known to produce more variable results than the other. There is a strong case for using the Bland-Altman method instead of, or in addition to, errors-in-variables regression in such studies, and we give one example (comparing the results of a portable clinical analyzer with those obtained using a traditional analyzer) where this is done.
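The Bland-Altman approach mentioned above summarizes agreement between two methods by the mean difference (bias) between paired measurements and its 95% limits of agreement. A minimal sketch (our own illustrative names, not from the source):

```python
import numpy as np

def bland_altman_limits(m1, m2):
    """Bland-Altman summary for two methods measured on the same units:
    the mean difference (bias) and the 95% limits of agreement
    (bias +/- 1.96 SD of the differences). Normally accompanied by a
    plot of each difference against the pairwise mean."""
    d = np.asarray(m1, float) - np.asarray(m2, float)
    bias = d.mean()
    s = d.std(ddof=1)
    return bias, bias - 1.96 * s, bias + 1.96 * s
```

If most differences fall within the limits and the bias is negligible for practical purposes, the two methods can be regarded as interchangeable - a question errors-in-variables regression does not answer directly.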

When testing for allometry (our example looks at the shape of a species of spider), both equation error and measurement error are present on both axes - and most authors again 'opt out' by using reduced major axis regression rather than attempting to estimate the different error terms. It would be better to guesstimate upper and lower limits of the likely error on each variable, and then estimate the range of slopes that might be possible. We give an example of the use of errors-in-variables regression to obtain mass/length residuals, which are then used as measures of body condition. This is a controversial issue - should one, for example, use analysis of covariance instead? - and we consider both sides of the argument. Lastly we look at a study which uses errors-in-variables regression to test Taylor's power law relating log variance to log mean.
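For reference, the reduced major axis slope that most such studies fall back on is simply the ratio of standard deviations, signed by the correlation. A sketch (illustrative names, not from the source); for allometry the variables would usually be log-transformed measurements, so that a slope of 1 indicates isometry for two linear dimensions (or 3 for mass against length).

```python
import numpy as np

def rma_slope(x, y):
    """Reduced major axis (standardised major axis) slope: the ratio
    of the standard deviations, with the sign of the correlation.
    Note the slope does not depend on how tightly the points fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
```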

Many of the issues considered previously for ordinary least squares re-emerge in these examples. Bivariate observations must be independent, and not (for example) obtained on the same units in a time series, as is done in a trapping study. Relationships must be linear - a questionable assumption in some cases. Quite often the spread of values of the Y-variable appears to increase with the X values, indicating heteroscedasticity. We also note that if variables are log transformed, the estimate of Y is biased after detransformation, and must be corrected appropriately.
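The detransformation bias mentioned above arises because, if the errors on the log scale are Normal with variance s2, exp(fitted value) estimates the median of Y rather than its mean; a first-order correction multiplies by exp(s2/2). A simulated sketch (the parameter values are illustrative, not from the source):

```python
import numpy as np

# If log(Y) = mu + e with e ~ Normal(0, s2), then exp(mu) is the
# median of Y, while the mean of Y is exp(mu) * exp(s2 / 2).
rng = np.random.default_rng(7)
mu, s2 = 5.0, 0.25
e = rng.normal(0.0, np.sqrt(s2), 100_000)

true_mean = np.exp(mu + e).mean()     # what Y actually averages to
naive = np.exp(mu)                    # naive detransformation (the median)
corrected = naive * np.exp(s2 / 2.0)  # bias-corrected estimate of the mean
```

In practice s2 is replaced by the residual mean square of the log-scale regression, so the correction itself is only approximate.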


What the statisticians say

Fuller (1987) is the classic work on errors-in-variables regression. Carroll et al. (2006) give a more recent review of measurement error and regression. Sokal & Rohlf (1995) introduce errors-in-variables regression under the (unhelpful) name of model II regression; major axis and reduced major axis regression are covered, but not the method of moments nor orthogonal regression. In contrast, Snedecor & Cochran (1989) give only the method of moments for dealing with measurement error on the X-axis.

Kermack & Haldane (1950), Clarke (1980) and Ricker (1984) popularized some of the commoner forms of errors-in-variables regression. However, Carroll & Ruppert (1996) cautioned against misuse of orthogonal regression that takes only measurement error into account - failure to allow for equation error was leading to overcorrection for measurement error. McArdle (1988, 2004) points the way forward for errors-in-variables regression in ecology, whilst Smith (2009) does the same for anthropology. Warton et al. (2005) provide a detailed review of the use of major axis and reduced major axis regression in studies of allometry (as well as providing the R package 'smatr').

Orthogonal regression was introduced to pharmacologists and clinical chemists by Cornbleet & Gochman (1979) under the somewhat arcane name of Deming regression. See also Westgard (1998), Stöckl et al. (1998) and Martin (2000). Reduced major axis regression has been promoted by Ludbrook (1997, 2002, 2008) for comparison of methods studies, but rejected by Hopkins (2004); see also Batterham (2004). Physicists also use errors-in-variables regression - see Leng et al. (2007) for a lucid summary of its application in aerosol science.

Wikipedia covers errors-in-variables regression under the headings total least squares and errors-in-variables models. Gillard & Iles (2005) give a useful history of the maximum likelihood and method of moments techniques for errors-in-variables regression. GraphPad give a brief guide to how to do Deming regression along with a checklist of assumptions.