In traditional asymmetric regression the value of Y is assumed to depend on the value of X, and the scientist is interested in the value of Y for a given value of X. Ordinary least squares regression assumes that X (the independent variable) is measured without error, and that all error is on the Y variable. There are two sources of error - measurement error (δ) and intrinsic or equation error (ε). Measurement error results from recording values that differ from the actual values. If there is measurement error and one plots Y against X, the observed values will not lie exactly on a straight line, even if the actual values do. Of course in most cases, even if one could measure the actual values precisely, it is most unlikely that they would all lie on the line. This is because of equation error (traditionally just called error), which results from intrinsic variation between the sample units. These error terms are usually assumed to be random with a mean of zero (in other words, no bias).
By definition, all equation error in asymmetric regression is assumed to be on the Y variable, since one is interested in the value of Y for a given value of X. But there may be substantial measurement error on the X variable. This does not matter if values of X are fixed by the experimenter, as is commonly the case in an experiment; in this situation the estimate of the slope is still unbiased. But if values of X are random and X is measured with error, then the estimate of the slope of the regression relationship is attenuated, that is, biased towards zero. One type of errors-in-variables regression (the method of moments) enables one to correct the slope of an asymmetric regression for measurement error in the X variable.
The other type of regression is symmetrical regression. Here there is no question of a dependent or independent variable (hence Y1 and Y2 are sometimes used to denote the variables, rather than X and Y). We simply want to model the relationship between two (random) variables, each of which may be subject to both measurement and equation error. There are three common situations where symmetrical regression is used. Firstly, symmetrical regression can be used to test whether two methods of measurement agree. Measurements by one method are plotted against measurements by the other method. If fixed bias is present, one method will give values that are higher (or lower) than those from the other by a constant amount, but the slope will not differ significantly from 1. The level of bias is given by the value of the intercept. If proportional bias is present, the slope of the relationship will differ significantly from 1. The second situation is in studies of allometry - in other words how size variables of organisms are related, typically as a linear relationship on logarithmic scales. The third situation is when testing whether data conform to certain 'laws' or theoretical relationships - for example Taylor's Power Law relating log variance to log mean.
For comparison of methods studies, it is usually assumed (but not necessarily the case) that there is no equation error as both methods are measuring the same thing. For allometric relationships and testing whether data conform to certain laws there is likely to be equation and measurement error on both axes. For symmetric regression the usual approach is to use orthogonal regression or one of its variants - either major axis regression or reduced major axis regression.
Method of moments
This method is appropriate for asymmetrical regression of Y on X where X is subject to error. The regression should initially be carried out using ordinary least squares, and the slope estimated (and significance assessed) in the usual way. In some situations the measurement error can then be estimated by taking duplicate measurements on the same items to obtain the error (within-subject) variance:
Algebraically speaking -
$$v_{\delta x} = \frac{\sum d_i^2}{2n}$$
where
- vδx is the error (within subject) variance,
- di² are the squared differences between the two measurements for each individual, and
- n is the number of subjects, or items, in your set.
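The duplicate-measurement estimate can be sketched in Python as below. It assumes each item was measured exactly twice on X. The divisor is 2n rather than n because each difference contains two independent measurement errors. The function name `error_variance_from_duplicates` and the data are hypothetical, for illustration only.

```python
def error_variance_from_duplicates(first, second):
    """Within-subject (measurement error) variance from duplicate measurements.

    Implements v = sum(d_i**2) / (2 * n), where d_i is the difference between
    the two measurements on item i and n is the number of items.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    return sum_sq_diff / (2 * n)


# Made-up duplicate measurements of X on four items (illustration only)
x_first = [10.2, 11.8, 9.9, 12.5]
x_second = [10.0, 12.1, 10.3, 12.4]
print(error_variance_from_duplicates(x_first, x_second))  # about 0.0375
```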
The method of moments estimator is then used to correct the OLS slope for attenuation:
Algebraically speaking -
$$b_{mm} = b_{ols}\,\frac{v_x}{v_x - v_{\delta x}}$$
where
- bmm is the method-of-moments estimate of the slope,
- bols is the ordinary least squares estimate of the slope,
- vx is the variance of X, and
- vδx is the error (within subject) variance.
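Here is a minimal sketch of the method-of-moments correction. It assumes sample variances with an n − 1 divisor and an error variance vδx that has already been estimated, for instance from duplicates. All function names are hypothetical helpers, not a library API. The sketch returns only the point estimate; the standard errors in Fuller (1987) are not reproduced.

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def method_of_moments_slope(xs, ys, error_var_x):
    """OLS slope of Y on X corrected for attenuation by measurement error in X.

    b_mm = b_ols * v_x / (v_x - v_delta_x)
    """
    v_x = sample_variance(xs)
    if not 0 <= error_var_x < v_x:
        raise ValueError("error variance must be non-negative and smaller than v_x")
    b_ols = sample_covariance(xs, ys) / v_x
    return b_ols * v_x / (v_x - error_var_x)


# Illustrative data: the OLS slope is 2.0 and v_x is 2.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(method_of_moments_slope(xs, ys, error_var_x=0.5))  # 2.0 * 2.5 / 2.0 = 2.5
```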
The formulae for standard errors and confidence intervals are given in Fuller (1987).
Unfortunately it is sometimes not possible to quantify the measurement error variance. This applies if the act of measurement is destructive to the unit being measured (for example the perfusion technique used to estimate the number of adult Schistosoma worms in the veins) or if the process of measurement is long and complex (for example estimating population size by mark-release-recapture). More commonly, researchers simply do not think about estimating measurement error until the study is long finished! McArdle (2004) suggests expressing a worst-case guess of the measurement error variance (vδx) as a proportion (p) of vx, and then calculating p/(1 − p). This is the worst-case amount of attenuation as a proportion of the OLS estimate, because if vδx = p·vx the corrected slope is bols/(1 − p). If this amount of attenuation does not affect your conclusions, then you can argue that it is not necessary to correct for attenuation. Note that in small samples the sampling distribution of the corrected slope is highly skewed. In such cases a modified version of the method-of-moments estimator is recommended (Fuller, 1987).
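McArdle's worst-case check is simple arithmetic. If vδx = p·vx, the corrected slope is bols/(1 − p), which equals bols(1 + p/(1 − p)). The sketch below uses a hypothetical function name and made-up values for the OLS slope and for p. It shows how large the correction could be.

```python
def worst_case_attenuation(p):
    """Worst-case attenuation as a proportion of the OLS slope.

    If the measurement-error variance of X is a proportion p of v_x, then
    b_mm = b_ols / (1 - p) = b_ols * (1 + p / (1 - p)).
    """
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    return p / (1 - p)


b_ols = 0.8    # hypothetical OLS slope
p_guess = 0.2  # hypothetical worst case: error variance is 20% of v_x
extra = worst_case_attenuation(p_guess)
print(f"attenuation up to {extra:.0%}; corrected slope up to {b_ols * (1 + extra):.3f}")
```

If a slope up to that size would not change your conclusions, the correction can reasonably be skipped.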
There are several variants of symmetrical regression, known under a plethora of different names. All assume that there may be equation error and measurement error on each variable, and use the ratio of the (combined) error variances to estimate the parameters. Note, however, that the significance of the regression is (usually) tested in exactly the same way as for ordinary least squares regression - either using ANOVA (testing F = MSregression/MSerror) or a t-test (t = (b − β)/SEb).
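The significance test is borrowed from ordinary least squares, so it can be sketched with plain OLS formulas. This illustrative Python function has a hypothetical name. It returns the OLS slope, its standard error, t = (b − β)/SEb and the ANOVA F. When β = 0, t² equals F, so the two tests agree. Critical values (t on N − 2 degrees of freedom, or F on 1 and N − 2) are left to the reader's tables or statistics package.

```python
import math


def ols_slope_tests(xs, ys, beta0=0.0):
    """OLS slope of Y on X with its t statistic and the ANOVA F statistic.

    t = (b - beta0) / SE_b, with SE_b = sqrt(MS_error / S_xx);
    F = MS_regression / MS_error on 1 and n - 2 degrees of freedom.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = my - b * mx
    ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ms_error = ss_error / (n - 2)   # a perfect fit (ms_error == 0) divides by zero below
    ms_regression = b * s_xy        # regression sum of squares, 1 df
    se_b = math.sqrt(ms_error / s_xx)
    t = (b - beta0) / se_b
    f = ms_regression / ms_error
    return b, se_b, t, f


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
b, se_b, t, f = ols_slope_tests(xs, ys)
print(f"b = {b:.3f}, SE = {se_b:.3f}, t = {t:.2f}, F = {f:.1f}")
```

Only the test is shared with OLS. The slope estimate itself comes from whichever symmetrical method is used.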
When collecting data with a view to using a symmetrical regression, it is important that subjects are selected at random and not conditionally on the values of the X or Y variable. In asymmetric regression, samples are commonly selected systematically to represent a wide range of X values. However, when fitting a symmetrical regression, both X and Y variables are treated as random and so must both be sampled at random. If the X variable were sampled to represent a wide range of X values (high variance), this would bias the slope towards zero.
Orthogonal regression (= Deming regression, total least squares)
This method minimizes the sum of squared distances of points from the line, measured in a direction determined by the ratio of the error variances (λ); the distances are perpendicular only when λ = 1. Hence λ must be estimated independently. Whilst measurement error can often be estimated fairly readily (as detailed above for the method of moments), this is not practicable for equation error. Hence if you only work out the ratio based on the measurement error for each variable, you are implicitly assuming that there is no equation error, or that it is the same on each variable. This might be valid in comparison of methods studies where the same quantity is being measured using different methods. But it will emphatically not be valid for other types of symmetrical regression. It is often wrongly used in such situations, where it overcompensates for measurement error.
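The Python function below is a sketch of how λ enters the fit, using the standard closed-form Deming (maximum likelihood) slope. It assumes λ is defined as the error variance of X divided by the error variance of Y. Under that convention, λ = vx/vy reproduces reduced major axis regression. Some texts define the ratio the other way up, so check the convention before reusing this. λ is simply passed in here; estimating it is the hard part discussed above. The function name is hypothetical.

```python
import math


def deming_slope(xs, ys, lam):
    """Orthogonal (Deming) regression of Y on X.

    lam = (error variance of X) / (error variance of Y), the convention under
    which reduced major axis regression corresponds to lam = v_x / v_y.
    lam -> 0 recovers the OLS regression of Y on X; lam = 1 is the major axis.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - my) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    d = lam * s_yy - s_xx
    slope = (d + math.sqrt(d * d + 4 * lam * s_xy * s_xy)) / (2 * lam * s_xy)
    intercept = my - slope * mx
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 7.0, 6.0, 11.0]
for lam in (0.5, 1.0, 2.0):  # hypothetical guesses at the error ratio
    slope, intercept = deming_slope(xs, ys, lam)
    print(f"lambda = {lam}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```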
If equation error is likely to be present, λ cannot be determined precisely. One solution to this is to adopt McArdle's (2004) approach and guesstimate the extremes that λ may take. The true slope can then be said to lie between the outer confidence limits of the two possible regression lines. The problems with orthogonal regression are probably why most scientists instead use either major axis regression or reduced major axis regression. These are much easier to use, but do make (sometimes wildly) improbable assumptions.
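McArdle's bracketing needs a plausible range for λ. If nothing at all can be assumed, the limiting cases reduce to ordinary least squares. With λ defined as X error over Y error, λ → 0 (all error on Y) gives the regression of Y on X. λ → ∞ (all error on X) gives the reciprocal of the regression of X on Y. Every orthogonal regression slope lies between the two. The sketch below uses a hypothetical function name and made-up data. It computes these outer point estimates only, not the confidence limits that McArdle's approach brackets.

```python
def limiting_slopes(xs, ys):
    """Slopes at the two extreme error ratios of orthogonal regression.

    lam -> 0 (all error on Y): OLS slope of Y on X, S_xy / S_xx.
    lam -> infinity (all error on X): reciprocal of the OLS slope of X on Y,
    S_yy / S_xy. Every Deming slope with 0 < lam < infinity lies between them.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if s_xy == 0:
        raise ValueError("no linear association: the bracket is undefined")
    b_y_on_x = s_xy / s_xx
    b_x_on_y_inverted = s_yy / s_xy
    return min(b_y_on_x, b_x_on_y_inverted), max(b_y_on_x, b_x_on_y_inverted)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 7.0, 6.0, 11.0]
low, high = limiting_slopes(xs, ys)
print(f"any error ratio gives a slope between {low:.3f} and {high:.3f}")
```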
Major axis regression (= orthogonal (distance) regression)
This approach assumes that although both variables are subject to measurement and equation error, the total error on Y is equal to the total error on X (in other words var(δX) + var(εX) = var(δY) + var(εY)). Hence λ = 1. The two variables must obviously be measured in the same units for this to stand any chance of being true. The method minimizes the sum of the squared perpendicular distances of points to the line.
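With λ = 1 the orthogonal regression slope has a simple closed form. This Python sketch uses a hypothetical function name and made-up data. It computes the slope and shows why the units matter: when the points are scattered, multiplying Y by 10 does not multiply the major axis slope by 10.

```python
import math


def major_axis_slope(xs, ys):
    """Major axis slope: orthogonal regression with an error ratio of 1.

    Minimizes the sum of squared perpendicular distances to the line, so X
    and Y should be measured in the same units.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if s_xy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    d = s_yy - s_xx
    return (d + math.sqrt(d * d + 4 * s_xy * s_xy)) / (2 * s_xy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
print(major_axis_slope(xs, ys))                    # 1.0
print(major_axis_slope(xs, [10 * y for y in ys]))  # not 10.0: units matter
```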
Reduced major axis regression (= geometric mean regression, least products regression, standardized (or standard) major axis regression)
This is the most heavily used symmetrical regression method, as it is relatively robust to its assumptions not being met, and measurements do not have to be in the same units. It is assumed that the total error variances are proportional to the observed variances, so λ in the maximum likelihood equation can be obtained from vx/vy. Whilst this is unlikely to be generally true, it is more probable if both variables are counts or log-transformed data. The method minimizes the sum of the products of the vertical and horizontal deviations of the points from the line.
Algebraically speaking -
$$b_{rma} = \operatorname{sign}(\mathrm{cov}_{xy})\,\frac{s_y}{s_x}$$
where
- brma is the estimate of the slope for reduced major axis regression,
- sx and sy are the standard deviations of X and Y respectively,
- sign(covxy) is the sign (+ or −) of the covariance of X and Y.
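The 100(1 − α)% confidence interval (CI) for brma is then given by
$$b_{rma}\left[\sqrt{B + 1} \pm \sqrt{B}\right]$$
where
$$B = \frac{F\,(v_x v_y - \mathrm{cov}_{xy}^2)}{(N - 2)\,v_x v_y} = \frac{F\,(1 - r^2)}{N - 2}$$
- F is the upper α point (critical value) of the F-distribution with 1 and N − 2 degrees of freedom,
- vx and vy are the variances of X and Y, covxy is their covariance and r is their correlation coefficient, and
- N is the number of observations.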
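Here is a minimal Python sketch of the reduced major axis slope and its confidence interval. The F quantile is passed in rather than computed, because the Python standard library has no F-distribution; the docstring mentions SciPy's `scipy.stats.f.ppf` as one way to obtain it. The function name and data are illustrative assumptions.

```python
import math


def rma_slope_ci(xs, ys, f_crit):
    """Reduced major axis slope and its confidence interval.

    b_rma = sign(cov_xy) * s_y / s_x
    CI    = b_rma * [sqrt(B + 1) -/+ sqrt(B)],  B = F (1 - r^2) / (N - 2)

    f_crit is the upper-alpha point of F with 1 and N - 2 degrees of freedom,
    supplied by the caller (from tables, or with SciPy via
    scipy.stats.f.ppf(1 - alpha, 1, n - 2)).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    mx, my = sum(xs) / n, sum(ys) / n
    v_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    v_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if cov == 0:
        raise ValueError("sign of the RMA slope is undefined when cov_xy = 0")
    b = math.copysign(math.sqrt(v_y / v_x), cov)
    big_b = f_crit * (v_x * v_y - cov * cov) / ((n - 2) * v_x * v_y)
    ends = (b * (math.sqrt(big_b + 1) - math.sqrt(big_b)),
            b * (math.sqrt(big_b + 1) + math.sqrt(big_b)))
    return b, min(ends), max(ends)  # min/max so a negative slope still orders correctly


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
# 10.13 is approximately the upper 5% point of F with 1 and 3 df (alpha = 0.05)
b, lower, upper = rma_slope_ci(xs, ys, f_crit=10.13)
print(f"b_rma = {b:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The interval is asymmetric about brma, and the product of its two limits always equals brma².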
The slope of the RMA regression line will always be at least as steep (in absolute value) as that of the OLS regression of Y on X, because bols = r·sy/sx and |r| ≤ 1; the two are equal only if the correlation is perfect. To test whether βrma differs from 1 (in allometry studies β = 1 indicates isometry), one tests whether Y − X is uncorrelated with Y + X. In other words, if the data were rotated by 45°, would the resulting values be uncorrelated?
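The 45° rotation test can be sketched as follows. Since cov(Y − X, Y + X) = vy − vx, the correlation between the difference and the sum is zero exactly when sy = sx, that is, when the RMA slope has magnitude 1. The function name and the made-up log-scale values are hypothetical. |t| is compared with a t quantile on N − 2 degrees of freedom.

```python
import math


def isometry_test(xs, ys):
    """Correlation between (Y - X) and (Y + X), with its t statistic.

    cov(Y - X, Y + X) = v_y - v_x, so the correlation is zero exactly when
    s_y = s_x, i.e. when the RMA slope has magnitude 1. Compare |t| with a
    t quantile on N - 2 degrees of freedom.
    """
    n = len(xs)
    diffs = [y - x for x, y in zip(xs, ys)]
    sums = [y + x for x, y in zip(xs, ys)]
    md, ms = sum(diffs) / n, sum(sums) / n
    s_dd = sum((d - md) ** 2 for d in diffs)
    s_ss = sum((s - ms) ** 2 for s in sums)
    s_ds = sum((d - md) * (s - ms) for d, s in zip(diffs, sums))
    r = s_ds / math.sqrt(s_dd * s_ss)  # undefined if Y - X or Y + X is constant
    t = r * math.sqrt((n - 2) / (1 - r * r))  # undefined if |r| == 1
    return r, t


# Made-up log-scale values in which s_y equals s_x
log_x = [1.0, 2.0, 3.0, 4.0, 5.0]
log_y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(isometry_test(log_x, log_y))  # (0.0, 0.0): consistent with a slope of 1
```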
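Note that because the RMA slope is calculated as the ratio of the two standard deviations, its magnitude does not shrink as the association between Y and X weakens; it can only be zero if the covariance is exactly zero. Hence, unlike with OLS regression, you cannot test for association between Y and X using the null hypothesis β = 0. Instead, association is tested as for OLS regression, which is equivalent to testing whether the correlation coefficient differs from zero.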
Assumptions
- The relationship between Y and X is linear. A transformation of one or both variables may improve linearity of the response.
- The errors are independent. It is assumed that successive residuals are not correlated over time (no serial correlation).
- The variance of the errors is constant (homoscedasticity) (a) versus time and (b) versus the predictions (or versus the independent variable).
- Errors are normally distributed.
- Both X and Y may be measured with error.
For asymmetric regression where X is the independent (explanatory) variable and Y is the dependent (response) variable, X is assumed to have measurement error and Y to have both measurement and equation error. The method of moments is used to correct the ordinary least squares estimate of the slope.
For symmetric regression where X and Y are two random variables, there may be measurement and equation error on both axes. In orthogonal regression it is (effectively) assumed that there is no equation error, and the ratio of errors is obtained by estimating measurement error on each axis. In major axis regression it is assumed that the total errors on each axis are identical, so the ratio of errors is equal to one. In reduced major axis regression it is assumed that the ratio of errors can be estimated from the ratio of the observed variances.