Errors-in-variables regression

Worked example 1

Data obtained at perfusion

Calf    Adult worm count    Tissue egg count
1              65                 421600
2              81                 594600
3              81                 484400
4              23                  49200
5              11                  28800
6              34                 282600
7              42                 194200
Our first worked example uses data from De Bont et al. (2002), who used natural infections of cattle to assess the relationship between egg counts and adult female worm counts. Both variables were log10 transformed so that the authors could test whether egg production was density dependent by assessing whether the slope differed significantly from one. The transformation improved linearity in some, but not all, of the trials.

This is an asymmetric regression with adult female worm count as the independent (explanatory) variable and tissue egg count as the dependent (response) variable. Since we are interested in the value of the slope parameter, we need to take account of the fact that the X-variable is likely to be measured with substantial error, given the errors associated with the perfusion technique for recovering schistosomes. Hence we plan to use ordinary least squares regression, and then adjust the slope using the method of moments.

Test for linearity

The effect of the log transformation can be seen in the figure below:

[Figure msymr1.gif: tissue egg count plotted against adult worm count, untransformed and log10 transformed]

We accept the relationship is more or less linear (with or without the log transformation on both axes), and proceed with the analysis.

Ordinary least squares regression

The coefficients of the OLS regression are obtained first:
b  =  covariance(x,y) / variance(x)  =  0.1563 / 0.1016  =  1.5383
a  =  mean(y) − b × mean(x)  =  5.2822 − (1.5383 × 1.5982)  =  2.8237

Using R gave the same values, with a P-value of 0.0007. Diagnostic plots are shown below.
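A minimal R sketch of this fit, entering the data from the table above (the variable names are ours), is:

# Data from the table above
worms <- c(65, 81, 81, 23, 11, 34, 42)                             # adult worm counts
eggs  <- c(421600, 594600, 484400, 49200, 28800, 282600, 194200)   # tissue egg counts

x <- log10(worms)
y <- log10(eggs)

model <- lm(y ~ x)
summary(model)    # slope ≈ 1.538, intercept ≈ 2.824, P ≈ 0.0007
plot(model)       # standard diagnostic plots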

[Figure msymr2.gif: diagnostic plots for the OLS regression]

The fit is reasonably good. The plot of residuals versus the predictor variable does not provide any evidence of heteroscedasticity. The normal QQ plot of residuals is less encouraging, but fortunately regression is fairly robust to non-normality.

Correction to slope by method of moments

We now have to assess the level of measurement error in the adult worm counts. Unfortunately this is very difficult (if not impossible) to quantify for the perfusion technique: duplicate readings cannot be taken because the sampling technique is destructive, nor can one realistically set up a model system containing known numbers of adult worms. Hence the only option is to make a worst-case guess of the measurement error variance as a proportion of the variance of x.

In this case the variance of log10(worm numbers) is 0.1016 and our estimate of the measurement error variance was 0.01. The worst-case attenuation, as a proportion of the OLS estimate, is therefore 0.01/(0.1016 − 0.01) = 0.109, or around 11%. Hence a better estimate of the slope would be around 1.71 (1.5383 × 0.1016/(0.1016 − 0.01)), rather than the previous estimate of 1.538. Since this estimate is based on a guess of the measurement error variance, there is no point in attaching a confidence interval. But one can certainly conclude that the data provide no evidence of density dependence, for which the slope would have to be significantly less than one.
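As a sketch, this method of moments correction can be written out in R from the summary statistics above:

var_x <- 0.1016     # variance of log10(adult worm count)
var_u <- 0.01       # worst-case guess at the measurement error variance
b_ols <- 1.5383     # OLS slope estimate

# Divide the OLS slope by the reliability ratio to correct for attenuation
reliability <- (var_x - var_u) / var_x    # ≈ 0.902
b_corrected <- b_ols / reliability        # ≈ 1.71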

Worked example 2

Data obtained at perfusion

Calf    Eggs per gram of faeces    Tissue egg count
1               26.0                    421600
2               41.5                    594600
3               47.8                    484400
4               11.5                     49200
5                6.4                     28800
6               19.5                    282600
7               51.0                    194200
Our second worked example also uses data from De Bont et al. (2002), but in this case the authors looked at the relationship between faecal and tissue egg counts. The rationale for doing this was not made clear, but since both measures are used as indicators of the intensity of infection, it seems reasonable to expect them to be correlated. A slope of one would indicate a simple proportional relationship between the two measures.

This is clearly a symmetrical regression. There are no dependent or independent variables - both variables are measures of egg numbers. One could estimate measurement error for these two variables, but since there will also be equation error, one would not be able to use these in the maximum likelihood equation. Since both variables are log10 transformed, reduced major axis regression is probably the best option.

Test for linearity

The effect of the log10 transformation can be seen in the figure below:
[Figure msymr3.gif: faecal egg count plotted against tissue egg count, untransformed and log10 transformed]

The relationship is more or less linear after the log10 transformation, although there is one outlier. Hence we proceed with the analysis.

Reduced major axis regression

The coefficients of the reduced major axis regression are estimated thus:
b  =  ±sd(y) / sd(x)  =  +0.3399 / 0.5125  =  0.6632
a  =  mean(y) − b × mean(x)  =  1.3681 − (0.6632 × 5.2822)  =  −2.135
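These values can be reproduced in R from the log10 transformed data (a sketch; the variable names are ours):

epg    <- c(26.0, 41.5, 47.8, 11.5, 6.4, 19.5, 51.0)               # eggs per gram of faeces
tissue <- c(421600, 594600, 484400, 49200, 28800, 282600, 194200)  # tissue egg counts

y <- log10(epg)
x <- log10(tissue)

# Reduced major axis slope: ratio of standard deviations, signed by the correlation
b <- sign(cor(x, y)) * sd(y) / sd(x)    # ≈ 0.6632
a <- mean(y) - b * mean(x)              # ≈ -2.135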

In R we initially ran the standard OLS regression using the lm() linear model function. This gave a slope of 0.569 with a P-value of 0.01341. We then ran the R package 'smatr', first for OLS regression as a check, and then for reduced major axis regression.
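A sketch of those calls, assuming the data vectors defined above and the sma() interface of smatr version 3 or later:

library(smatr)

summary(lm(y ~ x))             # OLS via lm(): slope ≈ 0.569, P ≈ 0.0134
sma(y ~ x, method = "OLS")     # OLS via smatr, as a check
sma(y ~ x, method = "SMA")     # reduced major axis (standardised major axis)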


Coefficients for reduced major axis regression along with their 95% confidence intervals are given below:

slope = 0.6632 (95% CI: 0.3788 to 1.1611)
intercept = −2.135 (95% CI: −4.2101 to 0.0601)

Note that, as expected, the reduced major axis slope is greater than that estimated by simple OLS regression. Diagnostics for the reduced major axis regression are shown below:

[Figure msymr4.gif: diagnostic plots for the reduced major axis regression]

The fit is less good than in the previous worked example, but still acceptable. The plot of residuals versus the predictor variable does not provide any evidence of heteroscedasticity. Again the normal QQ plot of residuals suggests some non-normality.

We assess whether the slope of the relationship differs significantly from 1 by testing whether Y − X is uncorrelated with Y + X. Pearson's product moment correlation coefficient comes out at −0.6355, which has a non-significant P-value of 0.1250. We conclude that the slope of the reduced major axis regression does not differ significantly from 1. The R package 'smatr' provides a test of the slope against any specified value; its results are the same as those given here (see the sketch below).
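A sketch of this test, using the log10 transformed vectors defined earlier:

# Pearson correlation of Y - X with Y + X tests whether the RMA slope = 1
cor.test(y - x, y + x)    # r ≈ -0.6355, P ≈ 0.125

# smatr's equivalent (assuming slope.test is available in your version):
# sma(y ~ x, slope.test = 1)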
