Draw boxplots and assess normality
Plot out data to get a visual assessment of the treatment and block effects, and assess how appropriate
(parametric) ANCOVA is for the set of data.
Plot relationship with covariate
Plot out data to get a visual assessment of the relationship between the response variable and the covariate,
and assess how appropriate (parametric) ANCOVA is for the set of data.
Whilst we might get away with arguing that there is a linear relationship for treatments 1 and
3, there is no evidence of any relationship at all for treatment 2. We will continue with the analysis, but
bear in mind that any 'adjustments' we make to means will be highly questionable (if not totally invalid!).
Get table of (unadjusted) means
Carry out analysis of covariance
Calculating sums of squares manually for an analysis of covariance can only be described as a pain in
the backside - and one needs to be very organized! Values for the various sums and sums of squares can
be obtained with R.
We first calculate
the sums of squares assuming a pooled regression of Y on X:
SSTotal = 582800 − 3200²/18 = 13911.11
SSTreatment = [990² + 995² + 1215²]/6 − 3200²/18 = 5502.78
SSPooledReg = [632800 − (3530 × 3200)/18]² / [701900 − 3530²/18] = 2856.75
SSError = 13911.11 − 5502.78 − 2856.75 = 5551.58
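These pooled quantities are simple enough to verify mechanically. The sketch below (Python used purely as a calculator here - the analysis itself is done in R) recomputes them from the raw sums:

```python
# Verify the pooled ANCOVA sums of squares from the raw sums.
# n = 6 replicates in each of k = 3 treatments.
n, k = 6, 3
N = n * k

sum_Y, sum_Y2 = 3200, 582800       # ΣY, ΣY² over all 18 observations
sum_X, sum_X2 = 3530, 701900       # ΣX, ΣX²
sum_XY = 632800                    # ΣXY
group_sums_y = [990, 995, 1215]    # Σy within each treatment

CF = sum_Y**2 / N                                        # correction factor
ss_total = sum_Y2 - CF                                   # 13911.11
ss_treatment = sum(s**2 for s in group_sums_y) / n - CF  # 5502.78
ss_pooled_reg = (sum_XY - sum_X * sum_Y / N) ** 2 / (sum_X2 - sum_X**2 / N)
ss_error = ss_total - ss_treatment - ss_pooled_reg       # 5551.58

print(round(ss_total, 2), round(ss_treatment, 2),
      round(ss_pooled_reg, 2), round(ss_error, 2))
```

Differences in the last decimal against versions of this calculation done with rounded intermediate values are purely rounding.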
Sums of squares table:

            Treatment 1   Treatment 2   Treatment 3     Overall
  Σx               1190          1140          1200        3530
  Σy                990           995          1215        3200
  Σx²            238750        219900        243250      701900
  Σy²            167600        166275        248925      582800
  Σxy            199150        188900        244750      632800
  ΣxΣy          1178100       1134300       1458000    11296000
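As a sanity check on the table, the per-treatment sums should add to the overall column; a few lines of Python (an aside, not part of the R analysis) confirm this:

```python
# Per-treatment sums from the sums of squares table.
sum_x  = [1190, 1140, 1200]          # Σx for treatments 1-3
sum_y  = [990, 995, 1215]            # Σy
sum_x2 = [238750, 219900, 243250]    # Σx²
sum_y2 = [167600, 166275, 248925]    # Σy²
sum_xy = [199150, 188900, 244750]    # Σxy

# Each set of treatment totals should add to the overall column.
assert sum(sum_x) == 3530 and sum(sum_y) == 3200
assert sum(sum_x2) == 701900 and sum(sum_y2) == 582800
assert sum(sum_xy) == 632800
print("table totals consistent")
```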
Next calculate the regression statistics for each treatment separately:
SSTotal1 = 167600 − 990²/6 = 4250
SSXY1 = 199150 − (1190 × 990)/6 = 2800
SSX1 = 238750 − 1190²/6 = 2733.33
SSReg1 = 2800²/2733.33 = 2868.296
SSError1 = 4250 − 2868.296 = 1381.704
b1 = 2800/2733.33 = 1.0244
SSTotal2 = 166275 − 995²/6 = 1270.833
SSXY2 = 188900 − (1140 × 995)/6 = −150
SSX2 = 219900 − 1140²/6 = 3300
SSReg2 = (−150)²/3300 = 6.818
SSError2 = 1270.833 − 6.818 = 1264.015
b2 = −150/3300 = −0.04545
SSTotal3 = 248925 − 1215²/6 = 2887.5
SSXY3 = 244750 − (1200 × 1215)/6 = 1750
SSX3 = 243250 − 1200²/6 = 3250
SSReg3 = 1750²/3250 = 942.308
SSError3 = 2887.5 − 942.308 = 1945.192
b3 = 1750/3250 = 0.5385
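The three treatment-level calculations follow an identical template, so one small helper reproduces them all. Another Python calculator sketch (small differences in the third decimal against the hand-worked values arise because those used rounded intermediates such as SSX1 = 2733.33):

```python
# Regression statistics for one treatment from its raw sums (n observations).
def reg_stats(sx, sy, sx2, sy2, sxy, n=6):
    ss_total = sy2 - sy**2 / n        # corrected SS of the response
    ss_xy = sxy - sx * sy / n         # corrected sum of cross-products
    ss_x = sx2 - sx**2 / n            # corrected SS of the covariate
    ss_reg = ss_xy**2 / ss_x          # SS accounted for by the regression
    ss_error = ss_total - ss_reg      # residual SS
    b = ss_xy / ss_x                  # slope of Y on X
    return ss_total, ss_xy, ss_x, ss_reg, ss_error, b

t1 = reg_stats(1190, 990, 238750, 167600, 199150)
t2 = reg_stats(1140, 995, 219900, 166275, 188900)
t3 = reg_stats(1200, 1215, 243250, 248925, 244750)
for label, t in zip(("1", "2", "3"), (t1, t2, t3)):
    print(label, [round(v, 3) for v in t])
```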
Summed regression statistics:
SSTotal = 4250 + 1270.833 + 2887.5 = 8408.333
SSXY = 2800 − 150 + 1750 = 4400
SSX = 2733.33 + 3300 + 3250 = 9283.3
SSReg = 2868.296 + 6.818 + 942.308 = 3817.422
SSError = 1381.704 + 1264.015 + 1945.192 = 4590.911
bcommon = 4400/9283.3 = 0.4740
SSReg (common slope) = 4400²/9283.3 = 2085.465
SSHeterogeneity of slopes = 3817.422 − 2085.465 = 1731.957
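Both of these pooled-slope quantities can be checked from the unrounded sums. In this Python sketch the tiny differences from the hand-worked third decimals reflect the rounding of SSX to 9283.3 above:

```python
# Common within-treatment slope and heterogeneity-of-slopes SS, unrounded.
ss_x1 = 238750 - 1190**2 / 6      # within-treatment SSX values
ss_x2 = 219900 - 1140**2 / 6
ss_x3 = 243250 - 1200**2 / 6

ss_xy = 2800 - 150 + 1750                       # summed SSXY
ss_x = ss_x1 + ss_x2 + ss_x3                    # summed SSX (9283.33)
ss_reg_separate = (2800**2 / ss_x1 + 150**2 / ss_x2
                   + 1750**2 / ss_x3)           # slopes fitted separately

b_common = ss_xy / ss_x                         # common slope
ss_reg_common = ss_xy**2 / ss_x                 # SS for the common slope
ss_het = ss_reg_separate - ss_reg_common        # heterogeneity of slopes
print(round(b_common, 4), round(ss_reg_common, 1), round(ss_het, 1))
```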
Model with interaction (maximal model, different slopes)
ANOVA table
Source of variation        df        SS       MS    F-ratio        P
Diet                        2    5502.8
Pre                         1    2085.5
Heterogeneity of slopes     2    1732.0    866.0     2.2635   0.1465
Residual                   12    4590.9    382.6
Total                      17  13911.11
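The heterogeneity row of the table can be checked directly. This Python sketch recomputes the F-ratio and its P-value; because the numerator has 2 df, the F survival function has a simple closed form, so no statistics library is needed:

```python
# F-test for heterogeneity of slopes: MS ratio and its P-value.
ss_het, df_het = 1732.0, 2
ss_resid, df_resid = 4590.9, 12

ms_het = ss_het / df_het            # 866.0
ms_resid = ss_resid / df_resid      # 382.6 (to 1 dp)
f_ratio = ms_het / ms_resid

# Closed form valid only because the numerator df is 2:
#   P(F > f) = (1 + 2f/d2) ** (-d2/2)
p_value = (1 + df_het * f_ratio / df_resid) ** (-df_resid / 2)
print(round(f_ratio, 3), round(p_value, 4))   # ≈ 2.264 and 0.1465
```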
Interpretation here is somewhat problematical. At the P = 0.05 level the interaction is not
significant - in other words we can accept the lines as parallel. However, this is in direct contradiction to our
visual assessment of the relationship between the response variable and the covariate, where we lack
evidence of any regression relationship at all between 'post' and 'pre' for treatment B - let alone a similar
slope to that found for treatments A and C. The apparent contradiction may result from the lack of power
of tests for interaction, or it may result from random variation in treatment B. Which is the case is a matter
of judgement (or guesswork!), but for the sake of the example we will assume it results from random variation.
When we come to doing it in R, we run into the same 'challenge' that we did when analyzing
unbalanced factorial designs.
Because the levels of the covariate are different for each of the three
treatments, we no longer have a balanced or orthogonal design. For a non-orthogonal design it matters
which variable is entered first - more precisely, the order affects the sums of squares and significance
levels of the main effects and of all but the highest-level interaction. It does
not, however, affect the residual SS nor the parameter estimates and standard errors. If you want the
results identical to the manual version, then you have to enter the (nominal) treatment factor first followed
by the (continuous) covariate.
The alternative is to use Type II sums of squares which are calculated according to the principle of
marginality. Adjustments to the sums of squares are made for the other main effects in the statistical model
but not for higher-level interaction effects.
Irrespective of these considerations, we still have to decide what to do about the interaction. Given
that it is not significant at P=0.05, the conventional approach would be to drop the interaction and
assume parallel lines. We do this below and estimate adjusted means.
Model without interaction (parallel lines)
The diet factor remains significant (P = 0.0125) but 'pre' is becoming very borderline, with a P-value of 0.0496. If we do a Type II ANOVA using the R car package (so that order has no effect) we get exactly the same P-value for 'pre' but a slightly higher value for the diet factor (P = 0.0200) (we leave you to check this).
Adjusted means
The adjusted means (ȳ′) are given by:
ȳ′1 = 165 − 0.474 × (198.333 − 196.111) = 163.947
ȳ′2 = 165.833 − 0.474 × (190 − 196.111) = 168.730
ȳ′3 = 202.5 − 0.474 × (200 − 196.111) = 200.657
These then are the adjusted means post ANCOVA. Post hoc comparison of means reveals that the mean for diet a3 is significantly higher than for either of the other two diets.
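The adjusted means are just the treatment means of 'post' corrected along the common slope, which a short Python check reproduces (the means come from the sums given earlier):

```python
# Adjusted means: treatment mean of Y, moved along the common slope
# from the treatment's covariate mean to the grand covariate mean.
b_common = 0.474                          # common slope estimated above
x_bar = [1190 / 6, 1140 / 6, 1200 / 6]    # treatment means of 'pre'
y_bar = [990 / 6, 995 / 6, 1215 / 6]      # treatment means of 'post'
x_grand = 3530 / 18                       # grand mean of 'pre'

adjusted = [y - b_common * (x - x_grand) for x, y in zip(x_bar, y_bar)]
print([round(a, 3) for a in adjusted])    # [163.947, 168.73, 200.657]
```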
So, in conclusion, was it worthwhile (or even valid) to do a covariance analysis? In this case we would say no - simply because the authors failed to demonstrate a clear relationship between pre and post values, which is a pre-condition for using covariance analysis. In the event it had little effect on the outcome - other than giving oneself a great deal more work!