"It has long been an axiom of mine that the little things are infinitely the most important"
Covariance Analysis (ANCOVA)
Worked example 1
Our first worked example uses data from Larry Douglass (2004) (University of Maryland) on the effect of three different diets on cholesterol levels. Pre-treatment levels were used as the covariate. Douglass does the analysis with SAS - we use R.
Comparison is instructive!
Draw boxplots and assess normality
Plot out data to get a visual assessment of the treatment and block effects, and assess how appropriate (parametric) ANCOVA is for the set of data.
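A minimal sketch of this step in R. The data frame name `chol` and column names `pre`, `post` and `diet` are our invention - the original data layout is not shown here - so substitute your own names as needed:

```r
# Assumed layout: data frame 'chol' with columns post (response),
# pre (covariate) and diet (treatment factor with three levels)
chol$diet <- factor(chol$diet)

# Boxplots of the response for each diet
boxplot(post ~ diet, data = chol,
        xlab = "Diet", ylab = "Post-treatment cholesterol")

# Normal Q-Q plot of the response (better still, of the residuals
# after fitting the model)
qqnorm(chol$post)
qqline(chol$post)
```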
Plot relationship with covariate
Plot out data to get a visual assessment of the relationship between the response variable and the covariate, and assess how appropriate (parametric) ANCOVA is for the set of data.
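One way to produce such a plot in R, again assuming the hypothetical `chol` data frame with columns `pre`, `post` and `diet`:

```r
# Scatterplot of post against pre, one plotting symbol per diet,
# with a separate least-squares line for each diet
plot(post ~ pre, data = chol, pch = as.numeric(chol$diet),
     xlab = "Pre-treatment level", ylab = "Post-treatment level")
for (i in seq_along(levels(chol$diet))) {
  sub <- chol[chol$diet == levels(chol$diet)[i], ]
  abline(lm(post ~ pre, data = sub), lty = i)
}
legend("topleft", legend = levels(chol$diet),
       pch = seq_along(levels(chol$diet)),
       lty = seq_along(levels(chol$diet)))
```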
Whilst we might get away with arguing that there is a linear relationship for treatments 1 and 3, there is no evidence of any relationship at all for treatment 2. We will continue with the analysis, but bear in mind that any 'adjustments' we make to means will be highly questionable (if not totally invalid!).
Get table of (unadjusted) means
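In R the unadjusted means can be tabulated along these lines (data frame and column names assumed as before):

```r
# Unadjusted group means of the response and of the covariate
tapply(chol$post, chol$diet, mean)
tapply(chol$pre,  chol$diet, mean)

# or both at once
aggregate(cbind(pre, post) ~ diet, data = chol, FUN = mean)
```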
Carry out analysis of covariance
Calculating sums of squares manually for an analysis of covariance can only be described as a pain in the backside - and one needs to be very organized! Values for the various sums and sums of squares can be obtained with:
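The within-group sums needed for the hand calculation (Σx, Σy, Σx², Σy², Σxy and n for each diet) can be generated in R - a sketch, assuming the `chol` data frame as before:

```r
# Within-group sums needed for the manual ANCOVA calculation
sums <- by(chol, chol$diet, function(d)
  c(Sx  = sum(d$pre),          Sy  = sum(d$post),
    Sxx = sum(d$pre^2),        Syy = sum(d$post^2),
    Sxy = sum(d$pre * d$post), n   = nrow(d)))
sums
```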
Next calculate the regression statistics for each treatment separately:
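The separate regressions are straightforward in R - a sketch with the assumed `chol` data frame:

```r
# Fit a separate regression of post on pre for each diet
fits <- lapply(split(chol, chol$diet),
               function(d) lm(post ~ pre, data = d))

lapply(fits, coef)                             # intercept and slope per diet
sapply(fits, function(f) summary(f)$r.squared) # r-squared per diet
```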
Summed regression statistics:
SSReg (common slope) = 4400²/9283.3 = 2085.465
SSHeterogeneity of slopes = 3817.422 - 2085.465 = 1731.957
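The arithmetic for these two quantities can be checked directly (the 4400 and 9283.3 are the summed Σxy and Σxx, and 3817.422 the sum of the individual regression sums of squares, as above):

```r
# SSReg for the common slope: (summed Sxy)^2 / (summed Sxx)
SSreg.common <- 4400^2 / 9283.3
SSreg.common                 # approx. 2085.465, as above

# Heterogeneity of slopes: summed individual regression SS
# minus the common-slope regression SS
SShet <- 3817.422 - SSreg.common
SShet                        # approx. 1731.957
```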
Model with interaction (maximal model, different slopes)
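Fitting the maximal model in R is a one-liner - a sketch, again assuming the `chol` data frame with columns `pre`, `post` and `diet`:

```r
# Maximal model: a separate slope for each diet (pre:diet interaction)
model1 <- lm(post ~ pre * diet, data = chol)
anova(model1)   # sequential (Type I) ANOVA table;
                # the pre:diet line tests heterogeneity of slopes
```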
Interpretation here is somewhat problematical. At the P = 0.05 level the interaction is not significant - in other words we can accept the lines as parallel. However, this is in direct contradiction to our visual assessment of the relationship between the response variable and the covariate, where we lack evidence of any regression relationship at all between 'post' and 'pre' for treatment B - let alone a similar slope to that found for treatments A and C. The apparent contradiction may result from the lack of power of tests for interaction. Or it may result from random variation in treatment B. Which is the case is a matter of judgement (or guesswork!), but for the sake of the example we will assume it results from random variation.
When we come to doing it in R, we run into the same 'challenge' that we did earlier: by default R uses sequential (Type I) sums of squares, so the results depend on the order in which terms are entered into the model.
The alternative is to use Type II sums of squares which are calculated according to the principle of marginality. Adjustments to the sums of squares are made for the other main effects in the statistical model but not for higher-level interaction effects.
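Type II sums of squares are provided by the `Anova()` function in the car package - a sketch, with the `chol` data frame names assumed as before:

```r
# Type II sums of squares via the car package
# install.packages("car")   # if not already installed
library(car)

model1 <- lm(post ~ pre * diet, data = chol)
Anova(model1, type = "II")   # each main effect adjusted for the other,
                             # but not for the interaction
```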
Irrespective of these considerations, we still have to decide what to do about the interaction. Given that it is not significant at P=0.05, the conventional approach would be to drop the interaction and assume parallel lines. We do this below and estimate adjusted means.
Model without interaction (parallel lines)
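The parallel-lines model simply drops the interaction term - a sketch with the assumed `chol` data frame:

```r
# Reduced model: parallel lines (common slope, no pre:diet interaction)
model2 <- lm(post ~ pre + diet, data = chol)
anova(model2)     # sequential ANOVA table
summary(model2)   # coefficients, including the pooled slope for pre
```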
The diet factor remains significant (P = 0.0125) but 'pre' is getting very borderline with a P-value of 0.0496. If we do a Type II ANOVA using the R car package (so order has no effect) we get exactly the same P-value for 'pre' but a slightly higher value for the diet factor (P = 0.0200) (we leave you to check this).
The adjusted means (Ȳ′) are given by:

Ȳ′ᵢ = Ȳᵢ - b(X̄ᵢ - X̄)

where Ȳᵢ and X̄ᵢ are the treatment means of the response and the covariate, X̄ is the overall mean of the covariate, and b is the common (pooled) slope.
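The adjusted means can be computed either from the formula directly or by predicting from the parallel-lines model at the overall covariate mean - a sketch, with data frame and column names assumed as before:

```r
# Adjusted means by hand: group mean of post, minus the pooled slope
# times (group mean of pre minus overall mean of pre)
model2 <- lm(post ~ pre + diet, data = chol)
b    <- coef(model2)["pre"]
ybar <- tapply(chol$post, chol$diet, mean)
xbar <- tapply(chol$pre,  chol$diet, mean)
ybar - b * (xbar - mean(chol$pre))

# Equivalently, predict at the overall mean of the covariate
predict(model2, newdata = data.frame(pre  = mean(chol$pre),
                                     diet = levels(chol$diet)))
```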
These then are the adjusted means post ANCOVA. Post hoc comparison of means reveals that the level with diet 3 is significantly higher than with either of the other two diets.
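One way to make these pairwise comparisons in R is with Tukey contrasts from the multcomp package (a sketch - the data frame and column names are assumed, and other packages such as emmeans would do equally well):

```r
# Pairwise comparisons of the diet factor, adjusted for the covariate
# install.packages("multcomp")   # if not already installed
library(multcomp)

model2 <- lm(post ~ pre + diet, data = chol)
summary(glht(model2, linfct = mcp(diet = "Tukey")))
```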
So, in conclusion, was it worthwhile (or even valid) to do a covariance analysis? In this case we would say no - simply because the authors failed to demonstrate a clear relationship between pre and post values, which is a precondition for using covariance analysis. In the event it had little effect on the outcome - other than giving oneself a great deal more work!