 
Multiple comparison tests after parametric ANOVA
Worked example 1
Our first worked example returns to the study by Johnston et al. (2001) on serum Vitamin E concentrations in German Shepherd (GS) dogs with and without a degenerative nerve disorder compared to other breeds of dogs without the disorder.
Table of means

Group                   Mean    SE
GS with disease (A)     3.8343  0.0732
GS without disease (B)  3.5913  0.0732
Other breeds (C)        3.5011  0.0732

Original data are given in  . For this worked example we have taken just the first twenty observations for each 'treatment', which produces the table of means given above and the ANOVA table given below:

ANOVA table

Source of variation  df  SS      MS      F ratio  P
Treatments           2   1.1877  0.5938  5.5391   0.006
Error                57  6.1109  0.1072
Total                59  7.2986

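The entries in the ANOVA table can be cross-checked with a few lines of arithmetic. The sketch below assumes only the sums of squares and degrees of freedom shown in the table and the group size of 20 stated in the text; the variable names are ours.

```python
import math

# Values copied from the ANOVA table for worked example 1
ss_treat, df_treat = 1.1877, 2
ss_error, df_error = 6.1109, 57
n_per_group = 20  # first twenty observations per group, as stated in the text

# Mean squares are sums of squares divided by their degrees of freedom
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error

# F ratio for treatments and the standard error of each group mean
f_ratio = ms_treat / ms_error
se_mean = math.sqrt(ms_error / n_per_group)

# Total row: degrees of freedom and sums of squares are additive
df_total = df_treat + df_error
ss_total = ss_treat + ss_error

print(f"MS treatments = {ms_treat:.4f}, MS error = {ms_error:.4f}")
print(f"F = {f_ratio:.4f}, SE of a mean = {se_mean:.4f}")
print(f"Total: df = {df_total}, SS = {ss_total:.4f}")
```

The standard error is the same for all three groups because the design is balanced and a pooled error mean square is used.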
Meaningful contrasts (we assume here they are preplanned) would be to first compare the two control (unaffected) groups (B & C) and then compare the affected group (A) with the mean of the two unaffected groups.
Contrast  Null hypothesis                    Coefficients
C_{1}     H_{0} : μ_{B} = μ_{C}              0_{A} + 1_{B} − 1_{C}
C_{2}     H_{0} : μ_{A} = (μ_{B} + μ_{C})/2  1_{A} − 1/2_{B} − 1/2_{C}
Sums of squares for each of the two contrasts are given by:
SS_{C1} = 20 × ((0 × 3.83429) + (1 × 3.59132) + (−1 × 3.50114))^{2} / ((0)^{2} + (1)^{2} + (−1)^{2}) = 0.08132

SS_{C2} = 20 × ((1 × 3.83429) + (−0.5 × 3.59132) + (−0.5 × 3.50114))^{2} / ((1)^{2} + (−0.5)^{2} + (−0.5)^{2}) = 1.10638

Contrast  SS       MS       MS_{error}  F ratio  P
C_{1}     0.08132  0.08132  0.1072      0.7586   0.387
C_{2}     1.10638  1.10638  0.1072      10.321   0.002

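The contrast sums of squares and F ratios can be reproduced from the group means, the group size and the error mean square. This is a minimal sketch under the assumption of equal group sizes; the helper name contrast_ss is ours. It also checks that the two contrasts are orthogonal, in which case their sums of squares should add up to the treatment sum of squares (1.1877).

```python
# Group means (to five decimals), group size and error mean square from the text
means = {"A": 3.83429, "B": 3.59132, "C": 3.50114}
n = 20
ms_error = 0.1072

# Contrast coefficients: C1 compares B with C, C2 compares A with the mean of B and C
contrasts = {
    "C1": {"A": 0.0, "B": 1.0, "C": -1.0},
    "C2": {"A": 1.0, "B": -0.5, "C": -0.5},
}


def contrast_ss(coefs, group_means, group_size):
    """Sum of squares for a contrast when all groups have the same size."""
    estimate = sum(coefs[g] * group_means[g] for g in group_means)
    return group_size * estimate ** 2 / sum(c ** 2 for c in coefs.values())


ss = {name: contrast_ss(c, means, n) for name, c in contrasts.items()}
# Each contrast has 1 df, so MS equals SS and F = SS / MS_error
f_ratios = {name: value / ms_error for name, value in ss.items()}

# Orthogonality check for equal n: sum of products of coefficients is zero
dot_product = sum(contrasts["C1"][g] * contrasts["C2"][g] for g in means)

for name in contrasts:
    print(f"{name}: SS = {ss[name]:.5f}, F = {f_ratios[name]:.4f}")
print(f"Sum of contrast SS = {sum(ss.values()):.4f}, dot product = {dot_product}")
```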
From this we may conclude that dogs with the nerve disorder had a significantly higher concentration (P = 0.002) of serum Vitamin E than dogs without the disorder. We cannot, however, tell whether the higher concentration caused the disorder, resulted from the disorder, or was related to some third factor. With an observational design the strength of inference is weak.


Worked example 2
Mean number of eggs laid per leaf

Variety  Species     Replicates (n = 5)       Mean
A        Lemon       3.8  2.8  4.4  3.4  4.1  3.70
B        Lemon       2.8  2.7  3.1  3.6  2.3  2.90
C        Orange      1.7  1.3  2.7  1.8  1.9  1.88
D        Orange      2.1  1.7  3.1  2.4  1.8  2.22
E        Grapefruit  0.5  0.1  0.6  0.4  1.2  0.56
F        Grapefruit  1.6  1.1  0.8  0.9  0.2  0.92

We use hypothetical data for our second worked example, but it is based on the study by Vercher et al. (2008) on the influence of citrus species on the number of eggs laid by the citrus leafminer. Treatments comprise six different citrus varieties, two each of lemon, orange and grapefruit.
A single replicate consists of the mean number of eggs per leaf laid by five adult female moths in a cage with a young shoot after 24 h exposure. Data are shown in the table above.

Check assumptions for ANOVA
Box plots are examined to assess how appropriate (parametric) ANOVA is for the set of data.
{Fig. 1}
There is no evidence of non-normality or heteroscedasticity from the plots, but we check homogeneity of variances using Bartlett's test.
This gives a P-value of 0.9793, so again there is no evidence that variances are not homogeneous.
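For readers who want to see where this P-value comes from, Bartlett's statistic can be computed directly from the six sample variances. The sketch below uses only the Python standard library; the function names are ours, and the chi-square tail probability uses a closed form that holds only for 5 degrees of freedom (six groups), so it is not a general-purpose routine.

```python
import math

# Eggs per leaf for each variety, from the data table
data = {
    "A": [3.8, 2.8, 4.4, 3.4, 4.1],
    "B": [2.8, 2.7, 3.1, 3.6, 2.3],
    "C": [1.7, 1.3, 2.7, 1.8, 1.9],
    "D": [2.1, 1.7, 3.1, 2.4, 1.8],
    "E": [0.5, 0.1, 0.6, 0.4, 1.2],
    "F": [1.6, 1.1, 0.8, 0.9, 0.2],
}


def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def bartlett_statistic(groups):
    """Bartlett's test statistic for homogeneity of variances."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [sample_variance(g) for g in groups]
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


def chi2_upper_tail_5df(x):
    """Upper-tail probability of a chi-square variate with exactly 5 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(
        -x / 2
    ) * (math.sqrt(x) + x ** 1.5 / 3)


statistic = bartlett_statistic(list(data.values()))
p_value = chi2_upper_tail_5df(statistic)  # df = number of groups - 1 = 5
print(f"Bartlett statistic = {statistic:.4f}, P = {p_value:.4f}")
```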
Perform analysis of variance
Analysis of variance is carried out, which indicates a highly significant treatment effect (F_{5,24} = 25.85, P < 0.001). The mean square error is 0.271 with 24 df.
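The F ratio and error mean square quoted above follow from the usual partition of the sums of squares. As a minimal sketch (the function name one_way_anova is ours, and no P-value is computed), the calculation for the egg data is:

```python
# Eggs per leaf for each variety, from the data table
data = {
    "A": [3.8, 2.8, 4.4, 3.4, 4.1],
    "B": [2.8, 2.7, 3.1, 3.6, 2.3],
    "C": [1.7, 1.3, 2.7, 1.8, 1.9],
    "D": [2.1, 1.7, 3.1, 2.4, 1.8],
    "E": [0.5, 0.1, 0.6, 0.4, 1.2],
    "F": [1.6, 1.1, 0.8, 0.9, 0.2],
}


def one_way_anova(groups):
    """Return (F ratio, error mean square, df between, df within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group SS: group size times squared deviation of each group mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group (error) SS: squared deviations from each group's own mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, ms_within, df_between, df_within


f_ratio, ms_error, df_between, df_within = one_way_anova(list(data.values()))
print(f"F({df_between},{df_within}) = {f_ratio:.2f}, MS error = {ms_error:.3f}")
```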
Assess effect sizes
Although orthogonal contrasts should always be the method of choice, there are times when they are inappropriate (usually when an experiment has been designed without thinking in advance about which alternative hypotheses one wishes to test). In this example, orthogonal contrasts have little to offer, so we will instead make all pairwise comparisons, which is what Vercher et al. (2008) did in their analysis using Tukey's honestly significant difference test.
Compute the honestly significant difference:
Tukey HSD = 4.3727 × √(0.271 / 5) = 1.018

Rank the means and tabulate their differences. Those marked below with * are significant (P < 0.05). Precise P-values for each difference are given in the R output.
Ranked means

         3.70(A)  2.90(B)  2.22(D)  1.88(C)  0.92(F)  0.56(E)
2.90(B)  0.80
2.22(D)  1.48*    0.68
1.88(C)  1.82*    1.02*    0.34
0.92(F)  2.78*    1.98*    1.30*    0.96
0.56(E)  3.14*    2.34*    1.66*    1.32*    0.36

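The table of differences can be generated mechanically by comparing every pairwise difference with the HSD. In this sketch the studentized range value 4.3727 (six means, 24 error df, α = 0.05) is taken from the text rather than computed, and the treatment means and error mean square are the rounded values quoted above.

```python
import math
from itertools import combinations

# Treatment means, error mean square and replicates per treatment from the text
means = {"A": 3.70, "B": 2.90, "C": 1.88, "D": 2.22, "E": 0.56, "F": 0.92}
ms_error, n = 0.271, 5
q_crit = 4.3727  # studentized range critical value quoted in the text

# Honestly significant difference
hsd = q_crit * math.sqrt(ms_error / n)

# Compare every pair of means, taken in rank order (largest first)
ranked = sorted(means, key=means.get, reverse=True)
results = {}
for a, b in combinations(ranked, 2):
    diff = means[a] - means[b]
    results[(a, b)] = (diff, diff > hsd)

not_different = [pair for pair, (_, sig) in results.items() if not sig]

print(f"HSD = {hsd:.3f}")
for (a, b), (diff, sig) in results.items():
    print(f"{a} - {b}: {diff:.2f}{'*' if sig else ''}")
print("Pairs that do not differ:", not_different)
```

Note that the difference between B and C (1.02) only just exceeds the HSD (1.018), so that comparison is sensitive to rounding of the means.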
Use horizontal lines to underline treatments that do not differ.
Rank        1    2    3    4    5    6
Treatment  (A)  (B)  (D)  (C)  (F)  (E)
           ________
                ________
                     ________
                          ________
                               ________

Conclusions: There is a general tendency for lemon varieties to have more eggs than orange varieties, and for orange varieties to have more than grapefruit varieties, but in each case there is no significant difference between the 'best' (largest number of eggs) variety of one species and the 'worst' of the species with the next largest number of eggs.
We may decide that Tukey's test is too conservative, especially since other researchers have shown that (for example) lemons tend to have more eggs than other citrus species. Hence we feel we can justify use of a more liberal test, and opt for Ryan's Q test.
Compute corrected α levels (b) for each of the five values of m (number of means spanned).
For m=6, b = 1 − (1 − 0.05)^{6/6} = 0.0500
For m=5, b = 1 − (1 − 0.05)^{5/6} = 0.0417
For m=4, b = 1 − (1 − 0.05)^{4/6} = 0.0338
For m=3, b = 1 − (1 − 0.05)^{3/6} = 0.0253
For m=2, b = 1 − (1 − 0.05)^{2/6} = 0.0169
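The corrected levels all follow one pattern, b = 1 − (1 − α)^{m/k}, with k = 6 treatments here. A short sketch of the calculation is given below; values printed to four decimals may differ from the figures above in the last decimal place because of rounding.

```python
alpha = 0.05  # overall significance level
k = 6         # total number of treatment means

# Corrected alpha level for each number of means spanned, m = 6 down to 2
adjusted_alpha = {m: 1 - (1 - alpha) ** (m / k) for m in range(k, 1, -1)}

for m, b in adjusted_alpha.items():
    print(f"m = {m}: b = {b:.4f}")
```

The correction becomes stricter as fewer means are spanned, and for m = k it reduces to the unadjusted α.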
Compute Ryan's SSR for each of the five values of m (number of means spanned).
Ryan's SSR_{m=6} = 4.3727 × √(0.271 / 5) = 1.018
Ryan's SSR_{m=5} = 4.2850 × 0.2328 = 0.9975
Ryan's SSR_{m=4} = 4.1563 × 0.2328 = 0.9676
Ryan's SSR_{m=3} = 3.9749 × 0.2328 = 0.9254
Ryan's SSR_{m=2} = 3.6309 × 0.2328 = 0.8453
Reference to the table of differences of means above shows that the only change is that varieties C and F are now significantly different with a difference of 0.96, which is greater than the critical difference of 0.8453. If we again use horizontal lines to underline treatments that do not differ:
Rank        1    2    3    4    5    6
Treatment  (A)  (B)  (D)  (C)  (F)  (E)
           ________
                ________
                     ________
                               ________

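The comparison of each difference with the critical value for its span can be sketched as follows. The studentized range values for each m are taken from the text, not computed. This is a simplified version that compares every pair directly with the SSR for the number of means it spans; it omits the step-down rule (testing a pair only if every range containing it is significant), which makes no difference for these data because all wider ranges are significant.

```python
import math

# Means in rank order, error mean square and replicates per treatment from the text
ranked = [("A", 3.70), ("B", 2.90), ("D", 2.22), ("C", 1.88), ("F", 0.92), ("E", 0.56)]
ms_error, n = 0.271, 5

# Studentized range values at the corrected alpha levels, as quoted in the text
q_by_span = {6: 4.3727, 5: 4.2850, 4: 4.1563, 3: 3.9749, 2: 3.6309}

# Shortest significant range for each number of means spanned
se = math.sqrt(ms_error / n)
ssr = {m: q * se for m, q in q_by_span.items()}

# Compare each pairwise difference with the SSR for its span
not_different = []
for i in range(len(ranked)):
    for j in range(i + 1, len(ranked)):
        span = j - i + 1
        diff = ranked[i][1] - ranked[j][1]
        if diff <= ssr[span]:
            not_different.append((ranked[i][0], ranked[j][0]))

for m, value in ssr.items():
    print(f"SSR (m = {m}) = {value:.4f}")
print("Pairs that do not differ:", not_different)
```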
A third approach (and possibly the best) would be to carry out three combined mean contrasts as detailed below (mean lemons vs mean oranges, mean lemons vs mean grapefruits, mean oranges vs mean grapefruits). Note that these contrasts are not orthogonal, so we must use Scheffé's method to test them:
Specify the contrasts of interest
Contrast  Null hypothesis                                Coefficients
C_{1}     H_{0} : (μ_{A} + μ_{B})/2 = (μ_{C} + μ_{D})/2  + 1/2_{A} + 1/2_{B} − 1/2_{C} − 1/2_{D}
C_{2}     H_{0} : (μ_{A} + μ_{B})/2 = (μ_{E} + μ_{F})/2  + 1/2_{A} + 1/2_{B} − 1/2_{E} − 1/2_{F}
C_{3}     H_{0} : (μ_{C} + μ_{D})/2 = (μ_{E} + μ_{F})/2  + 1/2_{C} + 1/2_{D} − 1/2_{E} − 1/2_{F}
Estimate each of the contrasts
C_{1} = 1.85+1.45−0.94−1.11 = 1.25
C_{2} = 1.85+1.45−0.28−0.46 = 2.56
C_{3} = 0.94+1.11−0.28−0.46 = 1.31
Calculate 95% confidence limits for each of the contrasts
CL_{1} = 1.25 ± √[5 × 2.62] × √(0.271 × [0.05 + 0.05 + 0.05 + 0.05]) = 1.25 ± 0.8426 (0.4074 to 2.0926)

CL_{2} = 2.56 ± 0.8426 (1.7174 to 3.4026)

CL_{3} = 1.31 ± 0.8426 (0.4674 to 2.1526)

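The contrast estimates and Scheffé limits can be reproduced as below. The sketch assumes the critical value F = 2.62 (5 and 24 df, α = 0.05) used in the calculation above rather than computing it, and the function name scheffe_interval is ours. Each term 0.05 in the formula is a squared coefficient divided by the number of replicates, (1/2)^{2} / 5.

```python
import math

# Treatment means, error mean square, replicates and number of treatments
means = {"A": 3.70, "B": 2.90, "C": 1.88, "D": 2.22, "E": 0.56, "F": 0.92}
ms_error, n, k = 0.271, 5, 6
f_crit = 2.62  # critical F with (k - 1, 24) df, as used in the text

# Contrast coefficients; varieties not listed have coefficient zero
contrasts = {
    "C1": {"A": 0.5, "B": 0.5, "C": -0.5, "D": -0.5},  # lemons vs oranges
    "C2": {"A": 0.5, "B": 0.5, "E": -0.5, "F": -0.5},  # lemons vs grapefruits
    "C3": {"C": 0.5, "D": 0.5, "E": -0.5, "F": -0.5},  # oranges vs grapefruits
}


def scheffe_interval(coefs):
    """Return (estimate, lower limit, upper limit) for one contrast."""
    estimate = sum(c * means[g] for g, c in coefs.items())
    se = math.sqrt(ms_error * sum(c ** 2 / n for c in coefs.values()))
    half_width = math.sqrt((k - 1) * f_crit) * se
    return estimate, estimate - half_width, estimate + half_width


intervals = {name: scheffe_interval(c) for name, c in contrasts.items()}

# Non-orthogonality check: sum of products of coefficients for C1 and C2
dot_c1_c2 = sum(
    contrasts["C1"].get(g, 0.0) * contrasts["C2"].get(g, 0.0) for g in means
)

for name, (est, low, high) in intervals.items():
    print(f"{name}: {est:.2f} ({low:.4f} to {high:.4f})")
print(f"C1 . C2 = {dot_c1_c2} (non-zero, so not orthogonal)")
```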
Conclusions: In all three cases the limits exclude zero, so the contrasts are all significant. We may conclude that lemons (of the two selected varieties) have significantly more eggs laid on them than oranges, which in turn have significantly more than grapefruit.


