 
Approach to multiple comparisons
Multiple comparisons remain a controversial issue, complicated by there being two main situations in which adjustments are commonly made. These are (a) comparison of means of multiple treatment levels following an analysis of variance (what we are concerned with here); (b) use of multiple outcome measures in an experiment or observational study (for example, some clinical trials may have only one primary response variable but up to 20 secondary response variables; an ecological observational study may test the same hypothesis on many different species). As we point out in the references, many (especially medical statisticians) are (sometimes vehemently) opposed to any adjustments to control the per-experiment error rate. Others accept adjustments for post-ANOVA comparisons, but not for multiple outcome measures. Still others argue for rigorous adjustments to maintain a 5% Type I error rate.
If we focus on the post-ANOVA situation, several points come out strongly.
 It is always better to design a study with the aim of testing specific (alternative) hypotheses, rather than deciding what you wish to test after having done the study. With a well-designed study, you can pre-plan a limited number of independent (or orthogonal) contrasts to test your hypotheses. An important advantage of this approach is that you do not need to make any adjustments to control per-experiment error when comparing such means. Hence the power of such tests is strengthened.
 Do not restrict yourself to pairwise comparisons. Very often combined mean comparisons can be much more interesting (for example, comparing the response to a control with the mean of the responses to two different, but related, treatments). This principle is taken much further with factorial designs, but can often be applied after a one-way ANOVA.
 There is nothing wrong with making (a few) unplanned comparisons based on anything unexpected that emerges in the results. That is, after all, what research is all about! But in that situation you should use a test which provides adequate protection against Type I error, and you should be able to defend your particular choice of test.
A. Planned orthogonal comparisons
Partitioning treatment sums of squares
This method is recommended for carrying out a set of linear contrasts (both pairwise and combined mean comparisons) which are orthogonal (statistically independent). If the study has been well designed with clear hypotheses to test, this set of orthogonal contrasts may well include all the comparisons of interest.
The significance of each of these linear contrasts is assessed by partitioning the treatment sums of squares obtained in a standard analysis of variance. In order to do this you need to determine the contrast coefficients for each of the two contrasts. These are the coefficients in the linear equation describing a contrast (C), namely:
C = c_{1}x̄_{1} + c_{2}x̄_{2} + c_{3}x̄_{3}
Values of coefficients are determined thus:
 If there are the same number of sample means in each comparison group, assign coefficients of +1 to the members of one group and −1 to the other group.
 If there are a different number of sample means in each group, assign to the first group coefficients equal to the number of means in the second group, and to the second group coefficients of the opposite sign equal to the number of means in the first group.
 Reduce coefficients to the smallest possible integers by dividing through, and then scale them so that the sum of the positive coefficients equals one and the sum of the negative coefficients equals minus one, making the overall sum of the coefficients zero.
We can now rewrite the orthogonal set of contrasts above with their coefficients:
Note that some authorities prefer to express the coefficients as integers. A little practice should enable you to readily draw up an orthogonal set  but you should always carry out a formal check on whether the set is orthogonal. The procedure for doing this is detailed in the related topic on checking orthogonality.
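As a quick sketch of that check: two contrasts are orthogonal when the cross-products of their coefficients sum to zero. A minimal Python illustration (the helper name is_orthogonal_set and the contrast vectors are our own, not from any library; equal group sizes are assumed):

```python
def is_orthogonal_set(contrasts, tol=1e-9):
    """Return True if every pair of contrast vectors is mutually orthogonal."""
    for i in range(len(contrasts)):
        for j in range(i + 1, len(contrasts)):
            # sum of coefficient cross-products must be zero for orthogonality
            cross = sum(a * b for a, b in zip(contrasts[i], contrasts[j]))
            if abs(cross) > tol:
                return False
    return True

# Hypothetical set for 3 treatments: contrast 1 compares a control with the
# mean of two treatments; contrast 2 compares those two treatments directly.
contrasts = [[1.0, -0.5, -0.5],
             [0.0, 1.0, -1.0]]
print(is_orthogonal_set(contrasts))  # this set is orthogonal
```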
It is then simply a matter of partitioning the treatment sums of squares. The sums of squares for each contrast are given by:
Algebraically speaking:

SS_{Ci} = n (Σ c_{i}x̄_{i})^{2} / Σ c_{i}^{2}
where:
 C_{i} are the contrasts,
 c_{i} are the contrast coefficients,
 x̄_{i} are the treatment means,
 n is the number of replicates.

For an orthogonal set of contrasts, each contrast will have one degree of freedom. Hence the sums of squares are equal to the mean squares for each contrast. The contrast mean squares are then each divided by the mean square error to obtain F-ratios, which are tested in the usual way.
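The calculation above can be sketched in a few lines of Python. This assumes equal replication (n per group) and that scipy is available for the F-distribution; the function name contrast_F and the numerical inputs are hypothetical:

```python
from scipy import stats

def contrast_F(means, coeffs, n, ms_error, df_error):
    """F-test for one linear contrast, assuming n replicates in every group."""
    C = sum(c * m for c, m in zip(coeffs, means))       # contrast value
    ss_contrast = n * C**2 / sum(c**2 for c in coeffs)  # 1 df, so MS = SS
    F = ss_contrast / ms_error
    p = stats.f.sf(F, 1, df_error)                      # upper-tail p-value
    return F, p

# Hypothetical example: control mean vs the mean of two related treatments
F, p = contrast_F([10.0, 12.0, 18.0], [1.0, -0.5, -0.5],
                  n=5, ms_error=4.0, df_error=12)
```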
Fisher's protected least significant difference (LSD)
This is the most widely (mis)used multiple comparison test. The protection referred to derives from the test only being used after finding a significant treatment effect in an ANOVA. However, this does not control the per-experiment error rate, and so the test should only be used under very specific circumstances, namely for comparisons that are both pre-planned and orthogonal. Some authorities also insist it should only be used when there are fewer than four treatment means, but since, under the conditions specified above, the test is exactly equivalent to partitioning the treatment sums of squares, it would seem unnecessary to add this requirement. The least significant difference is calculated as below:
Algebraically speaking:

Least significant difference (LSD) = t_{α(df)} √(2 MS_{error} / n)
where
 t_{α(df)} is a quantile from the t-distribution for the chosen Type I error rate (α) and the same number of degrees of freedom as MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size, assuming the same number of replicates in each group.

Any pre-planned pairwise difference between means greater than the least significant difference is accepted as significant at the chosen level of α.
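A minimal sketch of the calculation, assuming scipy is available (the function name fisher_lsd and the numerical inputs are our own):

```python
from math import sqrt
from scipy import stats

def fisher_lsd(ms_error, n, df_error, alpha=0.05):
    """Fisher's protected LSD for two equal-sized groups (two-sided alpha)."""
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)  # two-sided critical t
    return t_crit * sqrt(2 * ms_error / n)

# Hypothetical values: MS_error = 4.0, n = 5 replicates per group, 12 error df
lsd = fisher_lsd(4.0, 5, 12)
```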
Bonferroni & Dunn-Sidak
These methods protect against Type I errors by controlling the per-experiment error rate. Strictly speaking they are only appropriate for planned orthogonal contrasts, but unlike the LSD test, they are often recommended for multiple (>3) planned orthogonal comparisons. They are also often recommended for planned but non-orthogonal contrasts, although in this latter situation they are conservative. They should not be used for all possible pairwise comparisons.
The Bonferroni correction can be applied using a modified least significant difference, namely:
Algebraically speaking:

Bonferroni corrected LSD = t_{b(df)} √(2 MS_{error} / n)
where
 t_{b(df)} is a quantile from the t-distribution at the adjusted α level (b), where b is equal to α/r and r is the number of comparisons. It has the same number of degrees of freedom as MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size, assuming the same number of replicates in each group.

The Dunn-Sidak correction can be applied in the same way, namely:
Algebraically speaking:

Dunn-Sidak corrected LSD = t_{d(df)} √(2 MS_{error} / n)
where
 t_{d(df)} is a quantile from the t-distribution at the adjusted α level (d), where d is equal to 1 − (1−α)^{1/r} and r is the number of comparisons. It has the same number of degrees of freedom as MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size, assuming the same number of replicates in each group.
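Both corrections can be sketched as a single hypothetical helper (assuming scipy; the function name corrected_lsd is illustrative):

```python
from math import sqrt
from scipy import stats

def corrected_lsd(ms_error, n, df_error, r, alpha=0.05, method="bonferroni"):
    """LSD with the per-comparison alpha adjusted for r planned comparisons."""
    if method == "bonferroni":
        b = alpha / r                      # Bonferroni adjustment
    elif method == "sidak":
        b = 1 - (1 - alpha) ** (1 / r)     # Dunn-Sidak adjustment
    else:
        raise ValueError("unknown method")
    t_crit = stats.t.ppf(1 - b / 2, df_error)
    return t_crit * sqrt(2 * ms_error / n)

# Hypothetical values: 3 planned comparisons, MS_error = 4.0, n = 5, 12 df
bon = corrected_lsd(4.0, 5, 12, r=3)
sid = corrected_lsd(4.0, 5, 12, r=3, method="sidak")
```

As expected, the Dunn-Sidak LSD comes out fractionally smaller (less conservative) than the Bonferroni one.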

B. Planned non-orthogonal comparisons
Comparing treatments with a control: Dunnett's test
Dunnett's test can be applied in the same way as the tests above, but using critical values tabulated by Dunnett in place of quantiles from the t-distribution, namely:
Algebraically speaking:

Dunnett's LSD = d_{α(df)} √(2 MS_{error} / n)
where
 d_{α(df)} is the critical value for Dunnett's test for the chosen Type I error rate (α). Critical values can be obtained on the web. It has the same number of degrees of freedom as MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size.

C. Unplanned pairwise comparisons
Tukey's Honestly Significant Difference
Tukey's test is a simultaneous inference method. If sample sizes are equal, it uses one range value to calculate the same shortest significant range for all comparisons. It is the most widely used method for making all possible pairwise comparisons amongst a group of means. In its original form, sample sizes were assumed to be equal. Kramer modified the method so that it could be used for unequal group sample sizes, using the harmonic mean of the sample sizes of the two groups being compared. The first formulation below is for equal sample sizes, whilst the second is for unequal group sizes:
Algebraically speaking:

Tukey HSD = Q_{α(k,df)} √(MS_{error} / n)

Tukey-Kramer HSD = Q_{α(k,df)} √[(MS_{error}/2)(1/n_{A} + 1/n_{B})]
where
 Q_{α(k,df)} is the value of the studentized range statistic for the total number of treatments (k) at the chosen type I error rate (α). Values of Q can be obtained using R or on the web. Degrees of freedom are the same as for MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size.

Tukey's HSD is well accepted in the literature, and its use is recommended. It is, however, conservative, and one of the multiple stage tests may be preferred if the desire is to maximize power. Several other methods are available for unequal numbers of replicates, including Spjøtvoll & Stoline's T' method and Hochberg's GT2 method. However, both of these tend to be more conservative than the Tukey-Kramer method. Full details can be found in Sokal & Rohlf (1995) if required.
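Both formulations can be sketched using scipy's studentized_range distribution (available in scipy 1.7 or later); the function names are our own, and the division by 2 inside the Tukey-Kramer root reflects the harmonic-mean formulation, under which the two formulas agree when n_A = n_B:

```python
from math import sqrt
from scipy.stats import studentized_range

def tukey_hsd(ms_error, n, k, df_error, alpha=0.05):
    """Tukey's HSD for k groups of n replicates each."""
    q = studentized_range.ppf(1 - alpha, k, df_error)
    return q * sqrt(ms_error / n)

def tukey_kramer_hsd(ms_error, n_a, n_b, k, df_error, alpha=0.05):
    """Tukey-Kramer HSD for two groups of sizes n_a and n_b."""
    q = studentized_range.ppf(1 - alpha, k, df_error)
    # harmonic-mean form: reduces to ms_error / n when n_a == n_b
    return q * sqrt((ms_error / 2) * (1 / n_a + 1 / n_b))
```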
Student-Newman-Keuls Test
This is described variously as a stepwise or multiplestage test. The range statistic varies for each pairwise comparison as a function of the number of group means in between the two being compared. A different shortest significant range is computed for each pairwise comparison of means.
Means are first ordered by rank, and the largest and smallest means are tested. If there is no significant difference, testing stops there and it is concluded that none of the means is significantly different. Otherwise, the means with the next greatest difference are tested using a different shortest significant range. Testing is continued until no further significant differences are found.
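The range used at each step can be sketched as follows (assuming scipy's studentized_range; the name snk_ssr is illustrative):

```python
from math import sqrt
from scipy.stats import studentized_range

def snk_ssr(m, ms_error, n, df_error, alpha=0.05):
    """Shortest significant range for a comparison spanning m ordered means."""
    # the range statistic depends on m, the number of means spanned
    q = studentized_range.ppf(1 - alpha, m, df_error)
    return q * sqrt(ms_error / n)
```

The range shrinks as m decreases, which is what allows the stepwise testing to tighten as the procedure moves inwards from the extremes.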
Algebraically speaking:

SSR = Q_{α(m,df)} √(MS_{error} / n)
where
 Q_{α(m,df)} is the value of the studentized range statistic for the number of means spanned in the particular comparison (m) at the chosen Type I error rate (α). Values of Q can be obtained using R or on the web. Degrees of freedom are the same as for MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size.

Such tests are valid only when group sample sizes are equal. If sample sizes were unequal, test results could be non-intuitive, for example A > B and B > C, but A not significantly different from C. The Student-Newman-Keuls (SNK) test is more powerful than Tukey's method, so it will detect real differences more frequently.
However, in some situations the Student-Newman-Keuls test offers poor protection against a Type I error. This is especially the case when treatment means fall into groups which are themselves widely spaced apart. Differences between means within groups will be significant more often than they should be at the specified level of α.
The Student-Newman-Keuls test is not as bad in this respect as another widely used test, Duncan's multiple range test. This is a modification of the Student-Newman-Keuls test that uses increasing α-levels to calculate the critical values at each step of the above procedure. The test is implemented using tables prepared by Duncan which give the appropriate Q value for a given number of treatments (k). When k = 2 the two procedures have identical values; for values of k larger than 2, the Duncan procedure has the smaller critical value.
This means that the Duncan test is more liberal in detecting differences, a point defended by Duncan on the basis that the global null hypothesis is often (nearly always?) false, and hence most statisticians tend to overprotect against Type I errors. However, few statisticians support him in this, mainly because the test fails to control the familywise error rate at the nominal α-level. In addition, many journals will not accept it, so the 'struggling' research scientist has little choice but to avoid the test.
Ryan's Q Test
Algebraically speaking:

Ryan's SSR = Q_{b(m,df)} √(MS_{error} / n)
where
 Q_{b(m,df)} is the value of the studentized range statistic for the number of means spanned in the particular comparison (m) at the adjusted α level (b), where b is equal to 1 − (1−α)^{m/k}. Values of Q for the adjusted (non-standard) probabilities are most readily obtained using R. Degrees of freedom are the same as for MS_{error},
 MS_{error} is the mean square error from the ANOVA table (for one-way ANOVA df = N − k, where k is the number of treatments, and N is the total number of observations),
 n is the sample size.
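A sketch of the adjusted calculation (assuming scipy; the name ryan_ssr is illustrative):

```python
from math import sqrt
from scipy.stats import studentized_range

def ryan_ssr(m, k, ms_error, n, df_error, alpha=0.05):
    """Ryan's Q shortest significant range for a span of m out of k means."""
    b = 1 - (1 - alpha) ** (m / k)                 # adjusted per-span alpha
    q = studentized_range.ppf(1 - b, m, df_error)  # Q at the adjusted level
    return q * sqrt(ms_error / n)
```

When m = k the adjustment leaves α unchanged, so the outermost comparison matches the plain Student-Newman-Keuls range; for m < k the adjusted α is smaller, making the inner comparisons more stringent.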

D. Unplanned pairwise and combined mean comparisons
Scheffé's method
Scheffé's method is a simultaneous inference method that can be applied to all possible contrasts among the means, not just the pairwise differences. Each contrast of interest is set up such that the sum of the coefficients is equal to zero, and then estimated as follows:
Algebraically speaking:

C_{i} = Σ c_{i}x̄_{i}

where
 C_{i} is the ith contrast,
 c_{i} is the coefficient for the ith treatment mean, x̄_{i}.

The simultaneous 100(1−α)% confidence limits for a contrast are given by:
Algebraically speaking:

CL = C_{i} ± √[(k − 1) F_{α(k−1, N−k)}] × s_{C}
where
 C_{i} is the specified contrast,
 k is the number of treatments or groups,
 F is a quantile from the Fdistribution,
 N is the total number of observations,
 s_{C} is the standard deviation of the contrast, which is equal to √[MS_{error} Σ(c_{i}^{2} / n_{i})].
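The limits can be sketched as follows (assuming scipy; the function name scheffe_ci and the numerical inputs are our own):

```python
from math import sqrt
from scipy import stats

def scheffe_ci(means, coeffs, ns, ms_error, k, N, alpha=0.05):
    """Simultaneous 100(1 - alpha)% Scheffe limits for C = sum(c_i * mean_i)."""
    C = sum(c * m for c, m in zip(coeffs, means))
    # standard deviation of the contrast
    s_C = sqrt(ms_error * sum(c**2 / n for c, n in zip(coeffs, ns)))
    F_crit = stats.f.ppf(1 - alpha, k - 1, N - k)
    half_width = sqrt((k - 1) * F_crit) * s_C
    return C - half_width, C + half_width

# Hypothetical example: control vs mean of two treatments, 5 replicates each
lo, hi = scheffe_ci([10.0, 12.0, 18.0], [1.0, -0.5, -0.5],
                    [5, 5, 5], ms_error=4.0, k=3, N=15)
```

An interval that excludes zero indicates the contrast is significant at the chosen simultaneous level.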

Note that the R function pairwise.t.test computes all possible two-group comparisons, making adjustments for multiple comparisons if required, e.g. pairwise.t.test(con, trt, p.adj="bonferroni"); the default adjustment is Holm's method.
Assumptions
All of the multiple comparison tests discussed thus far share the same assumptions as ANOVA itself: data within each treatment group are normally distributed, and each treatment group has equal variance. Violations of these assumptions will result in a loss of power to detect differences which are actually present.
Related topics:
 Checking orthogonality
