"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Non-parametric multiple comparison tests

Provided the distributions are of a similar shape, the Kruskal-Wallis K statistic enables one to test the general hypothesis that all population medians are equal. If this null hypothesis is rejected, the next step is to compare the individual groups. This is done using non-parametric multiple comparison tests. We can group the available methods in the same way as we did for parametric multiple comparison procedures.


Planned orthogonal comparisons

    Small numbers of planned orthogonal pairwise comparisons can be done using the Wilcoxon-Mann-Whitney test. For larger numbers of such comparisons, the Dunn-Sidak correction should be applied. If variances are not homogeneous, there is a 'robust' version of the Wilcoxon-Mann-Whitney test known as the Fligner-Policello test. However, it assumes that distributions are symmetrical - which rather limits its usefulness.
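
    As a rough illustration of the Dunn-Sidak correction mentioned above, the sketch below computes the per-comparison significance level needed to hold the overall level at 0.05 across m comparisons. The function name and the three P-values are invented for the example; in practice the P-values would come from the individual Wilcoxon-Mann-Whitney tests.

```python
def sidak_alpha(alpha, m):
    """Per-comparison significance level for m comparisons (Dunn-Sidak)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical P-values from three planned pairwise comparisons.
raw_p = [0.004, 0.020, 0.300]

alpha_per_test = sidak_alpha(0.05, len(raw_p))

# A comparison is declared significant only if its P-value does not
# exceed the corrected per-comparison level.
significant = [p <= alpha_per_test for p in raw_p]

print(round(alpha_per_test, 5), significant)
```

    With three comparisons the corrected level is a little under 0.017, so only the first of the hypothetical P-values would be judged significant.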


All pairwise comparisons

    Joint or pairwise ranking

    In joint rank tests, the mean ranks (or rank sums) used in the Kruskal-Wallis test are compared. These tests therefore differ in nature from parametric multiple comparison tests, because the significance of a comparison between a pair of treatments depends upon observations from treatments not involved in the comparison. Hence results may change depending on the number of treatments being considered.

    In pairwise ranking, ranks are assigned afresh to just the two treatments being compared. This has the disadvantage that cycling can arise, where group A is significantly greater than group B and group C is significantly greater than group A, but group C is not significantly greater than group B. Such inconsistencies are difficult to explain logically!
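
    The difference between the two ranking schemes can be seen in a small sketch. The data and helper names below are made up for illustration. Group C's values lie between those of A and B, so ranking all three groups jointly stretches the gap between the rank sums of A and B, whereas ranking A and B on their own ignores C altogether.

```python
def midranks(values):
    """Return ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def rank_sums(groups):
    """Rank the pooled observations of `groups`; return each group's rank sum."""
    pooled = [x for g in groups.values() for x in g]
    ranks = midranks(pooled)
    sums, start = {}, 0
    for name, g in groups.items():
        sums[name] = sum(ranks[start:start + len(g)])
        start += len(g)
    return sums


# Hypothetical observations for three treatments.
data = {"A": [1.0, 2.0, 3.0], "B": [7.0, 8.0, 9.0], "C": [4.0, 5.0, 6.0]}

joint = rank_sums(data)                                  # all groups ranked together
pair_ab = rank_sums({"A": data["A"], "B": data["B"]})    # A and B ranked afresh

print(joint)    # rank sums A = 6, B = 24, C = 15
print(pair_ab)  # rank sums A = 6, B = 15
```

    Under joint ranking the rank sums of A and B differ by 18; under pairwise ranking they differ by only 9, even though the observations in A and B are unchanged.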


    Joint rank tests

    The simplest of these uses a test analogous to Tukey's test and is known as the Nemenyi joint rank test. The difference between the rank sums of each pair of groups is compared to a single honestly significant difference, calculated as below:

    Algebraically speaking -

    $$\mathrm{HSD} = q_{\alpha}(k,\,df=\infty)\,\sqrt{\frac{n\,(nk)(nk+1)}{12}}$$
    • HSD is the honestly significant difference between rank sums,
    • qα(k,df=∞) is the Studentized range statistic,
    • k is the number of groups,
    • n is the number of observations in each group.

    It assumes that there is the same number of replicates in each group.
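
    A minimal sketch of the Nemenyi calculation follows, assuming the critical value of the Studentized range is looked up by the user and passed in. The value 3.314 is the tabulated 5% point for k = 3 and infinite degrees of freedom. The rank sums are hypothetical, chosen so that they add up to the total of the ranks 1 to 15.

```python
from itertools import combinations
from math import sqrt


def nemenyi_hsd(q, k, n):
    """Honestly significant difference between rank sums (equal group sizes)."""
    total = n * k  # total number of observations
    return q * sqrt(n * total * (total + 1) / 12.0)


# Hypothetical joint rank sums for k = 3 groups of n = 5 observations each.
rank_sum = {"A": 20.0, "B": 62.0, "C": 38.0}

q_crit = 3.314  # tabulated q for alpha = 0.05, k = 3, df = infinity
hsd = nemenyi_hsd(q_crit, k=3, n=5)

# Pairs whose rank sums differ by more than the HSD.
significant = [
    (g, h)
    for g, h in combinations(rank_sum, 2)
    if abs(rank_sum[g] - rank_sum[h]) > hsd
]

print(round(hsd, 2), significant)
```

    Here the square root term equals 10, so the HSD is 33.14 and only the difference between A and B (42) exceeds it.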

    For unequal sample sizes one can use the Dunn test. In this test one compares mean ranks, not sums of ranks. Consequently a different range statistic is used for the test.

    Algebraically speaking -

    $$\mathrm{HSD} = Q_{\alpha}(k)\,\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}$$
    • HSD is the honestly significant difference between mean ranks,
    • Qα(k) is the mean rank range statistic,
    • k is the number of groups,
    • N is the total number of observations,
    • nA and nB are the number of observations in the two groups being compared.
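
    The Dunn calculation can be sketched in the same way. One assumption is made that the text above does not state: the critical value Q is taken to be the standard normal quantile at 1 - α/(k(k-1)), a Bonferroni-type adjustment over all pairs. The group sizes and mean ranks are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def dunn_q(alpha, k):
    """Critical Q, ASSUMED to be the normal quantile at 1 - alpha / (k (k - 1))."""
    return NormalDist().inv_cdf(1.0 - alpha / (k * (k - 1)))


def dunn_hsd(q, n_total, n_a, n_b):
    """Honestly significant difference between the mean ranks of two groups."""
    return q * sqrt(n_total * (n_total + 1) / 12.0 * (1.0 / n_a + 1.0 / n_b))


# Hypothetical group sizes and joint mean ranks (N = 15 observations in all).
size = {"A": 4, "B": 6, "C": 5}
mean_rank = {"A": 3.5, "B": 11.0, "C": 8.0}

n_total = sum(size.values())
q_crit = dunn_q(0.05, k=len(size))

# Unlike the Nemenyi test, the HSD differs from pair to pair
# because it depends on the two group sizes.
hsd = {
    (g, h): dunn_hsd(q_crit, n_total, size[g], size[h])
    for g, h in combinations(size, 2)
}
significant = [
    pair for pair, limit in hsd.items()
    if abs(mean_rank[pair[0]] - mean_rank[pair[1]]) > limit
]

print(round(q_crit, 3), significant)
```

    With these figures Q is about 2.394, and only the difference between A and B (7.5) exceeds its HSD of roughly 6.9.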


    Pairwise tests

    The Steel-Dwass test is a frequently recommended pairwise ranking test. Each pair of treatments is compared with the Wilcoxon-Mann-Whitney test. For small samples (n = 2-6) and only three groups (k = 3), convert the calculated U-statistic to the minimum rank sum and compare it with the exact critical values given in Steel (1960). Otherwise, convert the calculated U to the maximum rank sum and compare it with the large-sample approximation given in Steel (1961).
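
    The conversion from U to rank sums can be sketched as follows. It uses the identity that the rank sum of a group within a pair equals its U-statistic plus n(n+1)/2, where U counts the pairs in which an observation from that group exceeds one from the other group. The data and function names are made up for the example, and no critical values are included: the resulting minimum or maximum rank sums would still have to be compared with the values given by Steel.

```python
from itertools import combinations


def mann_whitney_u(a, b):
    """U for sample a: number of (x, y) pairs with x > y, ties counting 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def pairwise_rank_sums(a, b):
    """Rank sums of a and b when the two samples are ranked on their own."""
    u_a = mann_whitney_u(a, b)
    u_b = len(a) * len(b) - u_a
    r_a = u_a + len(a) * (len(a) + 1) / 2.0
    r_b = u_b + len(b) * (len(b) + 1) / 2.0
    return r_a, r_b


# Hypothetical observations, four replicates per treatment.
data = {
    "A": [1.1, 2.4, 3.0, 4.2],
    "B": [3.5, 5.1, 6.3, 7.7],
    "C": [2.0, 2.9, 4.0, 4.9],
}

# For every pair of treatments, record (minimum rank sum, maximum rank sum).
results = {}
for g, h in combinations(data, 2):
    r_g, r_h = pairwise_rank_sums(data[g], data[h])
    results[(g, h)] = (min(r_g, r_h), max(r_g, r_h))

for pair, (smallest, largest) in results.items():
    print(pair, smallest, largest)
```

    For A against B, for example, U for A is 1, giving rank sums of 11 and 25, which together make up the total of the ranks 1 to 8.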