 
Kruskal-Wallis ANOVA
Worked example 1
Our first worked example uses data from Cobo et al. (1998). They compared albendazole levels in both serum and cysts from patients in three treatment groups. They performed both normality and homogeneity of variance tests (results not reported), but then used Kruskal-Wallis to compare the means on the basis of the small sample size. Multiple comparisons were performed with the Wilcoxon-Mann-Whitney test. The data are presented below:
Serum albendazole sulphoxide levels (mg/ml) per patient

Gp A (albendazole 10 mg/kg/day) Mean=50.34 (n=17)
0.36  0.16  0.65  0.04  0.10  0.30  0.20  0.20  0.48  0.29
0.56  0.30  0.18  0.17  0.09  0.42  0.66

Gp B (albendazole 20 mg/kg/day) Mean=38.35 (n=10)
0.13  0.45  0.22  0.18  0.11  0.46  0.16  0.38  0.23  0.30

Gp C (albendazole 10 mg/kg/day + praziquantel 25 mg/kg/day) Mean=32.66 (n=20)
0.49  0.13  0.64  0.39  0.18  0.34  0.19  0.99  1.24  0.53
0.27  0.59  0.24  1.31  0.73  0.24  1.54  1.45  1.59  0.44
  
Draw boxplots and assess shape of distributions
We first examine box plots to assess whether a parametric or nonparametric ANOVA would be more appropriate.
{Fig. 1a}
The plot of the raw data strongly suggests that the distributions are non-normal (at least for B and C) and that the variances are not homogeneous. However, a log transformation appears to make the distributions more symmetrical and the variances more similar. Hence the most powerful approach to analyzing these data would be to (1) test homogeneity of variances before and after a log transformation, and (2) if the transformation successfully stabilizes the variances, carry out a parametric ANOVA and report the results with geometric means.
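As a rough numerical companion to the box plots, the following Python sketch compares the ratio of the largest to the smallest group variance on the raw and on the log scale. It uses only the standard library; the helper name `variance_ratio` is ours, and the ratio is a crude descriptive check, not a formal homogeneity of variance test.

```python
import math
import statistics

# Serum levels transcribed from the data table above.
groups = {
    "A": [0.36, 0.16, 0.65, 0.04, 0.10, 0.30, 0.20, 0.20, 0.48, 0.29,
          0.56, 0.30, 0.18, 0.17, 0.09, 0.42, 0.66],
    "B": [0.13, 0.45, 0.22, 0.18, 0.11, 0.46, 0.16, 0.38, 0.23, 0.30],
    "C": [0.49, 0.13, 0.64, 0.39, 0.18, 0.34, 0.19, 0.99, 1.24, 0.53,
          0.27, 0.59, 0.24, 1.31, 0.73, 0.24, 1.54, 1.45, 1.59, 0.44],
}


def variance_ratio(data):
    """Largest sample variance divided by the smallest, across groups."""
    variances = [statistics.variance(values) for values in data.values()]
    return max(variances) / min(variances)


# Natural-log transform of every observation (all values are positive).
logged = {name: [math.log(v) for v in values] for name, values in groups.items()}

raw_ratio = variance_ratio(groups)
log_ratio = variance_ratio(logged)
print(f"variance ratio, raw scale: {raw_ratio:.1f}")
print(f"variance ratio, log scale: {log_ratio:.1f}")
```

A ratio closer to 1 on the log scale is consistent with the impression from the plots that the transformation makes the variances more similar.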
An alternative approach (which we take here) is to analyze the data with Kruskal-Wallis ANOVA, bearing in mind that heterogeneous variances will make interpretation of the result more complex. Note that a log transformation would have no effect at all on the value of the Kruskal-Wallis statistic, because the statistic is computed from ranks and a log transformation is monotonic, so it preserves the rank order of the observations.
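The point about the log transformation can be illustrated directly: ranking the pooled observations before and after taking logs gives identical ranks, so any rank-based statistic is unchanged. This is a minimal sketch in standard-library Python; the function `midranks` is our own helper that assigns tied values the mean of the ranks they occupy.

```python
import math

# All 47 serum levels pooled in the order A, B, C (from the data table above).
pooled = [
    0.36, 0.16, 0.65, 0.04, 0.10, 0.30, 0.20, 0.20, 0.48, 0.29,
    0.56, 0.30, 0.18, 0.17, 0.09, 0.42, 0.66,
    0.13, 0.45, 0.22, 0.18, 0.11, 0.46, 0.16, 0.38, 0.23, 0.30,
    0.49, 0.13, 0.64, 0.39, 0.18, 0.34, 0.19, 0.99, 1.24, 0.53,
    0.27, 0.59, 0.24, 1.31, 0.73, 0.24, 1.54, 1.45, 1.59, 0.44,
]


def midranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


raw_ranks = midranks(pooled)
log_ranks = midranks([math.log(v) for v in pooled])
print(raw_ranks == log_ranks)
```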
Pool groups and rank observations
We can do this quickly using R.
Ranks of serum albendazole sulphoxide levels per patient, with sum of ranks for each group

Gp A (albendazole 10 mg/kg/day) Mean=50.34 (n=17)
26.0  7.5  39.0  1.0  3.0  23.0  14.5  14.5  33.0  21.0
36.0  23.0  11.0  9.0  2.0  29.0  40.0
Sum of ranks = 332.5

Gp B (albendazole 20 mg/kg/day) Mean=38.35 (n=10)
5.5  31.0  16.0  11.0  4.0  32.0  7.5  27.0  17.0  23.0
Sum of ranks = 174

Gp C (albendazole 10 mg/kg/day + praziquantel 25 mg/kg/day) Mean=32.66 (n=20)
34.0  5.5  38.0  28.0  11.0  25.0  13.0  42.0  43.0  35.0
20.0  37.0  18.5  44.0  41.0  18.5  46.0  45.0  47.0  30.0
Sum of ranks = 621.5
  
Note we have also worked out the sum of ranks (S_{i}) for each group.
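The text does this step in R. Purely as an illustration of what the ranking involves, here is a sketch in standard-library Python that pools the three groups, assigns mean ranks to ties, and sums the ranks per group. The helper `midranks` and the variable names are ours, not part of any library.

```python
# Serum levels transcribed from the data table above.
groups = {
    "A": [0.36, 0.16, 0.65, 0.04, 0.10, 0.30, 0.20, 0.20, 0.48, 0.29,
          0.56, 0.30, 0.18, 0.17, 0.09, 0.42, 0.66],
    "B": [0.13, 0.45, 0.22, 0.18, 0.11, 0.46, 0.16, 0.38, 0.23, 0.30],
    "C": [0.49, 0.13, 0.64, 0.39, 0.18, 0.34, 0.19, 0.99, 1.24, 0.53,
          0.27, 0.59, 0.24, 1.31, 0.73, 0.24, 1.54, 1.45, 1.59, 0.44],
}


def midranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Pool the observations, remembering which group each one came from.
labels, pooled = [], []
for name, values in groups.items():
    labels += [name] * len(values)
    pooled += values

ranks = midranks(pooled)

# Sum of ranks S_i for each group.
rank_sums = {name: 0.0 for name in groups}
for name, rank in zip(labels, ranks):
    rank_sums[name] += rank

print(rank_sums)  # {'A': 332.5, 'B': 174.0, 'C': 621.5}
```

A useful check is that the rank sums add up to N(N+1)/2 = 47 × 48 / 2 = 1128.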
Calculate the KruskalWallis statistic
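We will first calculate the statistic uncorrected for ties, using the rank sums S_{i}, the group sizes n_{i} and the total number of observations N = 47:
$$K = \frac{12}{N(N+1)} \sum_{i} \frac{S_i^2}{n_i} - 3(N+1) = \frac{12}{47 \times 48}\left(\frac{332.5^2}{17} + \frac{174^2}{10} + \frac{621.5^2}{20}\right) - 3(48) \approx 9.426$$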
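The uncorrected statistic can be reproduced from the rank sums and group sizes alone. The sketch below assumes the standard Kruskal-Wallis formula K = 12/(N(N+1)) × Σ S_i²/n_i − 3(N+1); variable names are ours.

```python
# Group sizes and rank sums taken from the rank table above.
n = {"A": 17, "B": 10, "C": 20}
S = {"A": 332.5, "B": 174.0, "C": 621.5}

N = sum(n.values())  # 47 observations in total

# Kruskal-Wallis statistic, not yet corrected for ties.
K = 12 / (N * (N + 1)) * sum(S[g] ** 2 / n[g] for g in n) - 3 * (N + 1)
print(round(K, 3))
```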
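Correcting for ties, we have six groups of ties producing mean ranks of 5.5 (2 ties), 7.5 (2 ties), 11 (3 ties), 14.5 (2 ties), 18.5 (2 ties) and 23 (3 ties). Each group of t tied values contributes t^{3} − t to the correction (6 for a pair, 24 for a triple). Hence the correction factor is given by:
$$C = 1 - \frac{\sum (t^3 - t)}{N^3 - N} = 1 - \frac{6 + 6 + 24 + 6 + 6 + 24}{47^3 - 47} = 0.999306$$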
The corrected value of K is given by:
K = 9.42568 / 0.999306 = 9.4322
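The tie correction above can be checked in a few lines of Python. The tie sizes and the uncorrected K are copied from the text; the variable names are ours.

```python
# Sizes of the six groups of tied values listed above.
tie_sizes = [2, 2, 3, 2, 2, 3]
N = 47

# Correction factor: C = 1 - sum(t^3 - t) / (N^3 - N).
C = 1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N)

K_uncorrected = 9.42568  # value quoted in the text
K_corrected = K_uncorrected / C

print(round(C, 6), round(K_corrected, 4))
```

With so few ties relative to N, the correction changes K only in the third decimal place.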
The P-value for this value can be obtained from the chi-square distribution (df = 2). It is 0.00895, so we accept that there is a highly significant difference between the groups.
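For two degrees of freedom the chi-square distribution is an exponential distribution with mean 2, so its upper tail probability has the closed form P = exp(−K/2). The sketch below relies on that special case; for other degrees of freedom a statistics library would be needed.

```python
import math

K = 9.4322  # tie-corrected Kruskal-Wallis statistic from the text

# Upper tail of the chi-square distribution with df = 2: P(X > K) = exp(-K / 2).
p_value = math.exp(-K / 2)
print(round(p_value, 5))
```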
Compare groups
As we pointed out above, how we interpret this difference is complicated by the differences between the variances. Ideally we would use the Fligner-Policello robust rank-order test, but even this is not appropriate in this instance because it requires symmetrical distributions. Instead we will use the nonparametric equivalent of Tukey's honestly significant difference test for unequal numbers of replicates (Dunn's test) and accept that the test may be too liberal. For this we need the mean rank (rather than the sum of ranks) for each group, obtained by dividing the sum of ranks (S_{i}) by the number of observations (n_{i}).
Group   n    Mean     Median   Sum of ranks   Mean rank
Gp B    10   0.2620   0.225    174.0          17.40
Gp A    17   0.3035   0.290    332.5          19.56
Gp C    20   0.6760   0.510    621.5          31.08
  
The standard errors for comparing each pair of groups are:
$$SE_{\text{B vs C}} = \sqrt{\frac{47(48)}{12}\left(\frac{1}{10} + \frac{1}{20}\right)} = 5.3104$$
$$SE_{\text{B vs A}} = \sqrt{\frac{47(48)}{12}\left(\frac{1}{10} + \frac{1}{17}\right)} = 5.4643$$
$$SE_{\text{A vs C}} = \sqrt{\frac{47(48)}{12}\left(\frac{1}{17} + \frac{1}{20}\right)} = 4.5231$$
Honestly significant differences and actual differences in mean rank (from table above) are therefore:
HSD_{B vs C} = 2.394 × 5.3104 = 12.71 Actual difference = 13.68*
HSD_{B vs A} = 1.960 × 5.4643 = 10.71 Actual difference = 2.16^{ns}
HSD_{A vs C} = 1.960 × 4.5231 = 8.87 Actual difference = 11.52*
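The pairwise comparisons can be sketched in Python as follows. The standard error is SE = sqrt(N(N+1)/12 × (1/n_i + 1/n_j)), without a tie correction, as in the text. The critical values (2.394 for the two extreme groups, 1.960 for adjacent groups) are simply copied from the comparisons above and are not derived here; the variable names and the `results` dictionary are ours.

```python
import math

# Group sizes and rank sums from the tables above.
n = {"A": 17, "B": 10, "C": 20}
rank_sum = {"A": 332.5, "B": 174.0, "C": 621.5}
N = sum(n.values())

mean_rank = {g: rank_sum[g] / n[g] for g in n}

# (group 1, group 2, critical value as quoted in the text)
comparisons = [("B", "C", 2.394), ("B", "A", 1.960), ("A", "C", 1.960)]

results = {}
for g1, g2, critical in comparisons:
    # Standard error of the difference between two mean ranks.
    se = math.sqrt(N * (N + 1) / 12 * (1 / n[g1] + 1 / n[g2]))
    hsd = critical * se
    diff = abs(mean_rank[g1] - mean_rank[g2])
    results[(g1, g2)] = {"se": se, "hsd": hsd, "diff": diff,
                         "significant": diff > hsd}
    print(f"{g1} vs {g2}: SE={se:.4f}  HSD={hsd:.2f}  "
          f"difference={diff:.2f}  significant={diff > hsd}")
```

Keeping the critical value as an explicit input makes it easy to see how the conclusion for each pair depends on the multiplier chosen.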
Conclusions
The actual differences in mean rank between groups C and A, and between groups C and B, were larger than the honestly significant differences. We therefore accept that these differences were unlikely to have arisen by chance. There was also a big difference in variability, with levels in group C much more variable. Differences in variability are commonly ignored, but they could have important clinical implications (for example, very high levels may be toxic). The observed differences between groups may have resulted from the different treatment regimens, but since allocation was consecutive (not random), they may also have resulted from changes in protocols or in patients over time.
