"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Kruskal-Wallis ANOVA: Use & misuse

(non-parametric ANOVA, test of dominance, test of medians, distribution of observations)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results can mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The Kruskal-Wallis one-way ANOVA is a non-parametric method for comparing k independent samples. It is roughly equivalent to a parametric one-way ANOVA with the data replaced by their ranks. When observations come from very different distributions, it should be regarded as a test of dominance between distributions. If the original observations are identically distributed apart from location, it can be interpreted as testing for a difference between medians. If observations are also assumed to be distributed symmetrically, it can be interpreted as testing for a difference between means. There is considerable confusion in the literature over this matter: some authors state unambiguously that there are no distributional assumptions, others that the homogeneity of variances assumption applies just as for parametric ANOVA. The confusion comes down to how you interpret a significant result. If you wish to compare medians or means, then the Kruskal-Wallis test also assumes that observations in each group are identically and independently distributed apart from location. If you can accept inference in terms of dominance of one distribution over another, then there are indeed no distributional assumptions.
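The 'ANOVA on ranks' equivalence can be sketched in Python with scipy (the three samples below are made up for illustration): the Kruskal-Wallis H statistic is computed directly, and then approximated by running a parametric one-way ANOVA on the pooled ranks.

```python
import numpy as np
from scipy import stats

# Three small hypothetical samples (illustrative data only)
a = [2.9, 3.0, 2.5, 2.6, 3.2]
b = [3.8, 2.7, 4.0, 2.4]
c = [2.8, 3.4, 3.7, 2.2, 2.0]

# Kruskal-Wallis test on the raw observations
H, p = stats.kruskal(a, b, c)

# The same idea "by hand": rank all observations together,
# then run a parametric one-way ANOVA on those ranks
ranks = stats.rankdata(np.concatenate([a, b, c]))
ra, rb, rc = ranks[:5], ranks[5:9], ranks[9:]
F, p_ranks = stats.f_oneway(ra, rb, rc)

print(round(H, 3), round(p, 3), round(p_ranks, 3))
```

The two p-values are not identical (H is referred to a chi-square distribution, the rank ANOVA to an F distribution), but they tell much the same story, which is the sense in which the test is "ANOVA with the data replaced by their ranks".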

Non-parametric analysis of variance is used almost as widely and frequently as parametric ANOVA. Its use is usually justified on the grounds that the assumptions for parametric ANOVA are not met. This can lead to over-use of Kruskal-Wallis ANOVA, because in many cases a logarithmic transformation would normalize the errors. If the conditions for a parametric test are met, then using a non-parametric test results in an unwarranted loss of power. The Kruskal-Wallis test is the better option only if the assumption of (approximate) normality of observations cannot be met, even after transformation, or if one is analyzing an ordinal variable.
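The point about transformation can be sketched with hypothetical lognormal data (generated with a fixed seed): after a log transformation the errors are approximately normal, so a parametric ANOVA applies; Kruskal-Wallis on the raw data remains valid but may sacrifice power.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed data, e.g. parasite counts or concentrations
rng = np.random.default_rng(42)
g1 = rng.lognormal(mean=1.0, sigma=0.5, size=20)
g2 = rng.lognormal(mean=1.4, sigma=0.5, size=20)
g3 = rng.lognormal(mean=1.0, sigma=0.5, size=20)

# A log transformation normalizes the errors, so parametric ANOVA is valid
F, p_param = stats.f_oneway(np.log(g1), np.log(g2), np.log(g3))

# Kruskal-Wallis on the raw data is also valid, but may be less powerful
H, p_kw = stats.kruskal(g1, g2, g3)

print(round(p_param, 4), round(p_kw, 4))
```

With lognormal errors the transformed-data ANOVA is the more powerful analysis; reaching for Kruskal-Wallis first would throw that power away.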

The commonest misuse of Kruskal-Wallis is to accept a significant result as indicating a difference between means or medians, even when distributions are wildly different. Such results should only be interpreted in terms of dominance. When distributions are similar, medians should be reported rather than means, since they (in the form of mean ranks) are what the test is actually comparing. Indeed, box and whisker plots showing the median, interquartile range, outliers and extremes should be the minimum requirement for reporting the results of a Kruskal-Wallis test. Apparently contradictory results would often make far more sense if medians were reported rather than means, as the mean is too sensitive to outliers. Multiple comparisons after a Kruskal-Wallis test are subject to the same constraints as after a parametric ANOVA. Ordered means should not be compared using a simple multiple comparison test - more appropriate non-parametric methods are available. There is also little point in doing multiple comparisons if one is carrying out a random effects ANOVA: the overall 'treatment' effect can be assessed with Kruskal-Wallis, but the added variance component and/or the intraclass correlation coefficient is best obtained using the parametric model.
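A minimal sketch of the kind of summary worth reporting - the five quantities a box-and-whisker plot displays - using hypothetical data with one outlier, to show why the median is the safer location measure here:

```python
import numpy as np

def five_number_summary(x):
    """Minimum, quartiles, median and maximum - what a box plot displays."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return {"min": x.min(), "q1": q1, "median": med, "q3": q3, "max": x.max()}

sample = [2.0, 2.2, 2.8, 3.4, 3.7, 12.0]  # note the outlier at 12.0
s = five_number_summary(sample)
# The median (3.1) is barely affected by the outlier; the mean (4.35) is
# dragged well above five of the six observations
```

Reporting the median and interquartile range for each group keeps the summary consistent with what the rank-based test is actually comparing.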

Several of the examples we found in the literature failed to meet even the basic assumptions of random sampling and independence. In one case Kruskal-Wallis was misused for repeated measures on the same patients - the non-parametric Friedman test would have been perfectly adequate, or (following transformation) a paired t-test. The test is also not appropriate for comparing observations in a time series, or observations where there is spatial autocorrelation - although we look at one way of coping with the latter problem. Pseudoreplication is often present - we look at one example where slugs are treated in groups of ten, yet in the analysis each slug is treated as an independent replicate.
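The repeated-measures point can be sketched with hypothetical patient data: Friedman's test blocks on patient, respecting the pairing, whereas Kruskal-Wallis would wrongly treat all the values as independent observations.

```python
from scipy import stats

# Hypothetical repeated measures: each position is one patient measured
# on three occasions, so the observations are not independent
before = [72, 85, 90, 68, 74, 81]
during = [70, 80, 88, 66, 70, 80]
after_ = [68, 78, 85, 64, 69, 76]

# Friedman's test ranks within each patient (block), then compares
# occasions; Kruskal-Wallis here would pool 18 values as independent
chi2, p = stats.friedmanchisquare(before, during, after_)
print(round(chi2, 2), round(p, 4))  # → 12.0 0.0025
```

Every patient declines across the three occasions, so within-patient ranks are identical in every block and the Friedman statistic reaches its maximum for this design.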


What the statisticians say

Hollander & Wolfe (1973) and Conover (1999) cover the Kruskal-Wallis test. Conover stresses the resilience of the test in the presence of ties, and gives critical values (in Table A8) for the exact test for k = 3 groups and up to n = 5 replicates. Sprent (1998) provides a comprehensive treatment of rank tests of location for two independent samples in Chapter 4. Siegel & Castellan (1988) cover the test, along with a table of exact probabilities for small samples.

Kruskal & Wallis (1952) proposed their non-parametric analysis of variance. Day & Quinn (1989) review non-parametric multiple range tests, including pairwise tests proposed by Nemenyi (1963), Dunn (1964), and Steel (1960, 1961). Steel (1959) also gives a test for comparison of treatments with a control. Fligner & Policello (1981) and Neuhauser (2002) look at pairwise comparison tests when variances are unequal.

Orlich gives a concise account of the Kruskal-Wallis test and of Dunn's test as implemented in Minitab. Mark Fey & Kevin Clarke discuss the inconsistencies of non-parametric multiple comparison tests. Wikipedia provides an article on the Kruskal-Wallis test.