InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

Principles

The Kruskal-Wallis one-way ANOVA is a non-parametric method for comparing k independent samples. It is roughly equivalent to a parametric one-way ANOVA with the data replaced by their ranks. Since ranking is conditional upon your observed values, so is this test.

The null hypothesis is that the k groups were formed by randomly assigning observations from a single pooled set of ranks - in which case each group is equally likely to obtain values above and below the common mean rank. The alternative hypothesis is that, in addition to this random assignment, two or more groups also differ in their mean rank - in which case, like ANOVA, this test assumes the only difference between samples is their mean rank, and any other differences are due to simple chance.

Provided the observations in each group have identically shaped distributions (differing at most in location), this can be interpreted as testing for a difference between medians. But when observations represent very different distributions Kruskal-Wallis is a test of dominance, much as the Wilcoxon-Mann-Whitney test is a test of dominance comparing just two samples. In the two-sample case the test is in fact equivalent to the Wilcoxon-Mann-Whitney test.
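
To make the two-sample link concrete: when k = 2 and there are no ties, K equals the square of the standardized Wilcoxon rank-sum statistic z (normal approximation, no continuity correction). The Python sketch below checks this numerically. The function names and the example data are illustrative assumptions, not part of the original text, and the z formula used here assumes no ties.

```python
import math


def mean_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def two_sample_k_and_z(x, y):
    """Return (K, z) for two samples, assuming no tied values."""
    pooled = list(x) + list(y)
    ranks = mean_ranks(pooled)
    n1, n2, n = len(x), len(y), len(pooled)
    s1 = sum(ranks[:n1])   # rank sum of the first sample
    s2 = sum(ranks[n1:])   # rank sum of the second sample
    k = 12.0 / (n * (n + 1)) * (s1 ** 2 / n1 + s2 ** 2 / n2) - 3 * (n + 1)
    # Standardized rank-sum statistic: (S1 - E[S1]) / sd(S1) under the null
    z = (s1 - n1 * (n + 1) / 2) / math.sqrt(n1 * n2 * (n + 1) / 12)
    return k, z


k, z = two_sample_k_and_z([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0])
print(k, z * z)  # both about 4.5
```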

Kruskal-Wallis is commonly used as a test of equality of medians or even means. In the latter case, in addition to the distributional assumptions mentioned above, observations are also assumed to be distributed symmetrically.

Asymptotically, the null distribution of the Kruskal-Wallis statistic approximates the χ² distribution with k−1 degrees of freedom. The χ² approximation generally furnishes a conservative test. For k = 3 groups and sample sizes (ni) less than or equal to 5, the exact distribution should be used. For smallish sets of tied data a randomization approach may be preferable.
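
A randomization approach of the kind mentioned above could be sketched as follows. This is a minimal Monte Carlo version, assuming the uncorrected statistic K given in the Procedure section, with the pooled observations repeatedly shuffled among groups of the original sizes. The function names, the number of permutations and the seed are illustrative assumptions.

```python
import random


def mean_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def k_statistic(groups):
    """Uncorrected Kruskal-Wallis K for a list of samples."""
    pooled = [x for g in groups for x in g]
    ranks = mean_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        s_i = sum(ranks[start:start + len(g)])  # rank sum for this group
        total += s_i ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def randomization_p(groups, n_perm=2000, seed=0):
    """Proportion of random reallocations with K at least as large as observed."""
    rng = random.Random(seed)
    observed = k_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        # small tolerance guards against floating-point noise in equal K values
        if k_statistic(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


example = [[1, 2, 2], [3, 3, 4], [5, 6, 6]]  # made-up data with ties
print(randomization_p(example))
```

Because the tie correction depends only on the pooled values, it is the same for every shuffle, so leaving it out does not change the randomization P-value. The (hits + 1)/(n_perm + 1) form counts the observed arrangement as one of the permutations.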

Kruskal-Wallis is given in various different forms. We give it in its simplified form for no ties; a correction factor is then applied when ties are present.

Procedure

  1. Combine the observations in the k samples into a single pooled 'null' sample, retaining the information on the source of each observation.
  2. Assign ranks to the pooled sample. If two values are the same, they both get the average of the two ranks for which they tie - in other words use mean ranks for tied observations, not sequential ranks.
  3. Calculate the sum of ranks (Si) for each group
  4. Compute the Kruskal-Wallis test statistic (K).

Algebraically speaking -

$$K = \left[ \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{S_i^2}{n_i} \right] - 3(N+1)$$
where
  • K is the Kruskal-Wallis test statistic, which approximates to the χ² distribution for values of n_i greater than 5,
  • N is the total number of observations across all groups,
  • S_i is the sum of ranks of observations in the ith sample,
  • n_i is the number of observations in group i.
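
The four steps of the procedure and the formula for K can be sketched in Python as below. This is a minimal illustration assuming each sample is a plain list of numbers; the function names and the example data are made up for the purpose, and no tie correction is applied here.

```python
def mean_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def kruskal_wallis_k(groups):
    """Uncorrected K for a list of samples (each a list of numbers)."""
    # Step 1: pool the observations, keeping group order so sources are known
    pooled = [x for g in groups for x in g]
    # Step 2: rank the pooled sample, using mean ranks for ties
    ranks = mean_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        # Step 3: sum of ranks for this group
        s_i = sum(ranks[start:start + len(g)])
        total += s_i ** 2 / len(g)
        start += len(g)
    # Step 4: the test statistic
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


groups = [[1.2, 3.4, 2.2], [4.1, 5.0, 6.3], [7.7, 8.1, 9.5]]
print(kruskal_wallis_k(groups))  # about 7.2
```

In this example the rank sums are 6, 15 and 24 with N = 9, so K = (12/90)(12 + 75 + 192) − 30 = 7.2.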

If ties are present, divide K by a correction factor C which is given by:

$$C = 1 - \frac{\sum_j (t_j^3 - t_j)}{N^3 - N}$$
where
  • t_j is the number of observations in the pooled sample that are tied at the jth tied value; the terms (t_j³ − t_j) are summed over all sets of tied values,
  • N is as above.
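
A short Python sketch of the tie correction, assuming the pooled observations are held in a single list. The function names and the example values are illustrative assumptions. Values that occur only once contribute nothing to the sum, since 1³ − 1 = 0.

```python
from collections import Counter


def tie_correction(pooled):
    """C = 1 - sum(t^3 - t) / (N^3 - N), summed over sets of tied values."""
    n_total = len(pooled)
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    return 1.0 - tie_sum / (n_total ** 3 - n_total)


def corrected_k(k_uncorrected, pooled):
    """Divide an uncorrected K by the tie correction factor C."""
    c = tie_correction(pooled)
    if c == 0:
        # happens only when every observation has the same value
        raise ValueError("all observations are identical; K is undefined")
    return k_uncorrected / c


pooled = [1, 2, 2, 3, 3, 3, 4]  # one pair and one triple of ties
print(tie_correction(pooled))   # 1 - 30/336, about 0.911
```

Since C is never greater than 1, dividing by it can only leave K unchanged or increase it.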

If each ni is at least 5, the statistic approximately follows a χ² distribution with k−1 degrees of freedom. For smaller sample sizes exact critical values are available in Table A8 in Conover (1999).
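
For the large-sample case, the P-value is the upper-tail probability of K under a χ² distribution with k−1 degrees of freedom. The sketch below computes that tail area in plain Python using the standard closed-form series for integer degrees of freedom. The function name chi2_sf is an assumption chosen for this example; in practice a library routine such as SciPy's scipy.stats.chi2.sf would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Hypothetical example: K = 7.2 from k = 3 groups, so df = k - 1 = 2
print(chi2_sf(7.2, 3 - 1))  # about 0.027
```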

Assumptions

The assumptions of the Kruskal-Wallis test are similar to those for the Wilcoxon-Mann-Whitney test.

  1. Samples are random samples, or allocation to treatment group is random.
  2. The k samples are mutually independent.
  3. The measurement scale is at least ordinal, and the variable is continuous.
  4. If the test is used as a test of dominance, it has no distributional assumptions. If it is used to compare medians, the distributions must be similar apart from their locations.
  5. The test is generally considered to be robust to ties. However, if ties are present they should not be concentrated together in one part of the distribution (they should have either a normal or uniform distribution).

Related topics:

Non-parametric multiple comparison tests

ANOVA by randomization