Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

Wilcoxon matched pairs signed rank test: Use & misuse

(versus paired t-test, distribution of differences, symmetricality, power)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The Wilcoxon matched-pairs signed-ranks test is widely used in place of the paired t-test where the assumption of a normal distribution of the differences is unjustified. As with the Wilcoxon-Mann-Whitney test, it is often favoured over the t-test because of the misconception that no assumptions have to be met for the test to be valid. In fact, the two tests make similar but not identical assumptions. Those assumptions are that paired samples are random and independent, that both variables are measured (effectively) on an interval scale, and (for comparing means/medians) that the distribution of differences is symmetrical. Our examples suggest that the requirement for an interval scale is commonly ignored. Similarly, paired samples are often not independent: examples from our review include responses of (groups of) squirrels to odours, and paired samples taken from a time series of elephant crop-raiding incidents.

We then come to checking whether the distribution of differences can be regarded as symmetrical. Few (if any) researchers actually plot the distribution of differences, although some do give information on the two separate distributions (for example median and interquartile range). Symmetry can be inferred if the two separate distributions are similar in shape. The lack of information on distributions is worrying because, in the few cases where we can check the distribution of differences, it is sometimes very skewed. This often results from treatment affecting skew as well as location. Even if the differences are not symmetrical, we can still draw inference concerning the Hodges-Lehmann estimator of median difference, but not concerning the difference between means or medians.
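As a rough illustration of checking the differences rather than the two separate distributions, the sketch below computes the paired differences and their moment coefficient of skewness. The data and the function names (paired_differences, skewness) are hypothetical, invented for this example, and a single coefficient is only a crude guide with small samples. Plotting the differences remains the better check.

```python
from statistics import mean

def paired_differences(before, after):
    """Return after - before for each pair (hypothetical helper)."""
    return [a - b for b, a in zip(before, after)]

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for a symmetrical sample)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented data: one pair responds far more strongly than the rest.
before = [12, 15, 11, 14, 13, 16]
after = [13, 16, 12, 16, 14, 27]

diffs = paired_differences(before, after)
print(diffs)            # [1, 1, 1, 2, 1, 11]
print(skewness(diffs))  # clearly positive: a long right tail
```

A value near zero is consistent with symmetry, whereas a value far from zero suggests that inference about the mean or median difference from the signed-ranks test would be unsafe.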

The lower power of non-parametric tests means that, if its conditions are met, one should always use the (more powerful) parametric test. Failing to do this leads one to suspect that sometimes a researcher may be trying to prove the null hypothesis, rather than trying to disprove it. We give two examples where use of a logarithmic transformation may well have normalized distributions, enabling use of the paired t-test. There is also the issue of sample size. There is no point in using the Wilcoxon signed-ranks test when there are fewer than five pairs, simply because it cannot give a significant result at P = 0.05 (a two-tailed test needs at least six pairs) - yet we found the test was used with smaller samples in studies on faecal egg counts and fecundities of lemurs.
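The sample size limit follows from simple arithmetic, sketched below. Under the null hypothesis each of the 2^n possible patterns of signs is equally likely, so the most extreme result (all differences with the same sign) has a one-tailed probability of 1/2^n. The function name min_p_value is our own, and the sketch assumes an exact test with no zero differences and no tied absolute differences.

```python
def min_p_value(n_pairs, two_tailed=True):
    """Smallest P-value an exact signed-ranks test can give for n_pairs.

    Assumes no zero differences and no tied absolute differences, so each
    of the 2**n_pairs sign patterns is equally likely under the null.
    """
    p_one_tail = 1 / 2 ** n_pairs  # every difference has the same sign
    return min(1.0, 2 * p_one_tail) if two_tailed else p_one_tail

# Columns: number of pairs, smallest one-tailed P, smallest two-tailed P
for n in range(3, 8):
    print(n, min_p_value(n, two_tailed=False), min_p_value(n))
```

With four pairs the smallest one-tailed P-value is 0.0625, so no result can reach 0.05. With five pairs the one-tailed minimum is 0.03125 but the two-tailed minimum is still 0.0625.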

Lastly, there are two common uses of the test that should be discouraged, at least without strong caveats. Firstly, it is not uncommon to find a randomized trial analysed as a before-and-after study with Wilcoxon's matched-pairs test - presumably because differences between treatment and control groups were not significant. Such a test should always be preceded by a comparison between treatment and control - with the non-significant result given explicitly. Secondly, the test should never be used to assess agreement between two methods of measurement - we give two examples of this, and point out that two sets of measurements may have the same mean, yet give very different individual readings.
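The agreement problem can be illustrated with a small invented example. In the sketch below, two hypothetical methods give identical means, yet every individual reading differs by 10 units. The signed-rank sums are exactly equal, so the test gives no hint of disagreement. The readings and the function name signed_rank_sums are assumptions made for illustration, and the code only computes the rank sums (with midranks for ties), not a P-value.

```python
from statistics import mean

def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples, using midranks for ties."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        midrank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Invented readings from two hypothetical measurement methods.
method_a = [10, 20, 30, 40, 50, 60]
method_b = [20, 10, 40, 30, 60, 50]

print(mean(method_a), mean(method_b))        # 35 35
print(signed_rank_sums(method_a, method_b))  # (10.5, 10.5)
print(mean(abs(b - a) for a, b in zip(method_a, method_b)))  # 10
```

Equal rank sums are the least significant result the test can give, even though no pair of readings agrees.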

What the statisticians say

Conover (1999) covers the one-sample or matched-pairs signed-ranks test in Chapter 5. A comprehensive account of the test is given, together with estimation of the confidence interval for the median difference. The test is also compared with other procedures. Sprent (1998) provides a comprehensive account of the signed-ranks test including estimation of the confidence interval and the need for logarithmic transformation if the distribution of the differences is skewed. Hollander & Wolfe (1973) also cover the matched-pairs signed-ranks test, whilst Siegel (1956) wrote one of the earliest accounts of the test.

Hodges & Lehmann (1963) proposed the median of the Walsh averages as a point estimator of the median difference in the paired samples case. Wilcoxon (1945) first suggested the signed-ranks test.
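The estimator is straightforward to compute, as the minimal sketch below shows. The Walsh averages are the means of every pair of differences, each difference being paired with itself as well as with every other. The function name hodges_lehmann and the example values are our own.

```python
from itertools import combinations_with_replacement
from statistics import median

def hodges_lehmann(differences):
    """Median of the Walsh averages (d_i + d_j) / 2 for all i <= j."""
    walsh = [(a + b) / 2
             for a, b in combinations_with_replacement(differences, 2)]
    return median(walsh)

# Invented paired differences; the last one is an outlier.
print(hodges_lehmann([1, 2, 3]))   # 2.0
print(hodges_lehmann([1, 2, 10]))  # 3.75
```

For the second set of differences the ordinary median is 2 and the mean is about 4.3. The Hodges-Lehmann estimate lies between them.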

Wikipedia (2008) describes the main features of the Wilcoxon matched-pairs signed-ranks test. It also provides a link to an informative note by Darlington (2008) on the Wilcoxon test without assuming symmetry.