Wilcoxon matched-pairs signed-ranks test
The Wilcoxon signed-ranks test is a non-parametric equivalent of the paired t-test. It is most commonly used to test for a difference in the mean (or median) of paired observations, whether these are measurements on pairs of units or before-and-after measurements on the same unit. It can also be used as a one-sample test of whether a particular sample came from a population with a specified median.
Unlike the t-test, the paired differences do not need to follow a normal distribution. But if you wish to test the median (= mean) difference, the distribution on each side of the median must have a similar shape. In other words, the distribution of the differences must be symmetrical. If the distribution of the differences is not symmetrical, you can only test the null hypothesis that the Hodges-Lehmann estimate of the median difference is zero. Unlike most rank tests, the outcome of this test is affected by transforming the data before ranking, because the differences are ranked in order of their absolute size. It may thus be worth plotting the distribution of the differences after an appropriate transformation (for example logarithmic) to see whether it makes the distribution appear more symmetrical.
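The effect of a transformation on the ranking can be seen with a small Python sketch. The two pairs below are made-up values chosen only to show the ordering of the absolute differences reversing after a log transformation; they are not data from this page.

```python
from math import log

# Two illustrative (before, after) pairs; the values are invented for this sketch.
pairs = [(1.0, 2.0), (10.0, 12.0)]

# Differences on the raw scale and on the log scale.
raw = [after - before for before, after in pairs]
logged = [log(after) - log(before) for before, after in pairs]

# Indices of the pairs ordered from smallest to largest absolute difference.
raw_order = sorted(range(len(pairs)), key=lambda i: abs(raw[i]))
log_order = sorted(range(len(pairs)), key=lambda i: abs(logged[i]))

print(raw_order, log_order)
```

On the raw scale the second pair has the larger difference (2 against 1), but on the log scale the first pair does (a doubling against a 20% increase), so the two pairs swap ranks.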
A signed-ranks test on paired samples is slightly less powerful than the paired t-test (relative efficiency is about 95%) provided the differences are normally distributed. If they are not, and cannot be transformed so that they are, a paired t-test is not appropriate and the non-parametric test should be used.
Sum of ranks statistic (small samples with no ties):
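A minimal Python sketch of the usual construction of the statistic, assuming no tied and no zero differences: the differences are ranked by absolute size, and the ranks of the positive and of the negative differences are summed separately. The function name `signed_rank_sums` and the data are illustrative assumptions, not taken from this page.

```python
def signed_rank_sums(before, after):
    """Return (sum of positive ranks, sum of negative ranks) for paired data.

    Assumes no zero differences and no ties among the absolute differences.
    """
    diffs = [a - b for b, a in zip(before, after)]
    abs_diffs = [abs(d) for d in diffs]
    if 0 in abs_diffs or len(set(abs_diffs)) != len(abs_diffs):
        raise ValueError("this sketch assumes no zero or tied differences")

    # Order the differences by absolute size; position in this order is the rank.
    ordered = sorted(diffs, key=abs)
    t_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    t_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return t_plus, t_minus


# Invented measurements on six units, before and after some treatment.
before = [10.0, 12.5, 9.0, 14.0, 11.0, 13.0]
after = [12.0, 12.0, 12.5, 13.0, 15.5, 18.5]

t_plus, t_minus = signed_rank_sums(before, after)
print(t_plus, t_minus)  # 18 3
```

The two sums always add up to n(n + 1)/2 (here 21). The smaller of the two is commonly the value compared with tabulated critical values, although conventions differ between tables and software.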
Large sample normal approximation
The large sample approximation is only appropriate for n > 20. However, it is still commonly used for smaller sample sizes if there are ties in the data, because table values for exact tests are only valid for untied data. Software (including R) is now available to carry out exact tests even when the data are tied, so there is no longer any justification for using the normal approximation on small samples.
Where the two members of a pair are tied, so that there is no difference between them, the signed rank of that pair (Ri) is zero, but its presence still increases the rank of every other pair by one. In that case E(Ri) remains zero, but ΣRi² gives a biased estimate of the variance of ΣRi, so this method must assume that pairs are not tied. Since the approximation also assumes there are few ties between differences, it may be unreliable when applied to strongly discrete data, most especially if the data are highly skewed.
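The quantities named above can be put together in a short Python sketch, assuming the approximation takes the form z = ΣRi / √(ΣRi²), with tied absolute differences given the mean of their ranks and zero differences rejected outright. The helper names `midranks` and `signed_rank_z` are hypothetical, and the example data are invented.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_z(diffs):
    """Return (z, two-sided P) from the signed ranks of the differences."""
    if any(d == 0 for d in diffs):
        raise ValueError("this sketch assumes no zero differences")
    ranks = midranks([abs(d) for d in diffs])
    signed = [r if d > 0 else -r for r, d in zip(ranks, diffs)]
    z = sum(signed) / sqrt(sum(r * r for r in signed))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Six invented differences, used only to make the arithmetic easy to follow;
# a sample this small is well below the n > 20 needed for the approximation.
diffs = [2.0, -0.5, 3.5, -1.0, 4.5, 5.5]
z, p = signed_rank_z(diffs)
print(z, p)
```

With no ties, ΣRi² equals n(n + 1)(2n + 1)/6, which for the six differences here is 91, so z = 15/√91.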
Confidence interval to the median difference
The differences between pairs of observations are first arranged in rank order. A triangular matrix of the Walsh averages (the means of all possible pairs of differences, including each difference paired with itself) is then constructed, giving n(n + 1)/2 averages for n differences. The Hodges-Lehmann estimate of the median difference is given by the median of these values.
The upper and lower 95% confidence limits to this median are obtained by counting in a specified number of Walsh averages from each end of the array. The required number of averages is given by the quantile of the Wilcoxon matched-pairs signed-ranks statistic for n observations at P = 0.025.
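The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: there are no tied or zero differences, the exact null distribution of the statistic is obtained by counting subsets of the ranks 1..n, and the number of averages to count in is taken as the largest k for which P(T ≤ k - 1) does not exceed 0.025. Software may use a slightly different convention. The function names and data are hypothetical.

```python
from statistics import median


def walsh_averages(diffs):
    """Sorted means of all pairs (d_i, d_j) with i <= j, self-pairs included."""
    n = len(diffs)
    return sorted((diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n))


def signed_rank_cdf(n):
    """Exact null P(T <= t) for t = 0 .. n(n+1)/2, assuming no ties."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s] = subsets of ranks 1..n summing to s
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    total = 2 ** n
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf


def hodges_lehmann_ci(diffs, alpha=0.05):
    """Return (estimate, lower limit, upper limit, achieved confidence)."""
    walsh = walsh_averages(diffs)
    cdf = signed_rank_cdf(len(diffs))
    k = 1
    # Increase k while counting in one more average keeps the tail within alpha/2.
    while k < len(cdf) and cdf[k] <= alpha / 2:
        k += 1
    achieved = 1 - 2 * cdf[k - 1]
    return median(walsh), walsh[k - 1], walsh[-k], achieved


# Six invented paired differences.
diffs = [2.0, -0.5, 3.5, -1.0, 4.5, 5.5]
estimate, lower, upper, achieved = hodges_lehmann_ci(diffs)
print(estimate, lower, upper, achieved)  # 2.25 -1.0 5.5 0.96875
```

Because the statistic is discrete, the achieved confidence level is usually somewhat above the nominal 95%. With only six differences the limits are simply the smallest and largest Walsh averages, and with fewer than six differences the tail probability cannot be brought down to 0.025 at all, in which case this sketch returns the widest available interval together with its achieved confidence.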