InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Wilcoxon matched-pairs signed-ranks test

#### Worked example I

We take for our first example part of a study by Farkas et al (2001) on the role of Helicobacter pylori infection in hereditary angioneurotic oedema. 19 of 65 patients had H. pylori infection.

The infection was successfully eradicated using combination therapy in 18 of these patients. The impact of eradication was studied in detail in nine of these patients - the data below show the number of episodes in these nine patients over (a median of) 10 months pre-treatment and 10 months post-eradication.

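Number of oedematous episodes in patients with hereditary angioneurotic oedema

| Patient number | Pre-treatment | Post-treatment | Difference (d) | Rank | Signed rank (Ri) |
| --- | --- | --- | --- | --- | --- |
| 1 | 13 | 1 | 12 | 7.5 | +7.5 |
| 2 | 11 | 2 | 9 | 4 | +4 |
| 3 | 6 | 1 | 5 | 2 | +2 |
| 4 | 14 | 1 | 13 | 9 | +9 |
| 5 | 6 | 3 | 3 | 1 | +1 |
| 6 | 15 | 3 | 12 | 7.5 | +7.5 |
| 7 | 13 | 2 | 11 | 6 | +6 |
| 8 | 14 | 4 | 10 | 5 | +5 |
| 9 | 8 | 2 | 6 | 3 | +3 |
| Sum +ve ranks | | | | | 45 |
| Sum -ve ranks | | | | | 0 |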
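The ranking step in the table can be reproduced with a short script. This is a minimal sketch in plain Python rather than the R code the page refers to; the helper name `signed_ranks` and the variable names are our own assumptions, and tied absolute differences are given their mean rank (midrank) as in the table.

```python
# Pre- and post-treatment episode counts for the nine patients in the table.
pre = [13, 11, 6, 14, 6, 15, 13, 14, 8]
post = [1, 2, 1, 1, 3, 3, 2, 4, 2]


def signed_ranks(differences):
    """Return the signed midranks of the non-zero differences."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(abs(d) for d in nonzero)
    # The midrank of a value is the mean of the 1-based positions it occupies.
    midrank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1
        count = ordered.count(value)
        midrank[value] = first + (count - 1) / 2
    return [midrank[abs(d)] if d > 0 else -midrank[abs(d)] for d in nonzero]


d = [a - b for a, b in zip(pre, post)]
r = signed_ranks(d)
s_plus = sum(x for x in r if x > 0)    # sum of positive ranks
s_minus = -sum(x for x in r if x < 0)  # sum of negative ranks, as a magnitude

print("differences: ", d)
print("signed ranks:", r)
print("S+ =", s_plus, " S- =", s_minus)
```

Run on these data, the script gives the same differences, ranks and rank sums (45 and 0) as the table.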

We first examine the distribution of differences to check whether it is symmetrical. This is difficult with such a small sample, but a clearly skewed distribution would give us rather less confidence in our result.

{Fig. 1}

The distribution is clearly not symmetrical, although one could argue that it is not strongly skewed, and that the irregularities simply result from small sample size. We will continue with this test, but bear in mind that conclusions based on a borderline P-value would be unsafe.

#### Exact method

Given we have a small number of observations, we should use an exact method rather than the normal approximation. S+ and S− are 45 and 0 respectively.

Using tables we find that the critical value for n = 9 at P = 0.005 is 2. Since S− is less than this we might be tempted to accept the difference as significant at P < 0.005. However, this result is unreliable because the table values are only accurate if there are no ties.


If we have the exact Wilcoxon test, which is available in a separate package in R, we can still obtain the correct P-value. This is 0.0039, as quoted by the authors of the paper. The estimate of the median difference is 8.75 (95% CI: 5.5 - 12.0). We may conclude that there was a significant decline in the number of oedematous episodes post-treatment.
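To see where an exact P-value of 0.0039 can come from when there are ties, one option is to enumerate every possible assignment of signs to the observed midranks. The sketch below does this in plain Python; it is an illustration of the permutation idea under our own variable names, not the algorithm used by the R package mentioned above.

```python
from itertools import product

# Midranks of the absolute differences, in patient order (from the table).
ranks = [7.5, 4, 2, 9, 1, 7.5, 6, 5, 3]
s_plus_observed = 45  # all nine differences are positive

# Under the null hypothesis the expected value of S+ is half the rank total.
expected = sum(ranks) / 2
observed_deviation = abs(s_plus_observed - expected)

# Each rank is equally likely to carry a + or - sign under the null,
# so all 2**9 sign patterns are equally probable.
total = 0
as_extreme = 0
for signs in product((0, 1), repeat=len(ranks)):
    s_plus = sum(rank for rank, positive in zip(ranks, signs) if positive)
    total += 1
    if abs(s_plus - expected) >= observed_deviation:
        as_extreme += 1

p_exact = as_extreme / total  # two-tailed exact P-value
print(total, as_extreme, p_exact)
```

Only the all-positive and all-negative patterns are as extreme as the observed one, so the two-tailed P-value is 2/512, which rounds to the 0.0039 quoted above.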

#### Normal approximation method

If we did not have the option of doing an exact test which will accept ties, we would have to use the second formulation above:

$$z = \frac{45}{\sqrt{7.5^2 + 4^2 + \dots + 3^2}} = 2.6679$$

This gives a two-tailed P-value of 0.007633.
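The arithmetic can be checked with a few lines of plain Python. This is a sketch only: it divides the sum of the signed ranks by the square root of the sum of their squares, with no continuity correction, and obtains the two-tailed P-value from the standard normal distribution via `math.erfc`. The variable names are our own.

```python
from math import erfc, sqrt

# Signed ranks from the table (all positive in this example).
signed = [7.5, 4, 2, 9, 1, 7.5, 6, 5, 3]

# z = (sum of signed ranks) / sqrt(sum of squared ranks)
z = sum(signed) / sqrt(sum(rank * rank for rank in signed))

# Two-tailed P-value: erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
p_two_tailed = erfc(abs(z) / sqrt(2))

print(round(z, 4), round(p_two_tailed, 4))
```

The sum of squared ranks is 284.5, so z = 45 / √284.5, which agrees with the values quoted in the text.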


Using R with the Wilcoxon test in the standard statistics package gives the same value. It may seem surprising that the normal approximation test is more conservative than the exact test (P = 0.0076 against P = 0.0039). This is most likely because the sample size is so small that the normal approximation is unreliable.

#### Exact confidence limits

To obtain the Hodges-Lehmann estimate of the median difference and its 95% confidence interval, we first arrange the differences (d) from the table above in order:

3   5   6   9   10   11   12   12   13

We then construct a triangular matrix of the Walsh averages, a task most easily achieved in R. The median difference is given by the median of these values which is 9.0. The required number of averages from each end of the array to obtain the (approximate) upper and lower confidence limits is given by the quantile of the Wilcoxon matched-pairs signed-ranks statistic for n observations at P = 0.025 which is 6.

| | [3] | [5] | [6] | [9] | [10] | [11] | [12] | [12] | [13] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [3] | 3.0 | 4.0 | 4.5 | 6.0 | 6.5 | 7.0 | 7.5 | 7.5 | 8.0 |
| [5] | | 5.0 | 5.5 | 7.0 | 7.5 | 8.0 | 8.5 | 8.5 | 9.0 |
| [6] | | | 6.0 | 7.5 | 8.0 | 8.5 | 9.0 | 9.0 | 9.5 |
| [9] | | | | 9.0 | 9.5 | 10.0 | 10.5 | 10.5 | 11.0 |
| [10] | | | | | 10.0 | 10.5 | 11.0 | 11.0 | 11.5 |
| [11] | | | | | | 11.0 | 11.5 | 11.5 | 12.0 |
| [12] | | | | | | | 12.0 | 12.0 | 12.5 |
| [12] | | | | | | | | 12.0 | 12.5 |
| [13] | | | | | | | | | 13.0 |

Counting down from the top of the ordered array, the five highest values are 13.0, 12.5, 12.5, 12.0 and 12.0, so the sixth (12.0) is the upper 95% confidence limit. Counting up from the bottom, the five lowest values are 3.0, 4.0, 4.5, 5.0 and 5.5, so the sixth (6.0) is the lower 95% confidence limit. We conclude that the 95% confidence interval for the difference is 6 to 12. This interval does not include zero, in agreement with our earlier significant P-value of 0.0076.
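The Walsh averages and the limits read off from them can be reproduced as follows. This is a minimal sketch in plain Python (the page itself uses R); the value k = 6 is taken from the text rather than computed, and the variable names are our own.

```python
from statistics import median

# Differences from the first worked example, in ascending order.
d = sorted([12, 9, 5, 13, 3, 12, 11, 10, 6])

# Walsh averages: the mean of every pair (d[i], d[j]) with i <= j,
# which includes each value paired with itself.
walsh = sorted(
    (d[i] + d[j]) / 2 for i in range(len(d)) for j in range(i, len(d))
)

k = 6  # signed-rank quantile for n = 9 at P = 0.025, as quoted in the text

hodges_lehmann = median(walsh)  # point estimate of the median difference
lower = walsh[k - 1]            # 6th smallest Walsh average
upper = walsh[-k]               # 6th largest Walsh average

print(len(walsh), hodges_lehmann, lower, upper)
```

With nine differences there are 9 × 10 / 2 = 45 Walsh averages; the script gives a median of 9.0 and limits of 6.0 and 12.0, matching the values above.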

Whilst this study gave us a manageable worked example, the sample size is really too small for the normal approximation.

#### Worked example II

We take our second example from Ogata & Takeuchi (2001), a trial of a feline pheromone analogue to reduce the frequency of urine marking by cats. The data are given below for the number of markings pre-treatment and one week post-treatment. We first examine the distribution of differences to check whether the distribution is symmetrical.

{Fig. 2}

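Number of urine markings

| No. | Pre | Post | Diff (d) | Rank | Signed rank |
| --- | --- | --- | --- | --- | --- |
| 1 | 7 | 0 | 7 | 21 | 21 |
| 2 | 12 | 10 | 2 | 9 | 9 |
| 3 | 77 | 6 | 71 | 29 | 29 |
| 4 | 15 | 11 | 4 | 17 | 17 |
| 5 | 31 | 22 | 9 | 23.5 | 23.5 |
| 6 | 14 | 13 | 1 | 3.5 | 3.5 |
| 7 | 2 | 1 | 1 | 3.5 | 3.5 |
| 8 | 63 | 8 | 55 | 28 | 28 |
| 9 | 10 | 7 | 3 | 14 | 14 |
| 10 | 16 | 14 | 2 | 9 | 9 |
| 11 | 6 | 6 | 0 | * | * |
| 12 | 1 | 1 | 0 | * | * |
| 13 | 18 | 9 | 9 | 23.5 | 23.5 |
| 14 | 9 | 3 | 6 | 19 | 19 |
| 15 | 1 | 1 | 0 | * | * |
| 16 | 9 | 7 | 2 | 9 | 9 |
| 17 | 9 | 7 | 2 | 9 | 9 |
| 18 | 30 | 12 | 18 | 25 | 25 |
| 19 | 11 | 8 | 3 | 14 | 14 |
| 20 | 13 | 14 | -1 | 3.5 | -3.5 |
| 21 | 7 | 2 | 5 | 18 | 18 |
| 22 | 28 | 2 | 26 | 27 | 27 |
| 23 | 4 | 2 | 2 | 9 | 9 |
| 24 | 10 | 10 | 0 | * | * |
| 25 | 11 | 4 | 7 | 21 | 21 |
| 26 | 3 | 0 | 3 | 14 | 14 |
| 27 | 7 | 0 | 7 | 21 | 21 |
| 28 | 8 | 9 | -1 | 3.5 | -3.5 |
| 29 | 6† | 6† | 0 | * | * |
| 30 | 12† | 12† | 0 | * | * |
| 31 | 4† | 5† | -1 | 3.5 | -3.5 |
| 32 | 13 | 10 | 3 | 14 | 14 |
| 33 | 1 | 0 | 1 | 3.5 | 3.5 |
| 34 | 3 | 3 | 0 | * | * |
| 35 | 17 | 14 | 3 | 14 | 14 |
| 36 | 22 | 0 | 22 | 26 | 26 |
| Sum of +ve ranks | | | | | 424.5 |
| Sum of -ve ranks | | | | | 10.5 |

† The pre- and post-treatment digits for cats 29 to 31 are run together in the source ("6124" and "6125") and can be split in more than one way; every split gives the differences 0, 0 and -1 shown.

Distributions on the left are of the untransformed differences - they are strongly right skewed. A log transformation reduces skew a little, but the effect is disappointingly slight (note we used a log(x+1) transformation here as several of the post-treatment readings were zero). For now we will proceed with the untransformed data - but bear in mind that the conditions for the test are not met, and we may get a misleading result.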

We have a moderate number of observations (29 excluding zeroes, denoted as * in the table) with many ties, so we use the normal approximation:

$$z = \frac{414}{\sqrt{8515}} = 4.487$$

This gives a two-tailed P-value of 0.000007. This indicates a highly significant treatment effect.
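The whole calculation for this example, from differences to z, can be sketched in plain Python. As before this is an illustration under our own names (`signed_ranks`, `diffs`), not the R code the page refers to: zero differences are dropped, ties receive midranks, and no continuity correction is applied.

```python
from math import erfc, sqrt

# Differences (pre minus post) for the 36 cats, from the Diff (d) column.
diffs = [7, 2, 71, 4, 9, 1, 1, 55, 3, 2, 0, 0, 9, 6, 0, 2, 2, 18, 3, -1,
         5, 26, 2, 0, 7, 3, 7, -1, 0, 0, -1, 3, 1, 0, 3, 22]


def signed_ranks(differences):
    """Return the signed midranks of the non-zero differences."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(abs(d) for d in nonzero)
    midrank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1
        count = ordered.count(value)
        midrank[value] = first + (count - 1) / 2
    return [midrank[abs(d)] if d > 0 else -midrank[abs(d)] for d in nonzero]


r = signed_ranks(diffs)
n = len(r)                             # observations left after dropping zeroes
s_plus = sum(x for x in r if x > 0)
s_minus = -sum(x for x in r if x < 0)
sum_squares = sum(x * x for x in r)

z = (s_plus - s_minus) / sqrt(sum_squares)
p_two_tailed = erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))

print(n, s_plus, s_minus, sum_squares)
print(round(z, 3), p_two_tailed)
```

The script recovers n = 29, rank sums of 424.5 and 10.5, and a sum of squared ranks of 8515, so the numerator 414 is simply 424.5 − 10.5.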

But note that the Hodges-Lehmann estimate of the median difference given by R is only 4.5 (95% confidence interval: 2.5 - 10.0). In other words, although there was a dramatic effect of treatment for a few cats, for most cats the effect was rather slight.

As we pointed out above, the key assumption for the Wilcoxon matched pairs test (symmetrical distribution of differences) is not met. Hence we would actually do better to resort to the sign test, which does not require the differences to be symmetrically distributed.


This gives a P-value of 0.000015, not as highly significant as with Wilcoxon's matched pairs test, but a much more justifiable procedure.
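The sign test P-value can be checked directly from the binomial distribution. The sketch below is plain Python with our own variable names: it counts positive and negative differences, ignores zeroes, and doubles the lower tail of a Binomial(n, 0.5) distribution to obtain a two-tailed value.

```python
from math import comb

# Differences (pre minus post) for the 36 cats, from the Diff (d) column.
diffs = [7, 2, 71, 4, 9, 1, 1, 55, 3, 2, 0, 0, 9, 6, 0, 2, 2, 18, 3, -1,
         5, 26, 2, 0, 7, 3, 7, -1, 0, 0, -1, 3, 1, 0, 3, 22]

n_pos = sum(1 for d in diffs if d > 0)
n_neg = sum(1 for d in diffs if d < 0)
n = n_pos + n_neg             # zero differences are ignored
k = min(n_pos, n_neg)         # count in the smaller group

# P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-tailed test.
lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
p_sign = min(1.0, 2 * lower_tail)

print(n_pos, n_neg, p_sign)
```

There are 26 positive and 3 negative differences, and the doubled tail probability rounds to the 0.000015 quoted above.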

 Except where otherwise specified, all text and images on this page are copyright InfluentialPoints, all rights reserved. Images not copyright InfluentialPoints credit their source on web-pages attached via hypertext links from those images.