InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Wilcoxon matched-pairs signed-ranks test

Worked example I

We take as our first example part of a study by Farkas et al. (2001) on the role of Helicobacter pylori infection in hereditary angioneurotic oedema. Nineteen of the 65 patients had H. pylori infection.

The infection was successfully eradicated with combination therapy in 18 of these patients. The impact of eradication was studied in detail in nine of them: the data below show the number of episodes in these nine patients over (a median of) 10 months pre-treatment and 10 months post-eradication.

Number of oedematous episodes in patients with hereditary angioneurotic oedema

Patient  Pre-treatment  Post-treatment  Difference (d)  Rank  Signed rank (Ri)
1        13             1               12              7.5   +7.5
2        11             2               9               4     +4
3        6              1               5               2     +2
4        14             1               13              9     +9
5        6              3               3               1     +1
6        15             3               12              7.5   +7.5
7        13             2               11              6     +6
8        14             4               10              5     +5
9        8              2               6               3     +3

Sum of +ve ranks = 45
Sum of -ve ranks = 0
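The differences, mid-ranks and signed ranks in the table can be reproduced with a short script. The sketch below is in Python, although the article itself uses R. The function name signed_ranks is our own. It assumes that zero differences are dropped before ranking, as in the second worked example, and that tied |d| values share their average rank.

```python
def signed_ranks(pre, post):
    """Signed ranks of the paired differences d = pre - post.

    Zero differences are dropped, |d| is ranked from smallest to largest,
    tied values get the average (mid) rank, and each rank takes the sign of d.
    """
    d = [a - b for a, b in zip(pre, post) if a != b]
    ordered = sorted(abs(x) for x in d)
    midrank = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # positions i..j-1 share the 1-based ranks i+1..j; use their mean
        midrank[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [midrank[abs(x)] if x > 0 else -midrank[abs(x)] for x in d]


pre = [13, 11, 6, 14, 6, 15, 13, 14, 8]
post = [1, 2, 1, 1, 3, 3, 2, 4, 2]
ranks = signed_ranks(pre, post)
s_plus = sum(r for r in ranks if r > 0)
s_minus = -sum(r for r in ranks if r < 0)
print(ranks)            # [7.5, 4.0, 2.0, 9.0, 1.0, 7.5, 6.0, 5.0, 3.0]
print(s_plus, s_minus)  # 45.0 0
```

The two differences of 12 occupy ranks 7 and 8, so each receives the mid-rank 7.5.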

We first examine the distribution of differences to check whether it is symmetrical. This is difficult with such a small sample, but a clearly skewed distribution would give us less confidence in the result.

{Fig. 1: distribution of the differences (d)}

The distribution is clearly not symmetrical, although one could argue that it is not strongly skewed and that the irregularities simply result from the small sample size. We will continue with this test, but bear in mind that conclusions based on a borderline P-value would be unsafe.

Exact method

Given that we have a small number of observations, we should use an exact method rather than the normal approximation. S+ and S− are 45 and 0 respectively.

Using tables, we find that the critical value for n = 9 at P = 0.005 is 2. Since S− is less than this, we might be tempted to accept the difference as significant at P < 0.005. However, this result is unreliable because the table values are only accurate when there are no ties, and here the two differences of 12 share the rank 7.5.
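Tabulated critical values come from the exact null distribution of the signed-rank sum when there are no ties. That distribution can be built by counting which subsets of the ranks 1..n sum to each total. In this Python sketch (function names are ours), n = 9 gives P(S ≤ 1) = 2/512 ≈ 0.0039 and P(S ≤ 2) = 3/512 ≈ 0.0059. A table value of 2 at P = 0.005 therefore appears consistent with the rule "significant if S− is less than the tabulated value", assuming that column refers to a single tail.

```python
def signed_rank_counts(n):
    """counts[t] = number of the 2**n sign patterns whose signed-rank sum
    S equals t, with no ties (ranks are exactly 1..n)."""
    max_t = n * (n + 1) // 2
    counts = [1] + [0] * max_t
    for r in range(1, n + 1):
        # subset-sum counting: rank r is either in the sum or not;
        # looping downwards uses each rank at most once
        for t in range(max_t, r - 1, -1):
            counts[t] += counts[t - r]
    return counts


def lower_tail(n, t):
    """P(S <= t) under the null hypothesis, assuming no ties."""
    counts = signed_rank_counts(n)
    return sum(counts[: t + 1]) / 2 ** n


n = 9
counts = signed_rank_counts(n)
print(sum(counts))        # 512 sign patterns in total
print(lower_tail(n, 1))   # 0.00390625 (= 2/512)
print(lower_tail(n, 2))   # 0.005859375 (= 3/512)
```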


If we use an exact Wilcoxon test that accepts ties (available in a separate package in R), we can still obtain the correct P-value. This is 0.0039, as quoted by the authors of the paper. The estimate of the median difference is 8.75 (95% CI: 5.5 to 12.0). We may conclude that there was a significant decline in the number of oedematous episodes post-treatment.
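One way to get an exact P-value that accepts ties is to enumerate all 2^n equally likely sign patterns of the observed mid-ranks. We do not know how the R package mentioned above implements its exact test. The Python sketch below (function name ours) is a brute-force version that is only practical for small n, since 2^20 is already about a million patterns. For these data only "all positive" and "all negative" are as extreme as the observed S+ = 45, so the result is 2/512.

```python
from itertools import product


def exact_signed_rank_p(signed):
    """Two-sided exact P-value for the signed-rank test, conditional on the
    observed absolute (mid)ranks, so tied ranks are handled naturally.

    Under H0 each rank is equally likely to carry + or -, so all 2**n
    sign patterns are enumerated (brute force: small n only).
    """
    abs_r = [abs(r) for r in signed]
    centre = sum(abs_r) / 2                          # E(S+) under H0
    observed = abs(sum(r for r in signed if r > 0) - centre)
    extreme = 0
    for pattern in product((False, True), repeat=len(abs_r)):
        s_plus = sum(r for r, pos in zip(abs_r, pattern) if pos)
        if abs(s_plus - centre) >= observed - 1e-9:  # tolerance for float ties
            extreme += 1
    return extreme / 2 ** len(abs_r)


signed = [7.5, 4, 2, 9, 1, 7.5, 6, 5, 3]
p = exact_signed_rank_p(signed)
print(p)   # 0.00390625, i.e. 2/512
```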

Normal approximation method

If we did not have the option of an exact test that accepts ties, we would have to use the second formulation above:
$$ z = \frac{\sum R_i}{\sqrt{\sum R_i^2}} = \frac{45}{\sqrt{7.5^2 + 4^2 + \cdots + 3^2}} = \frac{45}{\sqrt{284.5}} = 2.6679 $$

This gives a two-tailed P-value of 0.007633.


Using the Wilcoxon test in R's standard statistics package gives the same value. Note that here the normal approximation is more conservative than the exact test (P = 0.0076 versus 0.0039). This is most likely because the sample size is so small that the normal approximation is unreliable.
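The normal approximation can be checked in a few lines of Python. The sketch uses the standard library's math.erfc for the two-tailed normal tail area, and the function names are ours. It applies no continuity correction. R's wilcox.test does apply one by default (correct = TRUE), which gives a somewhat larger P-value, so reproducing 0.0076 in R presumably requires correct = FALSE.

```python
import math


def signed_rank_z(signed):
    """z = sum(R_i) / sqrt(sum(R_i**2)); using the squared ranks
    allows for tied (mid) ranks. No continuity correction."""
    return sum(signed) / math.sqrt(sum(r * r for r in signed))


def two_tailed_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


signed = [7.5, 4, 2, 9, 1, 7.5, 6, 5, 3]
z = signed_rank_z(signed)   # 45 / sqrt(284.5), about 2.6679
p = two_tailed_p(z)         # about 0.00763
```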

Exact confidence limits

To obtain the Hodges-Lehmann estimate of the median difference and its 95% confidence interval, we first arrange the differences (d) from the table above in order:

3   5   6   9   10   11   12   12   13

We then construct a triangular matrix of the Walsh averages, (di + dj)/2 for every pair i ≤ j (45 values for n = 9), a task most easily achieved in R. The Hodges-Lehmann estimate of the median difference is the median of these values, which is 9.0. The number of averages to count in from each end of the ordered array to obtain the (approximate) lower and upper 95% confidence limits is given by the quantile of the Wilcoxon matched-pairs signed-ranks statistic for n = 9 observations at P = 0.025, which is 6.
      [3]   [5]   [6]   [9]   [10]  [11]  [12]  [12]  [13]
[3]   3.0   4.0   4.5   6.0   6.5   7.0   7.5   7.5   8.0
[5]         5.0   5.5   7.0   7.5   8.0   8.5   8.5   9.0
[6]               6.0   7.5   8.0   8.5   9.0   9.0   9.5
[9]                     9.0   9.5   10.0  10.5  10.5  11.0
[10]                          10.0  10.5  11.0  11.0  11.5
[11]                                11.0  11.5  11.5  12.0
[12]                                      12.0  12.0  12.5
[12]                                            12.0  12.5
[13]                                                  13.0

Counting in from each end of the ordered Walsh averages, the 6th highest value (12.0) is the upper 95% confidence limit and the 6th lowest (6.0) is the lower 95% confidence limit. We conclude that the 95% confidence interval for the median difference is 6 to 12. This interval does not include zero, in agreement with our earlier significant P-value of 0.0076.
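The Walsh averages, the Hodges-Lehmann estimate and the confidence limits can be computed as shown in this Python sketch. The helper names are ours. It assumes the no-ties null distribution when choosing k, the number of averages counted in from each end, which is how we read the procedure above.

```python
from statistics import median


def walsh_averages(d):
    """Sorted (d[i] + d[j]) / 2 for all i <= j: n(n+1)/2 values."""
    n = len(d)
    return sorted((d[i] + d[j]) / 2 for i in range(n) for j in range(i, n))


def signed_rank_quantile(n, prob):
    """Smallest t with P(S <= t) >= prob under H0, assuming no ties."""
    max_t = n * (n + 1) // 2
    counts = [1] + [0] * max_t
    for r in range(1, n + 1):          # subset-sum counts of ranks 1..n
        for t in range(max_t, r - 1, -1):
            counts[t] += counts[t - r]
    cumulative = 0
    for t, c in enumerate(counts):
        cumulative += c
        if cumulative / 2 ** n >= prob:
            return t
    return max_t


d = [3, 5, 6, 9, 10, 11, 12, 12, 13]
w = walsh_averages(d)                     # 45 averages
estimate = median(w)                      # Hodges-Lehmann estimate
k = signed_rank_quantile(len(d), 0.025)
lower, upper = w[k - 1], w[len(w) - k]    # k-th value in from each end
print(estimate, k, lower, upper)          # 9.0 6 6.0 12.0
```

Because the statistic is discrete, the achieved coverage is 1 - 2 × P(S ≤ 5) = 1 - 2 × 10/512 ≈ 96.1%, which is why the limits are described as approximate.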

Whilst this study gave us a manageable worked example, the sample size is really too small for the normal approximation.

 

Worked example II

We take our second example from Ogata & Takeuchi (2001), a trial of a feline pheromone analogue to reduce the frequency of urine marking by cats. The data below give the number of markings pre-treatment and one week post-treatment. We first examine the distribution of differences to check whether it is symmetrical.

{Fig. 2: distributions of the differences (d), untransformed (left) and after log(x+1) transformation}

Number of urine markings, pre-treatment and one week post-treatment

No.  Pre  Post  Diff (d)  Rank  Signed rank
1    7    0     7         21    21
2    12   10    2         9     9
3    77   6     71        29    29
4    15   11    4         17    17
5    31   22    9         23.5  23.5
6    14   13    1         3.5   3.5
7    2    1     1         3.5   3.5
8    63   8     55        28    28
9    10   7     3         14    14
10   16   14    2         9     9
11   6    6     0         *     *
12   1    1     0         *     *
13   18   9     9         23.5  23.5
14   9    3     6         19    19
15   1    1     0         *     *
16   9    7     2         9     9
17   9    7     2         9     9
18   30   12    18        25    25
19   11   8     3         14    14
20   13   14    -1        3.5   -3.5
21   7    2     5         18    18
22   28   2     26        27    27
23   4    2     2         9     9
24   10   10    0         *     *
25   11   4     7         21    21
26   3    0     3         14    14
27   7    0     7         21    21
28   8    9     -1        3.5   -3.5
29   6    6     0         *     *
30   12   12    0         *     *
31   4    5     -1        3.5   -3.5
32   13   10    3         14    14
33   1    0     1         3.5   3.5
34   3    3     0         *     *
35   17   14    3         14    14
36   22   0     22        26    26

Sum of +ve ranks = 424.5
Sum of -ve ranks = 10.5
The distributions on the left are of the untransformed differences; they are strongly right-skewed. A log transformation reduces the skew a little, but the effect is disappointingly slight (note that we used a log(x+1) transformation here because several of the post-treatment readings were zero). For now we will proceed with the untransformed data, but bear in mind that the conditions for the test are not met and we may get a misleading result.

We have a moderate number of observations (29, excluding the zero differences denoted by * in the table) with many ties, so we use the normal approximation:

$$ z = \frac{\sum R_i}{\sqrt{\sum R_i^2}} = \frac{424.5 - 10.5}{\sqrt{8515}} = \frac{414}{\sqrt{8515}} = 4.487 $$

This gives a two-tailed P-value of 0.000007. This indicates a highly significant treatment effect.
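Here is the same calculation for the cat data, as a self-contained Python sketch with variable names of our own. The counts are transcribed from the table above. Zero differences are dropped before ranking, as in the table, and no continuity correction is applied.

```python
import math

pre = [7, 12, 77, 15, 31, 14, 2, 63, 10, 16, 6, 1, 18, 9, 1, 9, 9, 30,
       11, 13, 7, 28, 4, 10, 11, 3, 7, 8, 6, 12, 4, 13, 1, 3, 17, 22]
post = [0, 10, 6, 11, 22, 13, 1, 8, 7, 14, 6, 1, 9, 3, 1, 7, 7, 12,
        8, 14, 2, 2, 2, 10, 4, 0, 0, 9, 6, 12, 5, 10, 0, 3, 14, 0]

# Differences, dropping zeros (the * rows of the table)
d = [a - b for a, b in zip(pre, post) if a != b]

# Mid-ranks of |d|: tied values share the mean of their 1-based ranks
ordered = sorted(abs(x) for x in d)
midrank = {}
i = 0
while i < len(ordered):
    j = i
    while j < len(ordered) and ordered[j] == ordered[i]:
        j += 1
    midrank[ordered[i]] = (i + 1 + j) / 2
    i = j

signed = [math.copysign(midrank[abs(x)], x) for x in d]
s_plus = sum(r for r in signed if r > 0)      # 424.5
s_minus = -sum(r for r in signed if r < 0)    # 10.5
z = sum(signed) / math.sqrt(sum(r * r for r in signed))  # 414 / sqrt(8515)
p = math.erfc(abs(z) / math.sqrt(2))          # two-tailed, roughly 7e-6
```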

But note that the Hodges-Lehmann estimate of the median difference given by R is only 4.5 (95% confidence interval: 2.5 to 10.0). In other words, although treatment had a dramatic effect for a few cats, for most cats the effect was rather slight.

As we pointed out above, the key assumption of the Wilcoxon matched-pairs test (a symmetrical distribution of differences) is not met. Hence we would do better to use the sign test, which uses only the signs of the differences and so makes no assumption about the shape of their distribution.


This gives a P-value of 0.000015. This is not as highly significant as with Wilcoxon's matched-pairs test, but it is obtained by a much more justifiable procedure.
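The sign test uses only the direction of each difference. In the table, 26 cats marked less after treatment and 3 marked more, while 7 were unchanged and are excluded. Under the null hypothesis the number of positive differences is therefore Binomial(29, 1/2). The Python sketch below (function name ours) computes the exact two-sided test by doubling the smaller tail. It uses math.comb, which requires Python 3.8 or later.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign test; zero differences already excluded.

    Under H0 the number of positive differences is Binomial(n, 1/2),
    so the smaller tail is summed and doubled (capped at 1).
    """
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p = sign_test_p(26, 3)   # 2 * (1 + 29 + 406 + 3654) / 2**29, about 1.5e-5
```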