InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)


 

 

The Wilcoxon-Mann-Whitney test

 

Worked example 1

We will base our first example on a comparison of concentrations of antibody to Aspergillus in Humboldt's penguins in two wildlife parks. We first looked at these data in Unit 1 in relation to the use of (jittered) dot plots to display frequency distributions. The authors (apparently) analyzed the data using a two-sample t-test on log-transformed data, obtaining a (significant) P-value of 0.035. We will apply a Wilcoxon-Mann-Whitney test to the untransformed data.

Antibody concentrations

Whipsnade (n = 17)      Fota (n = 8)
   24.9     535.4          146.0
   30.8     558.8          559.8
   62.0     743.6          742.5
  128.4     798.6          799.2
  148.1    1093.4         1092.1
  159.1    1141.2         1225.7
  348.8    1261.2         1354.5
  348.9    1454.8         6997.8
  457.6

We first look at the distributions for each group of observations. Unlike with t-tests, we are not interested in whether the data follow a normal distribution, only in whether we can assume the two distributions are sufficiently similar in shape to justify considering the test as a test for difference between medians. We have compared distributions using jittered dot plots and plots of cumulative proportions:

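[Figure: (a) jittered dot plots and (b) cumulative proportions of antibody concentrations at Whipsnade and Fota]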

Comparison is difficult given the small sample size from Fota, but the cumulative plots are revealing. The distributions differ in shape: that of the Whipsnade group is strongly right skewed, whereas that of the Fota group is more uniform (apart from one high 'outlier'). However, the main difference does appear to be a shift in location, with different median values.

For this example we will use the Mann-Whitney U-statistic. Sample sizes are small (nA = 8; nB = 17), so we cannot use the normal approximation. We will therefore determine the exact P-value both from tables and from our software package (R). Note we have already done the first step in the analysis, which is to sort each sample into rank order.

Direct method

  1. For each observation in the smaller sample (Fota) we count up the number of observations in the other sample that are less than it.
      There are four observations in the Whipsnade group (24.9, 30.8, 62.0, 128.4) that are less than the first observation (146.0) in the Fota group. There are eleven observations in the Whipsnade group that are less than the second observation (559.8) in the Fota group. Continuing this process U1 = 4 + 11 + 11 + 13 + 13 + 15 + 16 + 17 = 100
  2. U2 = [8 × 17] − 100 = 36.

  3. The Mann Whitney U statistic is the smaller of U1 or U2. Hence U = 36.
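The counting in the direct method can be sketched in a few lines of Python. Python is used here only for illustration (the analysis on this page was done in R), and the variable names are our own assumptions. The data are those tabulated above.

```python
# Antibody concentrations from the table above.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# For each Fota value, count the Whipsnade values that are less than it.
# No value occurs in both groups here, so no half-counts for ties are needed.
counts = [sum(w < f for w in whipsnade) for f in fota]

u1 = sum(counts)                          # 4 + 11 + 11 + 13 + 13 + 15 + 16 + 17
u2 = len(fota) * len(whipsnade) - u1      # nA * nB - U1
u = min(u1, u2)                           # the Mann-Whitney U statistic

print(counts, u1, u2, u)
```

With tied values between the groups, each tie would conventionally contribute one half to the count, which this sketch does not handle.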

Indirect method

  1. Combine the two groups of observations in a single ranked series, retaining the information on their group of origin. In the table below the rank is given in brackets after each observation.

    (* = observation from the Fota group)

      24.9 (1)       30.8 (2)       62.0 (3)      128.4 (4)      146.0* (5)
     148.1 (6)      159.1 (7)      348.8 (8)      348.9 (9)      457.6 (10)
     535.4 (11)     558.8 (12)     559.8* (13)    742.5* (14)    743.6 (15)
     798.6 (16)     799.2* (17)   1092.1* (18)   1093.4 (19)    1141.2 (20)
    1225.7* (21)   1261.2 (22)    1354.5* (23)   1454.8 (24)    6997.8* (25)

  2. The sum of ranks of the smaller group (SA) is 5 + 13 + 14 + 17 + 18 + 21 + 23 + 25 = 136.

  3. U1 = [8 × 17] + [(8 × 9)/2] − 136 = 36    and    U2 = [8 × 17] − 36 = 100

  4. The Mann Whitney U statistic is the smaller of U1 or U2. Hence U = 36.
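The indirect (rank-sum) route can be sketched in the same way. This is a minimal Python illustration with assumed variable names, not the code used for this page. It relies on there being no tied values in these data, so every observation receives a distinct integer rank.

```python
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# Rank the pooled observations (1 = smallest); valid only without ties.
pooled = sorted(whipsnade + fota)
rank = {value: i + 1 for i, value in enumerate(pooled)}

n_a, n_b = len(fota), len(whipsnade)      # nA = 8 (smaller group), nB = 17
s_a = sum(rank[v] for v in fota)          # sum of ranks of the smaller group

u1 = n_a * n_b + n_a * (n_a + 1) // 2 - s_a
u2 = n_a * n_b - u1
u = min(u1, u2)

print(s_a, u1, u2, u)
```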


Testing significance of U

Using Siegel's table K, the critical value of U for a two-tailed test at P = 0.05 is 34 and for P = 0.10 is 41. Our observed U of 36 lies between these, hence we may express the significance level as 0.05 < P < 0.1, in other words not quite significant at the 0.05 level. Alternatively we could look up the precise one-tailed P-value in R, which gives 0.03285. This is doubled to get the two-tailed value: P = 0.0657. Hence we conclude that there is no significant difference between antibody levels at the (conventional) P = 0.05 level.
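For readers who want to see where an exact P-value comes from, the null distribution of U can be enumerated directly. Under the null hypothesis every choice of 8 ranks out of 25 is equally likely. The sketch below is in Python with a hypothetical helper `u_null_counts`, and is not the R routine used for this page. It counts how many of those choices give each value of U, and if the enumeration is right it should reproduce, to rounding, the one-tailed value quoted above.

```python
from math import comb

def u_null_counts(m, n):
    """For each u in 0..m*n, count the rank subsets of size m giving U = u."""
    total_n = m + n
    max_sum = sum(range(n + 1, total_n + 1))   # largest possible rank sum
    # ways[k][s] = number of k-subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for rank in range(1, total_n + 1):
        # Go through k in descending order so each rank is used at most once.
        for k in range(min(m, rank), 0, -1):
            for s in range(rank, max_sum + 1):
                ways[k][s] += ways[k - 1][s - rank]
    offset = m * (m + 1) // 2                  # smallest possible rank sum
    return [ways[m][u + offset] for u in range(m * n + 1)]

counts = u_null_counts(8, 17)
total = comb(25, 8)                            # number of equally likely outcomes

u_observed = 36
p_one = sum(counts[:u_observed + 1]) / total   # P(U <= 36)
p_two = 2 * p_one                              # doubled, as in the text

print(round(p_one, 5), round(p_two, 4))
```

Doubling the one-tailed value is the convention followed on this page. It is valid here because the null distribution of U is symmetric.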

This was not the conclusion reached by the authors who quite justifiably carried out a t-test on log transformed data. Using the data given here, an equal variance t-test gives a t-value of -2.0812, df = 23, P = 0.049. Given that assumptions were reasonably well met for both the Wilcoxon-Mann-Whitney and the t-test, the difference in inference is probably a reflection of the greater power of the t-test.
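The equal-variance t-statistic on the log-transformed data can be checked with the Python standard library. This is a sketch under our own naming assumptions. Natural logs are used, although the t-value does not depend on the base, and the P-value is not computed because the standard library has no t-distribution function.

```python
from math import log, sqrt
from statistics import mean, variance

whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

log_w = [log(x) for x in whipsnade]
log_f = [log(x) for x in fota]
n_w, n_f = len(log_w), len(log_f)

# Pooled (equal-variance) estimate of the variance of the logged values.
df = n_w + n_f - 2
pooled_var = ((n_w - 1) * variance(log_w) + (n_f - 1) * variance(log_f)) / df

# t-statistic for the difference in mean log concentration (Whipsnade - Fota).
t = (mean(log_w) - mean(log_f)) / sqrt(pooled_var * (1 / n_w + 1 / n_f))

print(round(t, 4), df)
```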

Confidence interval of difference between medians

Normally one would not calculate the confidence interval of the difference following a non-significant P-value, but we will do so here partly to demonstrate the method, and partly because the P-value is so close to significance. The first step is to calculate the differences between all possible pairs of values (8 × 17 = 136 differences). This can be done manually, although we used a function in R to do this for us. This is the array we obtained:

Differences in antibody concentration for all pairs of observations (Whipsnade value, in rows, minus Fota value, in columns)

Whipsnade     146.0    559.8    742.5    799.2   1092.1   1225.7   1354.5   6997.8
     24.9    -121.1   -534.9   -717.6   -774.3  -1067.2  -1200.8  -1329.6  -6972.9
     30.8    -115.2   -529.0   -711.7   -768.4  -1061.3  -1194.9  -1323.7  -6967.0
     62.0     -84.0   -497.8   -680.5   -737.2  -1030.1  -1163.7  -1292.5  -6935.8
    128.4     -17.6   -431.4   -614.1   -670.8   -963.7  -1097.3  -1226.1  -6869.4
    148.1       2.1   -411.7   -594.4   -651.1   -944.0  -1077.6  -1206.4  -6849.7
    159.1      13.1   -400.7   -583.4   -640.1   -933.0  -1066.6  -1195.4  -6838.7
    348.8     202.8   -211.0   -393.7   -450.4   -743.3   -876.9  -1005.7  -6649.0
    348.9     202.9   -210.9   -393.6   -450.3   -743.2   -876.8  -1005.6  -6648.9
    457.6     311.6   -102.2   -284.9   -341.6   -634.5   -768.1   -896.9  -6540.2
    535.4     389.4    -24.4   -207.1   -263.8   -556.7   -690.3   -819.1  -6462.4
    558.8     412.8     -1.0   -183.7   -240.4   -533.3   -666.9   -795.7  -6439.0
    743.6     597.6    183.8      1.1    -55.6   -348.5   -482.1   -610.9  -6254.2
    798.6     652.6    238.8     56.1     -0.6   -293.5   -427.1   -555.9  -6199.2
   1093.4     947.4    533.6    350.9    294.2      1.3   -132.3   -261.1  -5904.4
   1141.2     995.2    581.4    398.7    342.0     49.1    -84.5   -213.3  -5856.6
   1261.2    1115.2    701.4    518.7    462.0    169.1     35.5    -93.3  -5736.6
   1454.8    1308.8    895.0    712.3    655.6    362.7    229.1    100.3  -5543.0

The Hodges-Lehmann estimate of median difference is obtained as the median of the array of differences given in the table above. With 136 differences, this is the mean of the 68th and 69th ordered values: -(450.4 + 482.1)/2 = -466.25.
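The Hodges-Lehmann estimate can be reproduced by forming all pairwise differences and taking their median. This is a minimal Python sketch with assumed variable names, using the sign convention of the table (Whipsnade minus Fota).

```python
from statistics import median

whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# All 17 x 8 = 136 pairwise differences (Whipsnade - Fota).
diffs = [w - f for w in whipsnade for f in fota]

# With an even number of differences, median() averages the two middle values.
hl_estimate = median(diffs)

print(len(diffs), round(hl_estimate, 2))
```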

Then determine how many differences to count in from each end of the ordered array to obtain the (approximate) upper and lower confidence limits. This number is given by the quantile of the Mann-Whitney U-statistic for nA and nB observations at P = 0.025, which is 35. Alternatively the required number of differences can be obtained from the quantile of the Wilcoxon W-statistic using k = 188 - (18 × 17)/2 = 35.

Counting in from the top of the ordered array, the 34 highest values are followed by the 35th highest (1.3), which is the upper 95% confidence limit. Counting in from the bottom, the 34 lowest values are followed by the 35th lowest (-963.7), which is the lower 95% confidence limit.

We conclude that the 95% confidence interval for the difference is -963.7 to 1.3. This interval just overlaps zero in agreement with our earlier non-significant P-value of 0.0657.
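The confidence limits can be read off the sorted differences in the same way. The sketch below is Python with assumed variable names. It takes k = 35 directly from the text rather than deriving the quantile itself.

```python
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# Sorted array of all pairwise differences (Whipsnade - Fota).
diffs = sorted(w - f for w in whipsnade for f in fota)

k = 35                      # number to count in from each end (from the text)
lower = diffs[k - 1]        # 35th lowest difference
upper = diffs[-k]           # 35th highest difference

print(round(lower, 1), round(upper, 1))
```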

 

 

Worked example 2

Our second worked example uses data from a trial on the efficacy of breast feeding for pain relief during venepuncture in newly born infants carried out by Carbajal et al. (2003). We first met this work in a hands-on session. We consider the same comparison here: group A where infants were breast fed during venepuncture and group B where they were just held in their mother's arms. However, we use a different response variable to assess pain levels, the premature infant pain profile scale (PIPP).

Results for the trial for the 44 infants in group A and the 45 infants in group B are given below:

Pain scores on PIPP scale for 'breast-fed' (A) and 'mothers arms' (B) infants

Score   Freq.(A)   Freq.(B)   Ranks    Mean rank   SA       SB
0       5          0          1-5      3           15       0
1       3          0          6-8      7           21       0
2       3          0          9-11     10          30       0
3       6          1          12-18    15          90       15
4       5          0          19-23    21          105      0
5       5          0          24-28    26          130      0
6       2          0          29-30    29.5        59       0
7       3          1          31-34    32.5        97.5     32.5
8       5          0          35-39    37          185      0
9       1          4          40-44    42          42       168
10      0          5          45-49    47          0        235
11      3          6          50-58    54          162      324
12      1          3          59-62    60.5        60.5     181.5
13      0          7          63-69    66          0        462
14      1          5          70-75    72.5        72.5     362.5
15      1          5          76-81    78.5        78.5     392.5
16      0          2          82-83    82.5        0        165
17      0          4          84-87    85.5        0        342
18      0          2          88-89    88.5        0        177
Sum     44         45                              1148     2857
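The mean ranks and rank sums in the table can be rebuilt from the two frequency columns. This is a minimal Python sketch with assumed variable names: every infant with a given score shares the average of the ranks that score occupies.

```python
# Frequencies for PIPP scores 0..18, copied from the table above.
freq_a = [5, 3, 3, 6, 5, 5, 2, 3, 5, 1, 0, 3, 1, 0, 1, 1, 0, 0, 0]
freq_b = [0, 0, 0, 1, 0, 0, 0, 1, 0, 4, 5, 6, 3, 7, 5, 5, 2, 4, 2]

mean_ranks = []
next_rank = 1
for fa, fb in zip(freq_a, freq_b):
    t = fa + fb                      # number of infants tied at this score
    # Ranks next_rank .. next_rank + t - 1 are shared; average them.
    mean_ranks.append(next_rank + (t - 1) / 2)
    next_rank += t

# Rank sums for each group: frequency times mean rank, summed over scores.
s_a = sum(fa * r for fa, r in zip(freq_a, mean_ranks))
s_b = sum(fb * r for fb, r in zip(freq_b, mean_ranks))

print(mean_ranks[:8], s_a, s_b)
```

As a check, the two rank sums should add up to N(N + 1)/2 = 89 × 90/2 = 4005.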

Pain scale is an ordinal variable, so the arithmetic mean is not an appropriate measure of location. Hence we use the Wilcoxon-Mann-Whitney test to compare medians. But in order to compare medians, we must first demonstrate that distributions are similar apart from location. We have compared distributions in the two figures below: first using dot plots and then plotting cumulative distributions.

[Figure: (a) dot plots and (b) cumulative distributions of PIPP scores for groups A and B]

There are slight differences in skew (group A are slightly right skewed, whereas group B are slightly left skewed), but the main difference between the two distributions is between their locations. We therefore proceed with a Wilcoxon-Mann-Whitney test. Both nA and nB are greater than 20, so we use the normal approximation:

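\[
z \;=\; \frac{1148 - \dfrac{44\,(90)}{2}}{\sqrt{\dfrac{44 \times 45}{89\,(88)}\,(238673.3) \;-\; \dfrac{44 \times 45\,(90)^{2}}{4\,(88)}}} \;=\; -6.842
\]

Here 1148 is the rank sum SA for group A, 90 is N + 1 (with N = 89), and 238673.3 is the sum of the squared (mean) ranks. The numerator is 1148 - 1980 = -832, so z is negative; its magnitude is 6.842.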

This gives a two-tailed P-value of 7.81 × 10^-12 (0.00000000000781).
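The normal approximation with its correction for ties can be sketched from the frequency table alone. The Python below uses assumed variable names and the standard library only, and is not the R code used for this page. Because it recomputes the sum of squared mean ranks from the table as printed, its results may differ from the quoted figures in the last digit or two.

```python
from math import erfc, sqrt

# Frequencies for PIPP scores 0..18, copied from the table above.
freq_a = [5, 3, 3, 6, 5, 5, 2, 3, 5, 1, 0, 3, 1, 0, 1, 1, 0, 0, 0]
freq_b = [0, 0, 0, 1, 0, 0, 0, 1, 0, 4, 5, 6, 3, 7, 5, 5, 2, 4, 2]
n_a, n_b = sum(freq_a), sum(freq_b)
n = n_a + n_b

# Mean rank for each score (tied observations share the average rank).
mean_ranks, next_rank = [], 1
for fa, fb in zip(freq_a, freq_b):
    t = fa + fb
    mean_ranks.append(next_rank + (t - 1) / 2)
    next_rank += t

s_a = sum(fa * r for fa, r in zip(freq_a, mean_ranks))
sum_r2 = sum((fa + fb) * r * r
             for fa, fb, r in zip(freq_a, freq_b, mean_ranks))

# Expected rank sum and tie-corrected variance under the null hypothesis.
expected = n_a * (n + 1) / 2
var = (n_a * n_b / (n * (n - 1))) * sum_r2 \
      - n_a * n_b * (n + 1) ** 2 / (4 * (n - 1))

z = (s_a - expected) / sqrt(var)
p_two = erfc(abs(z) / sqrt(2))      # two-tailed normal P-value

print(round(z, 3), p_two)
```

No continuity correction is applied in this sketch, which is an assumption on our part; software may apply one by default and so give a slightly different P-value.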


The wilcox.test function of base R gives a very similar P-value, indicating a highly significant treatment effect (but warns that this P-value is approximate because of ties). This function also gave the observed Hodges-Lehmann difference (−8.0) with its 95% confidence limits (−9, −6). We may conclude that breast feeding is associated with a highly significant reduction in pain (assessed on the PIPP scale) relative to that experienced when the infant is just held in the mother's arms.