Worked example 1
We will base our first example on a comparison of concentrations of antibody to Aspergillus in Humboldt's penguins at two wildlife parks (Whipsnade and Fota).
We first looked at these data in Unit 1 in relation to the use of (jittered) dot plots to display frequency distributions.
The authors (apparently) analyzed the data using a two-sample t-test on log-transformed data, obtaining a (significant) P-value of 0.035. We will apply a Wilcoxon-Mann-Whitney test to the untransformed data.
Antibody concentrations |
Whipsnade (n = 17) | Whipsnade (continued) | Fota (n = 8) |
24.9 | 535.4 | 146.0 |
30.8 | 558.8 | 559.8 |
62.0 | 743.6 | 742.5 |
128.4 | 798.6 | 799.2 |
148.1 | 1093.4 | 1092.1 |
159.1 | 1141.2 | 1225.7 |
348.8 | 1261.2 | 1354.5 |
348.9 | 1454.8 | 6997.8 |
457.6 | | |
We first look at the distributions for each group of observations. Unlike with t-tests, we are not interested in whether the data follow a normal distribution, only in whether we can assume the two distributions are sufficiently similar in shape to justify treating the test as a test for a difference between medians. We have compared distributions using jittered dot plots and plots of cumulative proportions:
Comparison is difficult given the small sample size from Fota, but the cumulative plots are revealing. There are differences in the distributions, with that of the Whipsnade group strongly right skewed and that of the Fota group more uniform (apart from one high 'outlier'). However, the main difference does appear to be a shift in location, with different median values.
For this example we will use the Mann-Whitney U-statistic. Sample sizes are small (nA = 8 for Fota; nB = 17 for Whipsnade) so we cannot use the normal approximation. We will therefore determine the exact P-value both from tables and from our software package (R). Note we have already done the first step in the analysis, which is to arrange each sample in rank order (as in the table above).
Direct method
- For each observation in the smaller sample (Fota) we count up the number of observations in the other sample that are less than it.
There are four observations in the Whipsnade group (24.9, 30.8, 62.0, 128.4) that are less than the first observation (146.0) in the Fota group. There are eleven observations in the Whipsnade group that are less than the second observation (559.8) in the Fota group. Continuing this process for all eight Fota observations gives U1 = 4 + 11 + 11 + 13 + 13 + 15 + 16 + 17 = 100.
U2 = [8 × 17] − 100 = 36.
The Mann-Whitney U statistic is the smaller of U1 and U2. Hence U = 36.
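The counting in the direct method can be checked with a short script. This is a minimal Python sketch, not the R code used for this page; the variable and function names are our own, and it assumes there are no tied values between the groups (true for these data).

```python
# Antibody concentrations transcribed from the table above.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]


def count_smaller(sample, other):
    """For each value in `sample`, count the values in `other` that are less than it.

    Assumes no ties between the two samples.
    """
    return [sum(1 for y in other if y < x) for x in sample]


counts = count_smaller(fota, whipsnade)      # one count per Fota observation
u1 = sum(counts)                             # direct-method U1
u2 = len(fota) * len(whipsnade) - u1         # U2 = nA * nB - U1
u = min(u1, u2)                              # the Mann-Whitney U statistic

print("counts:", counts)
print("U1 =", u1, " U2 =", u2, " U =", u)
```

The printed counts should match the eight terms summed in the text.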
Indirect method
Combine the two groups of observations in a single ranked series, retaining the information on their group of origin. In the table below the rank is given in brackets after each observation.
24.9 (1) | 30.8 (2) | 62.0 (3) | 128.4 (4) | 146.0 (5) |
148.1 (6) | 159.1 (7) | 348.8 (8) | 348.9 (9) | 457.6 (10) |
535.4 (11) | 558.8 (12) | 559.8 (13) | 742.5 (14) | 743.6 (15) |
798.6 (16) | 799.2 (17) | 1092.1 (18) | 1093.4 (19) | 1141.2 (20) |
1225.7 (21) | 1261.2 (22) | 1354.5 (23) | 1454.8 (24) | 6997.8 (25) |
The sum of ranks of the smaller group (SA) is 5 + 13 + 14 + 17 + 18 + 21 + 23 + 25 = 136.
U1 = [8 × 17] + [(8 × 9)/2] − 136 = 36 and U2 = [8 × 17] − 36 = 100.
The Mann-Whitney U statistic is the smaller of U1 and U2. Hence U = 36.
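The indirect (rank-sum) method can be sketched in the same way. Again this is an illustrative Python sketch with names of our own choosing; it assumes no tied values, so each rank is simply the position in the pooled, sorted series.

```python
# Antibody concentrations transcribed from the table above.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# Pool the observations, keeping the group of origin, and sort by value.
pooled = sorted([(x, "Whipsnade") for x in whipsnade] +
                [(x, "Fota") for x in fota])

# With no ties, the rank is the 1-based position in the sorted series.
fota_ranks = [i + 1 for i, (_, group) in enumerate(pooled) if group == "Fota"]

n_a, n_b = len(fota), len(whipsnade)          # nA = 8, nB = 17
s_a = sum(fota_ranks)                         # rank sum of the smaller group
u1 = n_a * n_b + n_a * (n_a + 1) // 2 - s_a   # U1 from the rank sum
u2 = n_a * n_b - u1
u = min(u1, u2)

print("Fota ranks:", fota_ranks)
print("SA =", s_a, " U1 =", u1, " U2 =", u2, " U =", u)
```

Note that the labels U1 and U2 are swapped relative to the direct method, but the smaller of the two is the same.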

Testing significance of U
Using Siegel's table K, the critical value for a two-tailed test at P = 0.05 is 34 and at P = 0.10 is 41.
Hence we may express the significance level as 0.05 < P < 0.1, in other words not quite significant at the 0.05 level. Alternatively we could look up the precise one-tailed P-value in R, which gives 0.03285. This is doubled to get the two-tailed value: P = 0.0657. Hence we conclude that there is no significant difference between antibody levels at the (conventional) P = 0.05 level.
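For readers curious where an exact P-value comes from, the null distribution of U can be enumerated with a standard counting recursion. The sketch below is our own Python illustration, not the R routine used above; the function names are hypothetical, and it assumes no ties. The one-tailed result should be close to the 0.03285 quoted from R.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def n_arrangements(u, m, n):
    """Number of orderings of m + n distinct values (m in sample X, n in sample Y)
    for which the count of (x, y) pairs with y < x equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest pooled value is either an X (it exceeds all n Y values,
    # adding n to U) or a Y (adding nothing to U).
    return n_arrangements(u - n, m - 1, n) + n_arrangements(u, m, n - 1)


def p_lower(u_obs, m, n):
    """Exact one-tailed probability P(U <= u_obs) under the null hypothesis."""
    total = comb(m + n, m)  # all equally likely assignments of ranks to X
    return sum(n_arrangements(u, m, n) for u in range(u_obs + 1)) / total


p_one = p_lower(36, 8, 17)   # observed U = 36, nA = 8, nB = 17
p_two = 2 * p_one            # two-tailed value, by doubling as in the text

print("one-tailed P =", round(p_one, 5))
print("two-tailed P =", round(p_two, 4))
```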
This was not the conclusion reached by the authors, who quite justifiably carried out a t-test on log-transformed data. Using the data given here, an equal variance t-test gives a t-value of -2.0812, df = 23, P = 0.049. Given that assumptions were reasonably well met for both the Wilcoxon-Mann-Whitney test and the t-test, the difference in inference is probably a reflection of the greater power of the t-test.
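The t-value and degrees of freedom quoted above can be reproduced from the tabulated data. This is a minimal Python sketch using only the standard library, with names of our own; it computes the pooled-variance t statistic on natural logs (the base of the logarithm does not affect t). It stops short of the P-value, since the standard library has no t distribution.

```python
from math import log, sqrt
from statistics import mean, variance

# Antibody concentrations transcribed from the table above.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

log_w = [log(x) for x in whipsnade]
log_f = [log(x) for x in fota]
n_w, n_f = len(log_w), len(log_f)

# Pooled (equal-variance) estimate; statistics.variance is the sample variance.
df = n_w + n_f - 2
pooled_var = ((n_w - 1) * variance(log_w) + (n_f - 1) * variance(log_f)) / df
se = sqrt(pooled_var * (1 / n_w + 1 / n_f))

t = (mean(log_w) - mean(log_f)) / se   # Whipsnade minus Fota, hence negative

print("t =", round(t, 4), " df =", df)
```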
Confidence interval of difference between medians
Normally one would not calculate the confidence interval of the difference following a non-significant P-value, but we will do so here partly to demonstrate the method, and partly because the P-value is so close to significance. The first step is to calculate the differences between all possible pairs of values. This can be done manually, although we used an R function to do this for us. This is the array we obtained:
Differences in antibody concentration between all pairs of observations (Whipsnade value in each row minus Fota value in each column) |
| 146.0 | 559.8 | 742.5 | 799.2 | 1092.1 | 1225.7 | 1354.5 | 6997.8 |
24.9 | -121.1 | -534.9 | -717.6 | -774.3 | -1067.2 | -1200.8 | -1329.6 | -6972.9 |
30.8 | -115.2 | -529.0 | -711.7 | -768.4 | -1061.3 | -1194.9 | -1323.7 | -6967.0 |
62.0 | -84.0 | -497.8 | -680.5 | -737.2 | -1030.1 | -1163.7 | -1292.5 | -6935.8 |
128.4 | -17.6 | -431.4 | -614.1 | -670.8 | -963.7 | -1097.3 | -1226.1 | -6869.4 |
148.1 | 2.1 | -411.7 | -594.4 | -651.1 | -944.0 | -1077.6 | -1206.4 | -6849.7 |
159.1 | 13.1 | -400.7 | -583.4 | -640.1 | -933.0 | -1066.6 | -1195.4 | -6838.7 |
348.8 | 202.8 | -211.0 | -393.7 | -450.4 | -743.3 | -876.9 | -1005.7 | -6649.0 |
348.9 | 202.9 | -210.9 | -393.6 | -450.3 | -743.2 | -876.8 | -1005.6 | -6648.9 |
457.6 | 311.6 | -102.2 | -284.9 | -341.6 | -634.5 | -768.1 | -896.9 | -6540.2 |
535.4 | 389.4 | -24.4 | -207.1 | -263.8 | -556.7 | -690.3 | -819.1 | -6462.4 |
558.8 | 412.8 | -1.0 | -183.7 | -240.4 | -533.3 | -666.9 | -795.7 | -6439.0 |
743.6 | 597.6 | 183.8 | 1.1 | -55.6 | -348.5 | -482.1 | -610.9 | -6254.2 |
798.6 | 652.6 | 238.8 | 56.1 | -0.6 | -293.5 | -427.1 | -555.9 | -6199.2 |
1093.4 | 947.4 | 533.6 | 350.9 | 294.2 | 1.3 | -132.3 | -261.1 | -5904.4 |
1141.2 | 995.2 | 581.4 | 398.7 | 342.0 | 49.1 | -84.5 | -213.3 | -5856.6 |
1261.2 | 1115.2 | 701.4 | 518.7 | 462.0 | 169.1 | 35.5 | -93.3 | -5736.6 |
1454.8 | 1308.8 | 895.0 | 712.3 | 655.6 | 362.7 | 229.1 | 100.3 | -5543.0 |
The Hodges-Lehmann estimate of median difference is obtained as the median of the array of differences given in the table above. With 17 × 8 = 136 differences, the median is the mean of the 68th and 69th ordered values:
This comes to (−450.4 + −482.1)/2 = −466.25.
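The array of differences and its median can be generated directly from the two samples. The following is an illustrative Python sketch (not the R function mentioned above), with names of our own; differences are rounded to one decimal place to match the table and avoid floating-point noise.

```python
from statistics import median

# Antibody concentrations transcribed from the table above.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# All 17 x 8 pairwise differences (Whipsnade minus Fota), as in the table.
diffs = [round(w - f, 1) for w in whipsnade for f in fota]

# Hodges-Lehmann estimate: the median of all pairwise differences.
hl_estimate = median(diffs)

print("number of differences:", len(diffs))
print("Hodges-Lehmann estimate:", round(hl_estimate, 2))
```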
Then determine the required number of differences from each end of the array to obtain the (approximate) upper and lower confidence limits. This number is given by the quantile of the Mann-Whitney U-statistic for nA and nB observations at P = 0.025, which is 35. Alternatively the required number of differences can be obtained from the quantile of the Wilcoxon W-statistic using k = 188 − (18 × 17)/2 = 35.
The 34 highest values in this array are shown in turquoise cells - the 35th is in a red cell and is the upper 95% confidence limit. The 34 lowest values in the array are shown in green cells - the 35th is in a red cell and is the lower 95% confidence limit.
We conclude that the 95% confidence interval for the difference is -963.7 to 1.3.
This interval just includes zero, in agreement with our earlier non-significant P-value of 0.0657.
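Picking the confidence limits out of the ordered differences is easy to script. In this Python sketch (our own illustration, with hypothetical variable names) the value k = 35 is simply taken from the text rather than computed, and the limits are the k-th smallest and k-th largest differences.

```python
# Antibody concentrations transcribed from the table above.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# All pairwise differences (Whipsnade minus Fota), sorted in ascending order.
diffs = sorted(round(w - f, 1) for w in whipsnade for f in fota)

k = 35  # assumption: quantile of U at P = 0.025 for nA = 8, nB = 17, as given in the text

lower = diffs[k - 1]   # k-th smallest difference
upper = diffs[-k]      # k-th largest difference

print("approximate 95% CI:", lower, "to", upper)
```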