Worked example 1
Antibody concentrations
Whipsnade (n = 17)  Whipsnade (continued)  Fota (n = 8)
24.9  535.4  146.0
30.8  558.8  559.8
62.0  743.6  742.5
128.4  798.6  799.2
148.1  1093.4  1092.1
159.1  1141.2  1225.7
348.8  1261.2  1354.5
348.9  1454.8  6997.8
457.6
We will base our first example on a comparison of concentrations of antibody to Aspergillus in Humboldt's penguins in two wildlife parks. We first looked at these data in Unit 1 in relation to the use of (jittered) dot plots to display frequency distributions. The authors (apparently) analyzed the data using a two-sample t-test on log-transformed data, obtaining a (significant) P-value of 0.035. We will apply a Wilcoxon-Mann-Whitney test to the untransformed data.
We first look at the distribution for each group of observations. Unlike with t-tests, we are not interested in whether the data follow a normal distribution, only in whether we can assume the two distributions are sufficiently similar in shape to justify treating the test as a test for a difference between medians. We have compared distributions using jittered dot plots and plots of cumulative proportions:


Comparison is difficult given the small sample size from Fota, but the cumulative plots are revealing. The distributions differ: that of the Whipsnade group is strongly right skewed, and that of the Fota group is more uniform (apart from one high 'outlier'). However, the main difference does appear to be a shift in location, with different median values.
For this example we will use the Mann-Whitney U-statistic. Sample sizes are small (n_{A} = 8; n_{B} = 17) so we cannot use the normal approximation. We will therefore determine the exact P-value both from tables and from our software package (R). Note we have already done the first step in the analysis, which is to rank each sample.
Direct method
For each observation in the smaller sample (Fota) we count up the number of observations in the other sample that are less than it.
There are four observations in the Whipsnade group (24.9, 30.8, 62.0, 128.4) that are less than the first observation (146.0) in the Fota group. There are eleven observations in the Whipsnade group that are less than the second observation (559.8) in the Fota group. Continuing this process, U_{1} = 4 + 11 + 11 + 13 + 13 + 15 + 16 + 17 = 100.
U_{2} = [8 × 17] − 100 = 36.
The Mann-Whitney U statistic is the smaller of U_{1} or U_{2}. Hence U = 36.
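The counting in the direct method can be checked with a few lines of code. This is a minimal Python sketch (the text itself uses R); the variable names are illustrative, and the data are transcribed from the table at the top of this example.

```python
# Antibody concentrations transcribed from the table in this example.
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# For each Fota value, count the Whipsnade values that are smaller.
# No value is shared between the groups, so no half-counts for ties are needed.
counts = [sum(w < f for w in whipsnade) for f in fota]

u1 = sum(counts)
u2 = len(fota) * len(whipsnade) - u1
u = min(u1, u2)

print(counts)     # [4, 11, 11, 13, 13, 15, 16, 17]
print(u1, u2, u)  # 100 36 36
```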
Indirect method
Combine the two groups of observations in a single ranked series, retaining the information on their group of origin. In the table below the rank is given in brackets after each observation.
24.9 (1)  30.8 (2)  62.0 (3)  128.4 (4)  146.0 (5) 
148.1 (6)  159.1 (7)  348.8 (8)  348.9 (9)  457.6 (10) 
535.4 (11)  558.8 (12)  559.8 (13)  742.5 (14)  743.6 (15) 
798.6 (16)  799.2 (17)  1092.1 (18)  1093.4 (19)  1141.2 (20) 
1225.7 (21)  1261.2 (22)  1354.5 (23)  1454.8 (24)  6997.8 (25) 
The sum of ranks of the smaller group (S_{A}) is 5 + 13 + 14 + 17 + 18 + 21 + 23 + 25 = 136.
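U_{1} = [8 × 17] + [(8 × 9)/2] − 136 = 36 and U_{2} = [8 × 17] − 36 = 100 (the labels U_{1} and U_{2} are swapped relative to the direct method, but the smaller value is the same).
The Mann-Whitney U statistic is the smaller of U_{1} or U_{2}. Hence U = 36.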
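The rank-sum route to U can be sketched in the same way. This is an illustrative Python version, not the authors' code; it assumes there are no tied values (true for these data), so plain integer ranks are enough.

```python
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# Pool both samples in one sorted series, keeping the group of origin.
pooled = sorted([(x, "Whipsnade") for x in whipsnade] +
                [(x, "Fota") for x in fota])

# Ranks run from 1 to 25; collect those belonging to the smaller group.
fota_ranks = [rank for rank, (_, group) in enumerate(pooled, start=1)
              if group == "Fota"]

n_a, n_b = len(fota), len(whipsnade)
s_a = sum(fota_ranks)                          # sum of ranks, smaller group
u1 = n_a * n_b + n_a * (n_a + 1) // 2 - s_a    # formula used in the text
u2 = n_a * n_b - u1

print(fota_ranks)            # [5, 13, 14, 17, 18, 21, 23, 25]
print(s_a, u1, u2, min(u1, u2))  # 136 36 100 36
```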
Testing significance of U
Using Siegel's table K, the critical value for a two-tailed test at P = 0.05 is 34 and for P = 0.10 is 41. Hence we may express the significance level as 0.05 < P < 0.1, in other words not quite significant at the 0.05 level. Alternatively we could look up the precise one-tailed P-value in R, which gives 0.03285. This is doubled to get the two-tailed value: P = 0.0657. Hence we conclude that there is no significant difference between antibody levels at the (conventional) P = 0.05 level.
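The exact one-tailed P-value is the proportion of the equally likely arrangements of the 25 ranks that give U ≤ 36 under the null hypothesis. The following Python sketch counts those arrangements with a standard recurrence on the largest observation; the function names are illustrative and not taken from any library, and the text's own figure came from R.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_arrangements(m, n, u):
    """Number of orderings of m + n distinct values in which the
    m-sample exceeds the n-sample in exactly u of the m * n pairs."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # If the largest value belongs to the m-sample it adds n to U;
    # if it belongs to the n-sample it adds nothing.
    return count_arrangements(m - 1, n, u - n) + count_arrangements(m, n - 1, u)

def exact_lower_tail(u_obs, m, n):
    """P(U <= u_obs) under the null hypothesis."""
    favourable = sum(count_arrangements(m, n, u) for u in range(u_obs + 1))
    return favourable / comb(m + n, m)

p_one_tailed = exact_lower_tail(36, 8, 17)
p_two_tailed = 2 * p_one_tailed
print(p_one_tailed, p_two_tailed)  # compare with 0.03285 and 0.0657 in the text
```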
This was not the conclusion reached by the authors, who quite justifiably carried out a t-test on log-transformed data. Using the data given here, an equal-variance t-test gives a t-value of 2.0812, df = 23, P = 0.049. Given that assumptions were reasonably well met for both the Wilcoxon-Mann-Whitney test and the t-test, the difference in inference is probably a reflection of the greater power of the t-test.
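The t-value quoted for the log-transformed data can be checked with the usual pooled-variance formula. This Python sketch uses only the standard library, so it stops at the t statistic and degrees of freedom; obtaining the P-value would need a t-distribution function from a statistics package. Natural logs are an assumption here, but the t statistic does not depend on the base of the logarithm.

```python
from math import log, sqrt
from statistics import mean, variance

whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

log_w = [log(x) for x in whipsnade]
log_f = [log(x) for x in fota]
n_w, n_f = len(log_w), len(log_f)

# Pooled (equal-variance) estimate of the variance on the log scale.
df = n_w + n_f - 2
pooled_var = ((n_w - 1) * variance(log_w) + (n_f - 1) * variance(log_f)) / df

# Standard error of the difference between the two log-scale means.
se = sqrt(pooled_var * (1 / n_w + 1 / n_f))
t_value = (mean(log_f) - mean(log_w)) / se

print(df, round(t_value, 4))  # compare with df = 23 and t = 2.0812 in the text
```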
Confidence interval of difference between medians
Normally one would not calculate the confidence interval of the difference following a non-significant P-value, but we will do so here partly to demonstrate the method, and partly because the P-value is so close to significance. The first step is to calculate the differences between all possible pairs of values. This can be done manually, although we used a function in R to do this for us. This is the array we obtained:
Differences between all pairs of values (Whipsnade − Fota); rows are Whipsnade values, columns are Fota values
 146.0  559.8  742.5  799.2  1092.1  1225.7  1354.5  6997.8
24.9  −121.1  −534.9  −717.6  −774.3  −1067.2  −1200.8  −1329.6  −6972.9
30.8  −115.2  −529.0  −711.7  −768.4  −1061.3  −1194.9  −1323.7  −6967.0
62.0  −84.0  −497.8  −680.5  −737.2  −1030.1  −1163.7  −1292.5  −6935.8
128.4  −17.6  −431.4  −614.1  −670.8  −963.7  −1097.3  −1226.1  −6869.4
148.1  2.1  −411.7  −594.4  −651.1  −944.0  −1077.6  −1206.4  −6849.7
159.1  13.1  −400.7  −583.4  −640.1  −933.0  −1066.6  −1195.4  −6838.7
348.8  202.8  −211.0  −393.7  −450.4  −743.3  −876.9  −1005.7  −6649.0
348.9  202.9  −210.9  −393.6  −450.3  −743.2  −876.8  −1005.6  −6648.9
457.6  311.6  −102.2  −284.9  −341.6  −634.5  −768.1  −896.9  −6540.2
535.4  389.4  −24.4  −207.1  −263.8  −556.7  −690.3  −819.1  −6462.4
558.8  412.8  −1.0  −183.7  −240.4  −533.3  −666.9  −795.7  −6439.0
743.6  597.6  183.8  1.1  −55.6  −348.5  −482.1  −610.9  −6254.2
798.6  652.6  238.8  56.1  −0.6  −293.5  −427.1  −555.9  −6199.2
1093.4  947.4  533.6  350.9  294.2  1.3  −132.3  −261.1  −5904.4
1141.2  995.2  581.4  398.7  342.0  49.1  −84.5  −213.3  −5856.6
1261.2  1115.2  701.4  518.7  462.0  169.1  35.5  −93.3  −5736.6
1454.8  1308.8  895.0  712.3  655.6  362.7  229.1  100.3  −5543.0
The Hodges-Lehmann estimate of the median difference is obtained as the median of the array of differences given in the table above. This comes to −(450.4 + 482.1)/2 = −466.25.
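The array of differences and its median can be generated directly. This Python sketch assumes differences are taken as Whipsnade minus Fota, which is the sign convention under which the interval reported for this example just overlaps zero; with that convention the estimate is negative. Differences are rounded to one decimal place to match the precision of the data.

```python
from statistics import median

whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

# All 17 x 8 = 136 pairwise differences (Whipsnade - Fota), sorted.
diffs = sorted(round(w - f, 1) for w in whipsnade for f in fota)

# With an even number of differences the median is the mean of the
# two middle values (the 68th and 69th).
hodges_lehmann = median(diffs)

print(len(diffs))                   # 136
print(diffs[67], diffs[68])         # -482.1 -450.4
print(round(hodges_lehmann, 2))     # -466.25
```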
Then determine the required number of differences from each end of the array to obtain the (approximate) upper and lower confidence limits. This number is given by the quantile of the Mann-Whitney U statistic for n_{A} and n_{B} observations at P = 0.025, which is 35. Alternatively the required number of differences can be obtained from the quantile of the Wilcoxon W statistic using k = 188 − (18 × 17)/2 = 35.
The 34 highest values in this array are shown in turquoise cells; the 35th is in a red cell and is the upper 95% confidence limit. The 34 lowest values in the array are shown in green cells; the 35th is in a red cell and is the lower 95% confidence limit.
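The quantile used here can be recovered from the exact null distribution of U. The sketch below is an illustrative Python version that takes the quantile to be the smallest value u with P(U ≤ u) ≥ 0.025; that definition is an assumption about how the figure in the text was obtained, and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_arrangements(m, n, u):
    """Number of orderings of m + n distinct values giving U = u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    return count_arrangements(m - 1, n, u - n) + count_arrangements(m, n - 1, u)

def lower_quantile(p, m, n):
    """Smallest u such that P(U <= u) >= p under the null hypothesis."""
    total = comb(m + n, m)
    cumulative = 0
    for u in range(m * n + 1):
        cumulative += count_arrangements(m, n, u)
        if cumulative >= p * total:
            return u
    return m * n

k = lower_quantile(0.025, 8, 17)
print(k)  # 35, the number of differences counted in from each end
```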
We conclude that the 95% confidence interval for the difference is −963.7 to 1.3. This interval just overlaps zero, in agreement with our earlier non-significant P-value of 0.0657.
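The confidence limits can be read off the sorted differences by counting k = 35 values in from each end. This is a hedged Python sketch, again assuming differences are taken as Whipsnade minus Fota and taking k = 35 from the text rather than recomputing it.

```python
whipsnade = [24.9, 30.8, 62.0, 128.4, 148.1, 159.1, 348.8, 348.9, 457.6,
             535.4, 558.8, 743.6, 798.6, 1093.4, 1141.2, 1261.2, 1454.8]
fota = [146.0, 559.8, 742.5, 799.2, 1092.1, 1225.7, 1354.5, 6997.8]

k = 35  # number of differences to count in from each end (from the text)

# Sorted pairwise differences, rounded to the precision of the data.
diffs = sorted(round(w - f, 1) for w in whipsnade for f in fota)

lower_limit = diffs[k - 1]  # 35th smallest difference
upper_limit = diffs[-k]     # 35th largest difference

print(lower_limit, upper_limit)  # -963.7 1.3
```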