The Wilcoxon-Mann-Whitney test
Worked example 1
We will base our first example on a comparison of concentrations of antibody to Aspergillus in Humboldt's penguins in two wildlife
We first look at the distributions for each group of observations. Unlike with t-tests, we are not interested in whether the data follow a normal distribution - only in whether we can assume the two distributions are sufficiently similar in shape to justify treating the test as a test for a difference between medians. We have compared distributions using jittered dot plots and plots of cumulative proportions:
Comparison is difficult given the small sample size from Fota, but the cumulative plots are revealing. The distributions differ in shape: that of the Whipsnade group is strongly right skewed, whereas that of the Fota group is more uniform (apart from one high 'outlier'). However, the main difference does appear to be a shift in location, with different median values.
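The cumulative proportions behind such a plot are simple to compute. The following is a minimal sketch in Python (the original analysis used R); the function name and the two samples are hypothetical and are not the penguin data.

```python
def cumulative_proportions(values):
    """Return (value, cumulative proportion) pairs for the distinct values."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        if points and points[-1][0] == x:
            # Tied value: keep a single point at the highest proportion reached.
            points[-1] = (x, i / n)
        else:
            points.append((x, i / n))
    return points


# Hypothetical samples, for illustration only.
group_a = [1.2, 1.5, 1.5, 2.0, 3.1]
group_b = [0.4, 0.6, 0.9, 1.0, 1.2, 4.8]

for name, sample in (("A", group_a), ("B", group_b)):
    print(name, cumulative_proportions(sample))
```

Plotting the two sets of points as step functions on the same axes makes it easier to judge whether the curves differ mainly by a horizontal shift.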
For this example we will use the Mann-Whitney U-statistic. Sample sizes are small (nA = 8; nB = 17) so we cannot use the normal approximation. We will therefore determine the exact P-value both from tables and from our software package (R). Note that we have already done the first step in the analysis, which is to rank each sample.
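As an illustration of how U follows from the ranks, here is a minimal Python sketch (the original analysis used R). It ranks the pooled observations, giving tied values their average rank, and then subtracts the minimum possible rank sum from the rank sum of the first sample. The function names and the data in the usage line are hypothetical.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_A, U_B) computed from the ranks of the pooled observations."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    w_a = sum(ranks[:n_a])            # rank sum (Wilcoxon W) for sample A
    u_a = w_a - n_a * (n_a + 1) / 2   # subtract the smallest possible rank sum
    u_b = n_a * n_b - u_a             # the two U values always sum to n_a * n_b
    return u_a, u_b


# Hypothetical data, for illustration only.
print(mann_whitney_u([1.1, 2.4, 3.0], [2.4, 3.6, 4.2, 5.0]))
```

The smaller of the two U values is the one usually compared with tabled critical values.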
Testing significance of U
Using Siegel's table K, the critical value for a two-tailed test at P = 0.05 is 34 and for P = 0.10 is
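Tabled critical values of this kind are derived from the exact null distribution of U, which can be enumerated by a simple recurrence when there are no ties. The Python sketch below is one way to do this; the function names are ours, and doubling the smaller tail is only one of several conventions for a two-sided exact P-value.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, n_a, n_b):
    """Number of orderings of n_a + n_b untied values that give U_A == u."""
    if u < 0 or u > n_a * n_b:
        return 0
    if n_a == 0 or n_b == 0:
        return 1  # only u == 0 can reach this point
    # The largest observation is either in A (it exceeds all n_b values of B,
    # adding n_b to U_A) or in B (adding nothing to U_A).
    return count_u(u - n_b, n_a - 1, n_b) + count_u(u, n_a, n_b - 1)


def exact_lower_tail(u, n_a, n_b):
    """P(U <= u) under the null hypothesis, assuming no ties."""
    total = comb(n_a + n_b, n_a)
    return sum(count_u(k, n_a, n_b) for k in range(u + 1)) / total


def exact_two_sided_p(u, n_a, n_b):
    """Two-sided exact P-value, taken as twice the smaller tail (capped at 1)."""
    u_small = min(u, n_a * n_b - u)
    return min(1.0, 2 * exact_lower_tail(u_small, n_a, n_b))


# Lower-tail probabilities near the tabled critical value for nA = 8, nB = 17.
for u in (34, 35):
    print(u, exact_lower_tail(u, 8, 17))
```

With tied observations U can take half-integer values and this enumeration no longer applies exactly.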
This was not the conclusion reached by the authors, who quite justifiably carried out a t-test on log-transformed data. Using the data given here, an equal variance t-test gives a t-value of -2.0812, df = 23, P = 0.049. Given that assumptions were reasonably well met for both the Wilcoxon-Mann-Whitney test and the t-test, the difference in inference is probably a reflection of the greater power of the t-test.
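For comparison, an equal variance t-test on log-transformed values can be sketched as follows in Python. This is an illustration only: the function name and data are hypothetical, natural logs are assumed (the base does not affect the t-value), and the P-value is left out because the Python standard library has no t-distribution.

```python
import math
from statistics import mean, variance


def pooled_t_on_logs(sample_a, sample_b):
    """Return (t, df) for an equal variance t-test on log-transformed data."""
    log_a = [math.log(x) for x in sample_a]
    log_b = [math.log(x) for x in sample_b]
    n_a, n_b = len(log_a), len(log_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(log_a) + (n_b - 1) * variance(log_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(log_a) - mean(log_b)) / se_diff, df


# Hypothetical positive concentrations, for illustration only.
print(pooled_t_on_logs([0.8, 1.1, 1.9, 2.5], [1.6, 2.2, 3.0, 4.1, 5.3]))
```

With nA = 8 and nB = 17 the degrees of freedom are 8 + 17 - 2 = 23, as quoted above.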
Confidence interval of difference between medians
Normally one would not calculate the confidence interval of the difference following a non-significant P-value - but we will do so here partly to demonstrate the method, and partly because the P-value is so close to significance. The first step is to calculate the differences between all possible pairs of values. This can be done manually although we used a function using
Then determine the required number of differences from each end of the array to obtain the (approximate) upper and lower confidence limits. This number is given by the quantile of the Mann-Whitney U-statistic for nA and nB observations at P = 0.025, which is 35. Alternatively, the required number of differences can be obtained from the quantile of the Wilcoxon W-statistic using k = 188 - (18×17)/2 =
The 34 highest values in this array are shown in turquoise cells - the 35th is in a red cell and is the upper 95% confidence limit. The 34 lowest values in the array are shown in green cells - the 35th is in a red cell and is the lower 95% confidence limit.
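The procedure described above (form all pairwise differences, sort them, and count k values in from each end) can be sketched in a few lines of Python. The function name and the small data set are hypothetical; for the example in the text there would be 8 × 17 = 136 differences and k = 35.

```python
def shift_confidence_limits(sample_a, sample_b, k):
    """Return the k-th smallest and k-th largest of all pairwise differences."""
    # Every value in A minus every value in B: n_a * n_b differences in all.
    diffs = sorted(a - b for a in sample_a for b in sample_b)
    if not 1 <= k <= len(diffs):
        raise ValueError("k must lie between 1 and the number of differences")
    lower = diffs[k - 1]   # k-th value counting up from the smallest
    upper = diffs[-k]      # k-th value counting down from the largest
    return lower, upper


# Hypothetical data, for illustration only (k chosen arbitrarily here).
print(shift_confidence_limits([5.0, 7.0, 9.5], [1.0, 2.0, 4.5, 6.0], k=2))
```

In practice k comes from the quantile of the U distribution for the two sample sizes, as described in the text.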
Worked example 2
Our second worked example uses data from a trial on the efficacy of breast feeding for pain relief during venepuncture in newly born infants carried out by Carbajal et al. (2003). We first met this work in a hands-on
Results for the trial for the 44 infants in group A and the 45 infants in group B are given below:
Pain scale is an ordinal variable, so the arithmetic mean is not an appropriate measure of location. Hence we use the Wilcoxon-Mann-Whitney test to compare medians. But in order to compare medians, we must first demonstrate that distributions are similar apart from location. We have compared distributions in the two figures below: first using dot plots and then plotting cumulative distributions.
There are slight differences in skew (group A are slightly right skewed, whereas group B are slightly left skewed), but the main difference between the two distributions is between their locations. We therefore proceed with a Wilcoxon-Mann-Whitney test. Both nA and nB are greater than 20, so we use the normal approximation:
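A minimal Python sketch of a normal approximation for U is given here for illustration. It assumes a tie-corrected variance and no continuity correction; whether the original calculation applied either is not stated, so treat both as our assumptions. The function name and data are hypothetical.

```python
import math
from collections import Counter


def wmw_normal_approx(sample_a, sample_b):
    """Return (U_A, z, two-sided P) using a tie-corrected normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    # U_A counts pairs in which the A value exceeds the B value (ties count 1/2).
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in sample_a for b in sample_b)
    # Tie correction: sum of (t^3 - t) over groups of t tied pooled values.
    tie_counts = Counter(list(sample_a) + list(sample_b)).values()
    tie_term = sum(t ** 3 - t for t in tie_counts)
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - n_a * n_b / 2) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, z, p_two_sided


# Hypothetical ordinal scores, for illustration only.
print(wmw_normal_approx([3, 4, 4, 6, 7, 9], [8, 9, 10, 10, 12, 13, 14]))
```

Ordinal scales such as pain scores usually produce many ties, which is why the tie correction to the variance matters here.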
The wilcox.test function of base R gives a very similar P-value, indicating a highly significant treatment effect (but warns that this P-value is approximate because of ties). This function also gave the observed Hodges-Lehmann difference (−8.0) with its 95% confidence limits (−9, −6). We may conclude that breast feeding is associated with a highly significant reduction in pain (assessed on the PIPP scale) relative to that experienced when the infant is just held in the mother's arms.
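The Hodges-Lehmann difference is the median of all pairwise differences between the two samples. A minimal Python sketch, with a hypothetical function name and data, is shown below; this is the plain definition and will not necessarily reproduce R's output when there are ties.

```python
from statistics import median


def hodges_lehmann_shift(sample_a, sample_b):
    """Median of all pairwise differences (value in A minus value in B)."""
    return median(a - b for a in sample_a for b in sample_b)


# Hypothetical scores, for illustration only.
print(hodges_lehmann_shift([2, 4, 5, 7], [9, 11, 12, 15, 16]))
```

The sign of the estimate depends on which group is subtracted from which, so a negative value here means lower scores in the first group.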