



The Wilcoxon-Mann-Whitney (WMW) test is used for assessing whether two samples of observations come from the same distribution, and given certain assumptions, have the same median. In many situations, this test has important advantages -

  1. It is valid for either ordinal or measurement variables, including derived variables.
  2. It is reasonably powerful. For large samples, the efficiency of the Wilcoxon-Mann-Whitney test relative to a t-test whose assumptions are met approaches 3/π (about 95.5%).
  3. It makes fewer assumptions than a comparable, parametric test, and is more robust where those assumptions are violated.
  4. It may require substantially less computation.

The null hypothesis under test is that observations are randomly drawn from the same population. Because this is a conditional test, the null population consists of the pooled samples (A&B), from which observations are taken without replacement. The null hypothesis can also be stated as: the Hodges-Lehmann (HL) statistic is zero. The HL statistic is the median of all possible differences between an observation in sample A and an observation in sample B. Note that the median difference is not the same as the difference between medians, as can be seen from some of the worked examples below.
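The distinction between the median difference and the difference between medians can be checked with a short calculation. The following is a minimal Python sketch, not taken from the source: the function name `hodges_lehmann` and the two small samples are made up purely for illustration.

```python
from itertools import product
from statistics import median


def hodges_lehmann(a, b):
    """Median of all nA * nB pairwise differences a_i - b_j."""
    return median(x - y for x, y in product(a, b))


# Hypothetical data, chosen so that the two quantities disagree.
sample_a = [1, 2, 10]
sample_b = [0, 3, 4]

hl = hodges_lehmann(sample_a, sample_b)                  # median of the 9 differences
diff_of_medians = median(sample_a) - median(sample_b)    # 2 - 3

print(hl, diff_of_medians)
```

For these made-up values the median of the pairwise differences is positive while the difference between the two sample medians is negative, so the two quantities need not even share a sign.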

The WMW test is most sensitive to differences in medians, but should only be considered as a test for difference between medians if the two distributions are very similar apart from, under HA, their locations. If observations are symmetrically distributed (e.g. normal) this is also equivalent to a test between means - but when data are normal a t-test is preferred, as it is more powerful. The WMW test's alternative hypothesis is that the Hodges-Lehmann statistic is nonzero which, applying test-inversion principles, enables confidence intervals to be estimated for its observed value - although these only approximate nominal 95% intervals.

These tests of location are effectively t tests of rank-transformed data. However, due to the properties of ranks, testing their sum can entail radically fewer calculations than testing a mean. Where the assumptions are not met for a parametric test, a Wilcoxon-Mann-Whitney test is more robust and nearly as powerful.

    Because -
  1. Although the sum of ranks is unaffected by outlying observations, it weights observations rather than merely counting how many lie above and below the median under test. Therefore, unless your observations are highly skewed, their sum of ranks is approximately normal for quite modest sample sizes.
  2. The sum of ranks is much more sensitive to differences in location than of variance.

The test was originally proposed by Wilcoxon (1945) and then modified to allow for different sample sizes by Mann & Whitney (1947). There are two commonly used equivalent statistics: the Wilcoxon sum of ranks (S)-statistic and the Mann-Whitney U-statistic. Unfortunately these terms are not used consistently in the literature, and R uses W to indicate the Mann-Whitney U-statistic.




Wilcoxon sum of ranks S-statistic

  • Calculating the statistic

    First combine the observations in the two samples into a single pooled 'null' sample, retaining the information on the source of each observation. Then rank-transform the observations. The test statistic is simply the sum of ranks of one of the samples (SA or SB) within the combined sample (A&B).

    Algebraically speaking -

    SA = ΣR(Ai)
    • SA is the sum of ranks of the observations of sample A in the combined sample,
    • R(Ai) are the ranks of the observations of sample A within the combined sample,
    • Ai is the ith value of sample A.

    If you are using tables of critical values, use the following procedure to decide which value to look up. If the number of replicates in each sample is the same, use the smaller of SA and SB as the test statistic. If the number of replicates is not the same, calculate the sum of ranks for the sample with fewer observations, say SA. Then calculate its conjugate S'A = nA(nA+nB+1) - SA, which is the sum of ranks sample A would have if the pooled observations were ranked in reverse order. (This equals SB only when nA = nB.) Use the smaller of SA and S'A as the test statistic.

  • Testing the significance of S

    The sum of ranks of one sample can be tested directly using Table A7 in Conover (1999), or any of the tables of Wilcoxon's sum of ranks statistic available on the web.
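The rank-sum calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `midrank` and `rank_sums` and the data are hypothetical, and giving tied values the average of the ranks they span (midranks) is an assumption, since the text above does not say how ties are ranked.

```python
def midrank(x, pooled):
    """1-based rank of x in the pooled sample; ties share the average rank (assumption)."""
    less = sum(1 for v in pooled if v < x)
    equal = sum(1 for v in pooled if v == x)   # includes x itself
    return less + (equal + 1) / 2


def rank_sums(a, b):
    """Return (SA, SB): the sums of ranks of each sample within the pooled sample."""
    pooled = list(a) + list(b)
    s_a = sum(midrank(x, pooled) for x in a)
    s_b = sum(midrank(y, pooled) for y in b)
    return s_a, s_b


# Hypothetical data; sample A is the smaller sample.
sample_a = [1.1, 2.4, 3.0]
sample_b = [2.0, 3.5, 4.2, 5.1]
n_a, n_b = len(sample_a), len(sample_b)

s_a, s_b = rank_sums(sample_a, sample_b)

# Second candidate value from the table procedure: nA(nA + nB + 1) - SA.
s_alt = n_a * (n_a + n_b + 1) - s_a
table_statistic = min(s_a, s_alt)

print(s_a, s_b, s_alt, table_statistic)
```

A useful check on any hand calculation is that SA + SB must equal N(N+1)/2, where N = nA + nB.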


Mann-Whitney U-statistic

  • Direct method

    This formulation, which gives the Mann-Whitney U-statistic, was quite popular in the pre-computer age because it requires less arithmetic. It can be worked out without assigning any ranks to the data, although the two data sets should each be arranged separately in ascending order. The statistic U1 is obtained by simply adding up the number of times each observation in sample A is exceeded by an observation in sample B. If one sample has fewer observations it is convenient (but not essential) to denote that as sample A. U2 is then given by (nAnB − U1). The direct method is only really appropriate for small sample sizes.

  • Indirect method

    For large sample sizes it is easier to use the indirect method of calculating U1 and U2.

    Algebraically speaking -

    U1   =   SA - nA (nA + 1)/2
    U2   =   SB - nB (nB + 1)/2
    • U1 and U2 are the two alternative U-statistics
    • nA and nB are the number of observations in samples A and B,
    • SA and SB are the sums of ranks for each sample after pooling into a single sample

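    The test statistic (U) is the smaller of U1 and U2. The larger value is denoted by U'. Instead of calculating both from the formulae above, U2 can be obtained from U2 = nAnB - U1. Note that U1 calculated from SA, that is SA - nA(nA + 1)/2, counts the pairs in which the sample A observation exceeds the sample B observation, so it corresponds to U2 of the direct method; because the test statistic is the smaller of the two, this labelling does not affect the result.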

  • Testing the significance of U

    This value may be looked up in tables of U, for example those provided by Siegel (1956). The calculated U-statistic is significant if it is less than the tabulated value. Alternatively we can readily obtain the exact 1-tailed P-value using R.
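To make the direct method, the indirect method and an exact tail probability concrete, here is a hedged Python sketch. All function names and data are hypothetical. Two assumptions are not stated in the text: tied pairs are counted as one half in the direct method, and the exact probability assumes untied data, so that every choice of nA ranks out of 1..N is equally likely under the null hypothesis. The indirect method uses the usual form U = S - n(n+1)/2.

```python
from itertools import combinations
from math import comb


def u_direct(a, b):
    """Number of (A, B) pairs in which the B value exceeds the A value; ties count 1/2."""
    return sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in a for y in b)


def midrank(x, pooled):
    """1-based rank of x in the pooled sample, averaging over ties."""
    less = sum(1 for v in pooled if v < x)
    equal = sum(1 for v in pooled if v == x)
    return less + (equal + 1) / 2


def u_indirect(a, b):
    """U-statistics obtained from the rank sums: S - n(n + 1)/2 for each sample."""
    pooled = list(a) + list(b)
    s_a = sum(midrank(x, pooled) for x in a)
    s_b = sum(midrank(y, pooled) for y in b)
    return s_a - len(a) * (len(a) + 1) / 2, s_b - len(b) * (len(b) + 1) / 2


def exact_lower_tail(u_obs, n_a, n_b):
    """P(U <= u_obs) under the null, by enumerating all C(N, nA) rank assignments (no ties)."""
    n = n_a + n_b
    offset = n_a * (n_a + 1) // 2
    hits = sum(1 for c in combinations(range(1, n + 1), n_a) if sum(c) - offset <= u_obs)
    return hits / comb(n, n_a)


# Hypothetical data.
sample_a = [1.1, 2.4, 3.0]
sample_b = [2.0, 3.5, 4.2, 5.1]

count_b_over_a = u_direct(sample_a, sample_b)
u_from_sa, u_from_sb = u_indirect(sample_a, sample_b)
u = min(u_from_sa, u_from_sb)                      # the test statistic
p_one_tailed = exact_lower_tail(u, len(sample_a), len(sample_b))

print(count_b_over_a, u_from_sa, u_from_sb, u, p_one_tailed)
```

The two U values always sum to nAnB, and the direct count of B-over-A pairs matches the value obtained from SB. Enumeration is only practical for small samples, because the number of rank assignments grows very quickly with N.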


Normal approximation

The normal approximation can be used if nA or nB is greater than 20. It was also used in the past if there were ties, because published tables were only exact for untied data. However, modern software often provides exact tests when there are ties in the data.

A one sample test is used in which the expected sum of ranks of one sample under the null hypothesis is subtracted from the observed sum of ranks, and divided by the expected standard deviation of the sum of ranks. The number of ties determines which is the appropriate formula to use for estimating the expected standard deviation:

Algebraically speaking -

For no ties -

z = [SA - nA(N+1)/2] / √[nAnB(N+1)/12]

For when ties are present -

z = [SA - nA(N+1)/2] / √[nAnBΣR²/(N(N-1)) − nAnB(N+1)²/(4(N-1))]

  • z is tested against the standard normal deviate Z
  • SA is the sum of ranks for sample A,
  • nA and nB are the number of observations in samples A and B respectively,
  • N is the total number of observations, nA + nB
  • ΣR² is the sum of all the squared ranks.
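The two versions of the approximation can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and the standard deviations used are the commonly quoted forms, √[nAnB(N+1)/12] without ties and √[nAnBΣR²/(N(N-1)) - nAnB(N+1)²/(4(N-1))] with ties. The tiny data set is there to make the arithmetic easy to follow, not because the approximation is appropriate for samples this small.

```python
from math import isclose, sqrt


def z_no_ties(s_a, n_a, n_b):
    """z = (SA - expected SA) / SD of SA, for untied data."""
    n = n_a + n_b
    return (s_a - n_a * (n + 1) / 2) / sqrt(n_a * n_b * (n + 1) / 12)


def z_with_ties(s_a, n_a, n_b, all_ranks):
    """As above, but the SD is computed from the sum of squared (mid)ranks."""
    n = n_a + n_b
    sum_sq = sum(r * r for r in all_ranks)
    variance = n_a * n_b * sum_sq / (n * (n - 1)) - n_a * n_b * (n + 1) ** 2 / (4 * (n - 1))
    return (s_a - n_a * (n + 1) / 2) / sqrt(variance)


# Hypothetical untied example: sample A holds ranks 1, 3 and 4 out of 1..7.
n_a, n_b = 3, 4
s_a = 1 + 3 + 4
all_ranks = range(1, n_a + n_b + 1)

z1 = z_no_ties(s_a, n_a, n_b)
z2 = z_with_ties(s_a, n_a, n_b, all_ranks)

print(z1, z2, isclose(z1, z2))
```

When there are no ties, ΣR² equals N(N+1)(2N+1)/6 and the tie-corrected variance reduces to nAnB(N+1)/12, so the two functions return the same value.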


Confidence interval to the Hodges-Lehmann estimate of the median difference

The first step is to calculate the differences between all possible pairs of values. This is best done by ordering each sample from smallest to largest and then forming a matrix of differences (Ai - Bj), using the values of A as columns and the values of B as rows. The Hodges-Lehmann estimate of the median difference is given by the median of these differences. The upper and lower 95% confidence limits to this median are obtained by counting in a specified number of differences from each end of the array.

If you are using the Mann Whitney U-statistic, the required number of differences from each end of the array is given by the quantile of the Mann Whitney U-statistic for nA and nB observations at P = 0.025.

If you are using the Wilcoxon sum of ranks statistic (denoted W here), the required number of differences from each end of the array is given by:

Algebraically speaking -

k = W(α/2) − [nA (nA + 1)] / 2 where:

  • k is the required number of differences from each end of the array to obtain the upper and lower confidence limits,
  • W(α/2) is the quantile of the Wilcoxon rank sum statistic for nA and nB observations. Hence for a 95% confidence interval use P = 0.025.
  • nA and nB are the number of observations in groups A and B respectively.
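The counting-in procedure can be sketched in Python using the Mann-Whitney route. This is a sketch under stated assumptions, not a definitive implementation: the function names and data are hypothetical, the quantile is taken to be the smallest u with P(U ≤ u) ≥ α/2 (published tables may define it slightly differently), k is never allowed below 1, and the null distribution is obtained by brute-force enumeration assuming no ties.

```python
from itertools import combinations
from math import comb
from statistics import median


def u_lower_quantile(n_a, n_b, p):
    """Smallest u with P(U <= u) >= p under the null (assumed convention, no ties)."""
    n = n_a + n_b
    offset = n_a * (n_a + 1) // 2
    counts = {}
    for ranks in combinations(range(1, n + 1), n_a):
        u = sum(ranks) - offset
        counts[u] = counts.get(u, 0) + 1
    total = comb(n, n_a)
    cumulative = 0
    for u in range(n_a * n_b + 1):
        cumulative += counts.get(u, 0)
        if cumulative / total >= p:
            return u


def hl_confidence_interval(a, b, alpha=0.05):
    """Return (HL estimate, lower limit, upper limit, k) by counting in k differences."""
    diffs = sorted(x - y for x in a for y in b)
    k = max(1, u_lower_quantile(len(a), len(b), alpha / 2))
    return median(diffs), diffs[k - 1], diffs[-k], k


# Hypothetical data.
sample_a = [12, 15, 18, 20, 25]
sample_b = [5, 8, 9, 11, 16]

estimate, lower, upper, k = hl_confidence_interval(sample_a, sample_b)
print(estimate, lower, upper, k)
```

Because the null distribution of U is discrete, the interval obtained this way generally has coverage somewhat above the nominal level rather than exactly 95%.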




Being a non-parametric test, the Wilcoxon-Mann-Whitney test is often assumed to have no distributional assumptions. This is only true if it is being used as a test of dominance. If it is being used to compare medians, then there is an important distributional assumption. We start with the usual assumptions for tests comparing two independent samples, followed by those specific to this test:

  • Both samples are random samples
  • The two samples are mutually independent
  • The measurement scale is at least ordinal
  • If the test is used as a test of dominance, it has no distributional assumptions. If it is used to compare medians, the two distributions must be identical apart from their locations. If it is used to compare means, those two distributions must also be symmetrical.
