Purpose
The Wilcoxon-Mann-Whitney (WMW) test is used to assess whether two samples of observations come from the same distribution and, given certain assumptions, have the same median. In many situations this test has important advantages:
 It is valid for either ordinal or measurement variables, including derived variables.
 It is reasonably powerful. For large samples, the Wilcoxon-Mann-Whitney test approaches 95.5% (3/π) of the power of a t-test whose assumptions are met.
 It makes fewer assumptions than a comparable parametric test, and is more robust where those assumptions are violated.
 It may require substantially less computation.
The null hypothesis under test is that observations are randomly drawn from the same population; being a conditional test, this null population comprises the pooled samples (A&B), from which observations are taken without replacement. The null hypothesis can also be stated as the Hodges-Lehmann (HL) statistic being zero. The HL statistic is the median of all possible differences between observations in sample A and sample B. Note that the median difference is not the same as the difference between medians, as can be seen from some of the worked examples below.
The WMW test is most sensitive to differences in medians, but should only be considered a test for a difference between medians if the two distributions are very similar apart from, under H_{A}, their locations. If observations are symmetrically distributed (e.g. normal) this is also equivalent to a test between means, but when data are normal a t-test is preferred, as it is more powerful. The WMW test's alternative hypothesis is that the Hodges-Lehmann statistic is non-zero which, applying test-inversion principles, enables confidence intervals to be estimated for its observed value, although these only approximate to nominal 95% intervals.
These tests of location are effectively t-tests of rank-transformed data. However, due to the properties of ranks, testing their sum can entail radically fewer calculations than testing a mean. Where the assumptions of a parametric test are not met, a Wilcoxon-Mann-Whitney test is more robust and nearly as powerful.
This is because:
 Although the sum of ranks is unaffected by outlying observations, it weights observations rather than merely counting how many lie above and below the median under test. Therefore, unless your observations are highly skewed, their sum of ranks is approximately normal for quite modest sample sizes.
 The sum of ranks is much more sensitive to differences in location than to differences in variance.
The test was originally proposed by Wilcoxon (1945) and then modified to allow for different sample sizes by Mann & Whitney (1947). There are two commonly used equivalent statistics: the Wilcoxon sum of ranks S-statistic and the Mann-Whitney U-statistic. Unfortunately these terms are not used consistently in the literature, and R uses W to denote the Mann-Whitney U-statistic.
Procedure
Wilcoxon sum of ranks S-statistic
 Calculating the statistic
First combine the observations in the two samples into a single pooled 'null' sample, retaining the information on the source of each observation. Then rank-transform the observations. The test statistic is simply the sum of ranks of one of the samples (S_{A} or S_{B}) within the combined sample (A&B).
Algebraically speaking:

S_{A} = Σ R(A_{i}), summed over i = 1 to n_{A}
Where:
 S_{A} is the sum of ranks of the observations of sample A in the combined sample,
 R(A_{i}) are the ranks of the observations of sample A within the combined sample,
 A_{i} is the ith value of sample A.

If you are using tables of critical values, use the following procedure to decide which statistic to look up. If the number of replicates in each sample is the same, the smaller of S_{A} and S_{B} should be used as the test statistic. If the number of replicates is not the same, calculate the sum of ranks for the sample with fewer observations, say S_{A}. Then calculate its mirror value, S′_{A} = n_{A}(n_{A} + n_{B} + 1) − S_{A}. Use the smaller of S_{A} and S′_{A} as the test statistic.
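As a minimal sketch of this procedure in Python (the sample values are invented), the rank transformation, the sum of ranks of the smaller sample, and its mirror value might be computed as follows; midranks are used for tied values:

```python
def rank_sum(sample, other):
    """Sum of midranks of `sample` within the pooled sample (ties get midranks)."""
    pooled = sorted(sample + other)

    def midrank(v):
        # midrank = first 1-based position of v, averaged over its tied run
        first = pooled.index(v) + 1
        return first + (pooled.count(v) - 1) / 2

    return sum(midrank(v) for v in sample)

a = [1.1, 2.4, 3.0]        # hypothetical sample A (n_A = 3)
b = [2.0, 2.8, 3.5, 4.1]   # hypothetical sample B (n_B = 4)
s_a = rank_sum(a, b)                              # ranks 1, 3, 5 -> 9.0
s_mirror = len(a) * (len(a) + len(b) + 1) - s_a   # 3 * 8 - 9 = 15.0
test_statistic = min(s_a, s_mirror)               # refer the smaller, 9.0, to tables
```

This quadratic-time ranking is fine for the small samples tables are designed for; for large samples a single sort with precomputed tie runs would be preferred.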
Testing the significance of S
The sum of ranks of one sample can be tested directly using Table A7 in Conover (1999), or any of the tables of Wilcoxon's sum of ranks statistic available on the web.
Mann-Whitney U-statistic
 Direct method
This formulation, which gives the Mann-Whitney U-statistic, was quite popular in the pre-computer age because it requires less arithmetic. It can be worked out without assigning any ranks to the data, although the two data sets should each be arranged separately in ascending order. The statistic U_{1} is obtained by simply adding up the number of times each observation in sample A is exceeded by an observation in sample B. If one sample has fewer observations it is convenient (but not essential) to denote it as sample A. U_{2} is then given by n_{A}n_{B} − U_{1}. The direct method is only really appropriate for small sample sizes.
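Following this description, the direct count can be sketched in Python (the data are invented; a tie between an A and a B observation is conventionally counted as a half):

```python
def u_direct(a, b):
    """U_1 by direct counting: the number of times each observation in
    sample A is exceeded by an observation in sample B (a tie counts 1/2)."""
    return sum((1 if bj > ai else 0.5 if bj == ai else 0)
               for ai in a for bj in b)

a = [1.1, 2.4, 3.0]        # hypothetical sample A
b = [2.0, 2.8, 3.5, 4.1]   # hypothetical sample B
u1 = u_direct(a, b)        # 4 + 3 + 2 = 9
u2 = len(a) * len(b) - u1  # 12 - 9 = 3
smaller_u = min(u1, u2)    # the smaller value, 3, is referred to the tables
```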
Indirect method
For large sample sizes it is easier to use the indirect method of calculating U_{1} and U_{2}.
Algebraically speaking:

U_{1} = S_{A} − n_{A}(n_{A} + 1)/2

U_{2} = S_{B} − n_{B}(n_{B} + 1)/2

Where:
 S_{A} and S_{B} are the sums of ranks of samples A and B within the combined sample,
 n_{A} and n_{B} are the number of observations in samples A and B respectively.
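A sketch of the indirect route in Python (hypothetical data): the sums of ranks are computed first and then converted to U-values, whose total must equal n_{A}n_{B}.

```python
def rank_sum(sample, pooled):
    """Sum of midranks of `sample` within the pooled, sorted data."""
    def midrank(v):
        first = pooled.index(v) + 1          # 1-based position of first occurrence
        return first + (pooled.count(v) - 1) / 2
    return sum(midrank(v) for v in sample)

a = [1.1, 2.4, 3.0]        # hypothetical sample A
b = [2.0, 2.8, 3.5, 4.1]   # hypothetical sample B
pooled = sorted(a + b)
s_a = rank_sum(a, pooled)  # ranks 1, 3, 5    -> 9.0
s_b = rank_sum(b, pooled)  # ranks 2, 4, 6, 7 -> 19.0
u1 = s_a - len(a) * (len(a) + 1) / 2   # 9 - 6 = 3.0
u2 = s_b - len(b) * (len(b) + 1) / 2   # 19 - 10 = 9.0
assert u1 + u2 == len(a) * len(b)      # the two U-values always sum to n_A * n_B
```

Note that S_{A} − n_{A}(n_{A} + 1)/2 counts the pairs in which an A observation exceeds a B observation; since the two U-values always sum to n_{A}n_{B} and tables use the smaller of the two, the labelling convention does not affect the test.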
Testing the significance of U
This value may be looked up in tables of U, for example that provided by Siegel (1956). The calculated U-statistic is significant if it is less than the tabulated value. Alternatively we can readily obtain the exact one-tailed P-value using R.
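As an illustration of what "exact" means here (sketched in Python rather than R, with invented, untied data): the one-tailed P-value is the proportion of all possible assignments of the pooled ranks to sample A that give a sum of ranks at least as extreme as the one observed.

```python
import math
from itertools import combinations

def exact_p_lower(a, b):
    """Exact one-tailed P(S_A <= observed) for small, untied samples,
    found by enumerating every size-n_A subset of the pooled ranks."""
    pooled = sorted(a + b)
    n_a, n = len(a), len(pooled)
    s_obs = sum(pooled.index(v) + 1 for v in a)   # observed sum of ranks of A
    hits = sum(1 for combo in combinations(range(1, n + 1), n_a)
               if sum(combo) <= s_obs)
    return hits / math.comb(n, n_a)

a = [1.1, 2.4, 3.0]        # hypothetical sample A
b = [2.0, 2.8, 3.5, 4.1]   # hypothetical sample B
p = exact_p_lower(a, b)    # 7 of the 35 possible rank sets give a sum <= 9
```

Enumeration grows combinatorially, which is why published tables and normal approximations were used before computers.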
Normal approximation
The normal approximation can be used if n_{A} or n_{B} is greater than 20. It was also used in the past when there were ties, because published tables were only exact for untied data. However, modern software often provides exact tests when there are ties in the data.
A one-sample test is used, in which the expected sum of ranks of one sample under the null hypothesis is subtracted from the observed sum of ranks, and the result is divided by the expected standard deviation of the sum of ranks. Whether ties are present determines which formula is appropriate for estimating the expected standard deviation:
Algebraically speaking:

For no ties:

z = [S_{A} − n_{A}(N + 1)/2] / √[n_{A}n_{B}(N + 1)/12]

When ties are present:

z = [S_{A} − n_{A}(N + 1)/2] / √[n_{A}n_{B}ΣR^{2}/{N(N − 1)} − (N + 1)^{2}n_{A}n_{B}/{4(N − 1)}]
Where:
 z is tested against the standard normal deviate Z
 S_{A} is the sum of ranks for sample A,
 n_{A} and n_{B} are the number of observations in samples A and B respectively,
 N is the total number of observations, n_{A} + n_{B},
 ΣR^{2} is the sum of all the squared ranks.
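The two formulas can be sketched as follows (the values of S_{A} and the sample sizes are invented); with no ties, ΣR^{2} = N(N + 1)(2N + 1)/6 and the tie-corrected variance reduces to the simpler one, which the example checks:

```python
import math

def z_rank_sum(s_a, n_a, n_b, sum_sq_ranks=None):
    """Normal approximation for the sum of ranks S_A.
    Pass sum_sq_ranks (the sum of all squared midranks) to use the
    tie-corrected variance; omit it for the no-ties formula."""
    n = n_a + n_b
    expected = n_a * (n + 1) / 2
    if sum_sq_ranks is None:
        variance = n_a * n_b * (n + 1) / 12
    else:
        variance = (n_a * n_b * sum_sq_ranks / (n * (n - 1))
                    - (n + 1) ** 2 * n_a * n_b / (4 * (n - 1)))
    return (s_a - expected) / math.sqrt(variance)

# hypothetical example: n_A = n_B = 25, observed S_A = 520
z = z_rank_sum(520, 25, 25)
# with no ties the sum of squared ranks is 50 * 51 * 101 / 6 = 42925,
# and the tie-corrected formula gives the same z
z_tied_form = z_rank_sum(520, 25, 25, sum_sq_ranks=42925)
```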

Confidence interval for the Hodges-Lehmann estimate of the median difference
The first step is to calculate the differences between all possible pairs of values. This is best done by ordering each sample from smallest to largest and then forming a matrix of differences (A_{i} − B_{j}), using the values of A as columns and the values of B as rows. The Hodges-Lehmann estimate of the median difference is given by the median of these differences. The upper and lower 95% confidence limits to this median are obtained by counting in a specified number of differences from each end of the array.
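A sketch of this step in Python, with invented integer samples so the arithmetic is exact:

```python
import statistics

def hodges_lehmann(a, b):
    """Median of all n_A * n_B pairwise differences A_i - B_j."""
    return statistics.median(ai - bj for ai in a for bj in b)

a = [11, 24, 30]           # hypothetical sample A
b = [20, 28, 35, 41]       # hypothetical sample B
hl = hodges_lehmann(a, b)  # median of the 12 differences: -10.0
```

With these values the difference between the sample medians is 24 − 31.5 = −7.5, not −10, illustrating the earlier point that the median difference is not the difference between medians.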
If you are using the Mann-Whitney U-statistic, the required number of differences from each end of the array is given by the quantile of the Mann-Whitney U-statistic for n_{A} and n_{B} observations at P = 0.025.
If you are using the Wilcoxon W-statistic, the required number of differences from each end of the array is given by:
Algebraically speaking:

k = W_{(α/2)} − n_{A}(n_{A} − 1)/2
where:
 k is the required number of differences from each end of the array to obtain the upper and lower confidence limits,
 W_{(α/2)} is the quantile of the Wilcoxon rank sum statistic for n_{A} and n_{B} observations. Hence for a 95% confidence interval use P = 0.025.
 n_{A} and n_{B} are the number of observations in groups A and B respectively.
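The counting-in step can be sketched as follows (Python; the samples are invented and k = 2 is chosen purely for illustration, whereas in practice k comes from the quantile described above):

```python
def hl_confidence_limits(a, b, k):
    """Count k differences in from each end of the ordered array of all
    pairwise differences A_i - B_j to get the confidence limits."""
    diffs = sorted(ai - bj for ai in a for bj in b)
    return diffs[k - 1], diffs[-k]

a = [11, 24, 30]           # hypothetical sample A
b = [20, 28, 35, 41]       # hypothetical sample B
low, high = hl_confidence_limits(a, b, 2)   # 2nd smallest and 2nd largest difference
```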

Assumptions
Being a nonparametric test, the Wilcoxon-Mann-Whitney test is often assumed to have no distributional assumptions. This is only true if it is being used as a test of dominance. If it is being used to compare medians, there is an important distributional assumption. We will start with the usual assumptions for tests comparing two independent samples:
 Both samples are random samples
 The two samples are mutually independent
 The measurement scale is at least ordinal
 If the test is used as a test of dominance, it has no distributional assumptions. If it is used to compare medians, the two distributions must be identical apart from their locations. If it is used to compare means, those two distributions must also be symmetrical.
Related topics: Median test

