InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

The Wilcoxon-Mann-Whitney U-test: Use & misuse
(versus t-test, similarity of distributions, reported measure of location, small samples, tied data)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The Wilcoxon-Mann-Whitney test is widely used in all disciplines, probably nearly as much as the ubiquitous t-test. Despite its lower power, it is often favoured over the t-test because of the misconception that no assumptions have to be met for the test to be valid. In fact the basic assumptions of the two tests (namely that both samples are random samples and are mutually independent) are identical. Not surprisingly, therefore, we find similar misuse as with the t-test concerning these aspects.

But the biggest problems come where the assumptions do differ from those of the t-test, namely in the distribution of the data. If the test is to be used to compare arithmetic means, the two distributions must be both symmetrical and identical apart from location. If the test is to be used to compare medians, the two distributions must be identical apart from location. We give a number of examples where distributions were clearly different, yet a significant result was still assumed to indicate a difference in either means or medians. In such situations the test is still valid as a test for dominance of one distribution over another, but few researchers seem to be aware that this is what is being tested.

If it is actually medians and/or distributions that are being compared, then reporting mean and standard error is clearly inappropriate for this test. Although most medical researchers are now aware of this, the practice is still widespread in other disciplines.
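
To make "dominance" concrete, here is a minimal sketch in plain Python. It assumes the usual reading of the U statistic as a count of pairs in which an observation from one group exceeds an observation from the other (ties counting one half). The function name `dominance` and both data sets are invented for illustration and come from none of the studies discussed here.

```python
from statistics import median

def dominance(a, b):
    """Return (U_a, p_hat).

    U_a counts the pairs with a_i > b_j, ties counting 1/2.
    p_hat = U_a / (n_a * n_b) estimates P(A > B) + 0.5 * P(A = B).
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u, u / (len(a) * len(b))

# Invented data: identical medians, clearly different shapes.
group_a = [4, 5, 6, 7, 20, 30, 40]   # long right tail
group_b = [1, 2, 3, 7, 8, 9, 10]     # compact

u_a, p_hat = dominance(group_a, group_b)
print(median(group_a), median(group_b))   # 7 7
print(u_a, round(p_hat, 3))               # 33.5 0.684
```

A value of `p_hat` near 0.5 would indicate no dominance. Here it is about 0.68 even though the two medians are equal, which is why a result from this test cannot be read as a statement about medians when the shapes differ.
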
But there is one special situation where both the arithmetic mean and the median/distribution are of interest, namely where the total is of importance. This is because only the arithmetic mean is directly related to the total. We give two examples where this might be the case: costs of care and duration of disturbance to endangered mammals. In this situation the Wilcoxon-Mann-Whitney test may be appropriate to compare distributions, but only a randomization test can adequately compare the arithmetic means.

Other misuses relate to the problems of small samples and tied data. There is an exact test for small samples, but this is only valid if there are few or no ties within or between groups. The test is sometimes applied to heavily tied data, which makes the test too liberal in reporting differences. We also find examples where use of the normal approximation is borderline for the sample sizes used.

A confidence interval is sometimes attached to the median difference, but this is rarely done except in medical research. This is a pity, because estimation of the magnitude of the treatment effect should be a primary component of any statistical analysis.

We give a few examples of another test, the median test, although it is now rarely used. This is unfortunate, because it is less susceptible to differences in distributions and hence more readily interpretable in terms of differences between medians. Surprisingly, the few examples we have included make the rather obvious error of reporting arithmetic means and standard errors. This can be wildly misleading if distributions are skewed: as the name suggests, the median test compares ... medians!

What the statisticians say

Conover (1999) covers the Wilcoxon-Mann-Whitney as the Mann-Whitney test, although he only gives details on the (Wilcoxon) sum-of-ranks statistic. Table values of W for n_{A}, n_{B} up to 20 are given. Sprent (1998) provides a comprehensive treatment of rank tests of location for two independent samples in Chapter 4.
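
Because some texts tabulate the rank sum W and others the statistic U, it may help to see how the two relate. The sketch below assumes the common convention U_A = W_A - n_A(n_A + 1)/2, with tied values given mid-ranks. Conventions vary between texts (some report the smaller of the two U values), so check the definition before using any table. The helper names and the data are invented for illustration.

```python
def midranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_and_u(a, b):
    """Return (W_a, W_b, U_a, U_b) computed from the pooled mid-ranks."""
    ranks = midranks(list(a) + list(b))
    n_a, n_b = len(a), len(b)
    w_a = sum(ranks[:n_a])
    w_b = sum(ranks[n_a:])
    u_a = w_a - n_a * (n_a + 1) / 2
    u_b = w_b - n_b * (n_b + 1) / 2
    return w_a, w_b, u_a, u_b

# Invented data with no ties.
sample_a = [1.1, 2.3, 4.0]
sample_b = [0.5, 3.2, 5.6, 7.1]

w_a, w_b, u_a, u_b = rank_sum_and_u(sample_a, sample_b)
print(w_a, w_b)   # 10.0 18.0
print(u_a, u_b)   # 4.0 8.0
```

The two U values always sum to n_A × n_B (here 3 × 4 = 12), so either one, or either rank sum, carries the same information.
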
Hollander & Wolfe (1973) and Siegel (1956) both cover the Wilcoxon-Mann-Whitney test in their texts on nonparametric statistics. Okeh (2009) reviews the application of the Wilcoxon-Mann-Whitney U-test in medical research studies. Zimmerman (2003) warns that the large-sample Wilcoxon-Mann-Whitney test can be strongly influenced by unequal variances of treatment groups even when sample sizes are equal. Hart (2001) notes that the Wilcoxon-Mann-Whitney test is a test of both location and shape, not, as most researchers consider it, a test of difference between medians. Freidlin & Gastwirth (2000) advocate the retirement of the median test from general use, to be replaced by the Wilcoxon-Mann-Whitney and related tests. Potvin & Roff (1993) propose more general use of nonparametric tests in ecological research, but Johnson (1995) and Smith (1995) take issue with this point of view. Wilcoxon (1945) first proposed the test for equal sample sizes, and Mann & Whitney (1947) then extended it to cover different sample sizes. Hodges & Lehmann (1963) discuss the properties of the Hodges-Lehmann estimator of median difference. Wikipedia (2008) provides a comprehensive account of the Wilcoxon-Mann-Whitney test with a useful section on its relation to other tests; the median test and the Hodges-Lehmann estimator are also covered. Various universities give tables of the Wilcoxon rank sum statistic online.
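
As a rough illustration of the Hodges-Lehmann estimator mentioned above, the sketch below takes the two-sample shift estimate to be the median of all pairwise differences between the groups. That is the standard textbook form, and no confidence interval is attempted. The function name `hodges_lehmann_shift` and the data are invented for illustration.

```python
from statistics import median

def hodges_lehmann_shift(a, b):
    """Median of all n_a * n_b pairwise differences b_j - a_i."""
    diffs = [y - x for x in a for y in b]
    return median(diffs)

# Invented data.
sample_a = [3, 5, 8]
sample_b = [4, 9, 12, 15]

print(hodges_lehmann_shift(sample_a, sample_b))   # 5.0
print(median(sample_b) - median(sample_a))        # 5.5
```

The two printed values differ because the estimator is the median of the differences, not the difference between the two sample medians. The two need not coincide unless the distributions differ only in location.
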
