InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

Runs tests: Use & misuse
(one-sample runs test, Wald-Wolfowitz test, test of randomness, comparing distributions, trends)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The one-sample runs test assesses whether a sequence of observations on a dichotomous (or binary) variable can be considered random. The same test can be applied to the two-sample situation, in which case it is known as the Wald-Wolfowitz test: the two samples are pooled and ordered, and runs are counted in the resulting sequence of group labels. It functions as an overall test of difference between two independent samples. In other words, the alternative hypothesis is that the distributions of the two groups differ in some way, whether in location, dispersion, skew or kurtosis.

The runs test and the Wald-Wolfowitz test are now rarely found in the medical literature, perhaps reflecting the awareness that their use is seldom justified. The tests are, however, still found in the ecological literature, especially for preliminary analysis of spatial and temporal data. Another use is to assess trend in the residuals of nonlinear regression.

One misuse stems from the test's lack of power. We found one example where the runs test was used to assess whether shedding of bacteria by cattle is random or clustered over time. This was bound to be unproductive given a sample size of only 12. Other examples were found where sample sizes were adequate, but the comparison would have had even more power if the Kolmogorov-Smirnov test had been used. Some authors used the normal approximation even for small samples, or used the test on data with large numbers of ties. Both of these will give misleading results. Exact or Monte Carlo solutions should be used for small samples, and the test should not be used at all on data with large numbers of ties.

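
To make the small-sample warning concrete, here is a minimal Python sketch that compares the exact lower-tail probability of the one-sample runs test with the normal approximation. The 12-value sequence is invented for illustration (it is not the cattle data from the study mentioned above), the function names are our own, and the test is one-sided for "too few runs" (clustering) with no continuity correction.

```python
from math import comb, erf, sqrt

def count_runs(seq):
    """Number of runs (maximal blocks of identical symbols) in a sequence."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))

def runs_pmf(n1, n2):
    """Exact null distribution of the number of runs, given n1 and n2 (both >= 1)."""
    total = comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:   # r = 2k: k runs of each symbol
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:            # r = 2k + 1: k + 1 runs of one symbol, k of the other
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        pmf[r] = ways / total
    return pmf

def group_sizes(seq):
    """Counts of the two symbols; assumes exactly two distinct symbols occur."""
    n1 = sum(1 for v in seq if v == seq[0])
    return n1, len(seq) - n1

def lower_tail_exact(seq):
    """Exact P(R <= observed runs): small values suggest clustering."""
    n1, n2 = group_sizes(seq)
    observed = count_runs(seq)
    return sum(p for r, p in runs_pmf(n1, n2).items() if r <= observed)

def lower_tail_normal(seq):
    """Same tail probability from the large-sample normal approximation."""
    n1, n2 = group_sizes(seq)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (count_runs(seq) - mean) / sqrt(var)
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical sequence of 12 binary observations (1 = shedding, 0 = not shedding).
shedding = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0]
print("runs:", count_runs(shedding))
print("exact p :", round(lower_tail_exact(shedding), 4))
print("normal p:", round(lower_tail_normal(shedding), 4))
```

For this invented sequence the exact probability is 62/924 (about 0.067), whereas the normal approximation gives about 0.035, so the two disagree about significance at the 5% level.
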
The other major misuse of the test was to accept a significant result of the Wald-Wolfowitz test as demonstrating that means (or medians) differ. Unfortunately the test cannot do that: it can only indicate that the distributions differ in some way. We found a well known example of this where the test (wrongly) appeared to show that left-handed people do not live as long as right-handed people. The moral of the story is that if you wish to show a difference between means or medians, then use a test which will demonstrate this, such as the median test, the Wilcoxon-Mann-Whitney test (if distributions are a similar shape) or a t-test (if distributions approach normal).

There is perhaps some justification for using runs test(s) as an initial (global) test to detect trends, with subsequent tests only applied if the initial runs test is significant. We found two ecological examples, one in relation to detecting spatial clustering and the other considering cyclic fluctuations over time. Similarly, the runs test can be used to check for trend in the residuals of nonlinear regression, but it cannot on its own provide a test of goodness of fit. The danger with the runs test for both these applications is that there are some nonrandom patterns that it will fail to identify.

What the statisticians say

Sprent (1993, 1998) covers both the one-sample runs test and the two-sample Wald-Wolfowitz test. Zar (1998) covers only the one-sample test in Chapter 25, noting that use of the Wilcoxon-Mann-Whitney test is preferable to the Wald-Wolfowitz test in the two-sample situation. Sokal & Rohlf (1995) cover the one-sample runs test with a good explanation of its varied usage. Gibbons & Chakraborti (1992) give a detailed treatment of runs tests including exact (permutation) tests, and tests based on the length of longest run.

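
The two-sample Wald-Wolfowitz procedure and its permutation version can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the two samples are invented, they contain no ties, the function names are our own, and the p-value is a one-sided Monte Carlo estimate for "too few runs". The samples were chosen to have identical means and medians but different dispersion, to illustrate the point that a significant result says nothing specific about location.

```python
import random
from statistics import mean, median

def count_runs(labels):
    """Number of runs (maximal blocks of identical labels)."""
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))

def wald_wolfowitz_runs(x, y):
    """Runs of group labels after sorting the pooled values (assumes no ties)."""
    pooled = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
    return count_runs([label for _, label in pooled])

def monte_carlo_p(x, y, n_perm=10_000, seed=1):
    """Estimated P(runs <= observed) when group labels are randomly permuted."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = wald_wolfowitz_runs(x, y)
    labels = ["x"] * len(x) + ["y"] * len(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if count_runs(labels) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical samples: same mean and median, very different spread.
narrow = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
wide = [-6.0, -5.0, -4.0, 4.0, 5.0, 6.0]

runs, p = monte_carlo_p(narrow, wide)
print("means  :", mean(narrow), mean(wide))
print("medians:", median(narrow), median(wide))
print("runs   :", runs, " Monte Carlo p:", round(p, 4))
```

Here the pooled ordering gives only three runs and a p-value well below 0.05, even though the means and medians are identical: the test has detected the difference in dispersion.
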
Siegel (1956) is an older text but is still useful for nonparametric tests. Mogull (1994) reports that the test is incapable of signaling departures from randomness with run lengths of two. Moore & Wallis (1943) look at runs tests for carrying out significance tests on time series data, whilst Huitema (1996) notes that it has the wrong type I error rate if used to evaluate the independence of errors in time-series regression models. Mood (1940) reviews much of the underlying theory on the distribution of runs, whilst Wald & Wolfowitz (1940) propose the two independent sample runs test. Wikipedia describes the main features of the runs test (Wald-Wolfowitz runs test and runs test are treated as synonyms).
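
The limitation reported by Mogull (1994) is easy to check numerically. The sketch below, in Python, is our own illustration and assumes the usual normal-approximation z statistic. It builds perfectly regular sequences in which every run has length two (AABBAABB...) and shows that the observed number of runs stays within one of its expected value, however long the sequence.

```python
from math import sqrt

def runs_z(seq):
    """Observed runs, expected runs and z statistic for a two-symbol sequence."""
    first = seq[0]
    n1 = sum(1 for s in seq if s == first)
    n2 = len(seq) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, expected, (runs - expected) / sqrt(variance)

# Strictly patterned sequences: every run has length two.
for repeats in (5, 50, 500):
    seq = "AABB" * repeats
    runs, expected, z = runs_z(seq)
    print(f"n={len(seq)}  runs={runs}  expected={expected:.1f}  z={z:+.2f}")
```

In each case z is small and negative, so the test would not reject randomness even though the sequence is completely deterministic.
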
