Runs tests: One-sample runs test & the Wald-Wolfowitz test

On this page: One-sample runs test  Two-sample Wald-Wolfowitz test  Assumptions 

One-sample runs test

The one-sample runs test assesses whether a sequence of observations on a dichotomous variable can be considered random. It is based on the number of runs occurring within the sample. For this test a run is defined as a sequence of adjacent equal observations. More precisely it is a succession of identical observations which are followed and preceded by different observations or by no observation at all. The test consists of counting the number of runs (r) and comparing the result to the expected value under the null hypothesis - namely that of independence.
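The definition of a run given above can be illustrated with a minimal Python sketch. The function name `find_runs` and the example queue are hypothetical and chosen only for illustration.

```python
from itertools import groupby

def find_runs(seq):
    """Split a sequence into runs of adjacent equal observations.

    Returns a list of (value, run_length) pairs, one pair per run.
    """
    return [(value, len(list(group))) for value, group in groupby(seq)]

# Hypothetical queue of men (M) and women (F)
queue = "MMFMFFFMM"
runs = find_runs(queue)
print(runs)       # [('M', 2), ('F', 1), ('M', 1), ('F', 3), ('M', 2)]
print(len(runs))  # r = 5 runs
```

Only the number of runs, r, enters the test; the run lengths are shown here simply to make the definition concrete.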

For large samples, the difference between the observed and expected number of runs, divided by its standard error, approximately follows a standard normal distribution. Computations are carried out as follows:

Algebraically speaking -

$$
z = \frac{r - \left(\dfrac{2nm}{N} + 1\right)}{\sqrt{\dfrac{2nm\,(2nm - N)}{N^{2}\,(N - 1)}}}
$$
  • z is the z-statistic which is compared to the standard normal deviate;
  • r is the number of runs in your series;
  • n and m are the frequencies of your two classes of observations;
  • N is the total number of observations (= n + m)
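As a rough sketch of the large-sample calculation, the following Python function computes r, the z-statistic and a two-sided P-value from the normal approximation. The function names and the example sequence are assumptions made for illustration, and the two-sided P-value is one common choice rather than anything prescribed above.

```python
import math

def count_runs(seq):
    """Number of runs: one plus the number of changes between neighbours."""
    seq = list(seq)
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def runs_test_z(seq):
    """Large-sample one-sample runs test (normal approximation).

    Returns (r, z, two_sided_p). Assumes exactly two classes are present.
    """
    seq = list(seq)
    classes = list(dict.fromkeys(seq))  # distinct values, in order of appearance
    if len(classes) != 2:
        raise ValueError("sequence must contain exactly two classes")
    n = seq.count(classes[0])
    m = seq.count(classes[1])
    N = n + m
    r = count_runs(seq)
    expected = 2 * n * m / N + 1
    variance = 2 * n * m * (2 * n * m - N) / (N ** 2 * (N - 1))
    if variance <= 0:
        raise ValueError("too few observations for the normal approximation")
    z = (r - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of N(0, 1)
    return r, z, p

# Hypothetical sequence: 12 males followed by 12 females (strong clustering)
r, z, p = runs_test_z("M" * 12 + "F" * 12)
print(r, round(z, 2))  # 2 runs against 13 expected, so z is strongly negative
```

A large negative z points to too few runs (clustering), and a large positive z to too many runs (alternation).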

For small samples, an exact permutation test can be carried out based upon the distribution of all possible numbers of runs in a sequence of a fixed number of binary digits. This exact test is provided in some packages (for example StatXact). Alternatively, the number of runs is compared with that given in the appropriate tables (see Table F in Siegel (1956) or various sources on the web).
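For readers without access to such tables, the exact null distribution of r for given n and m can be sketched in Python using the standard combinatorial formula for the number of arrangements giving each r. This is an illustrative sketch, not the algorithm used by any particular package, and the function names are hypothetical.

```python
from math import comb

def runs_pmf(n, m):
    """Exact null distribution of the number of runs r, given n and m (both >= 1).

    Every one of the C(n + m, n) arrangements is taken as equally likely.
    """
    total = comb(n + m, n)
    pmf = {}
    for r in range(2, n + m + 1):
        k = r // 2
        if r % 2 == 0:
            # k runs of each class; the sequence may start with either class
            ways = 2 * comb(n - 1, k - 1) * comb(m - 1, k - 1)
        else:
            # k + 1 runs of one class and k runs of the other
            ways = (comb(n - 1, k) * comb(m - 1, k - 1)
                    + comb(n - 1, k - 1) * comb(m - 1, k))
        if ways:
            pmf[r] = ways / total
    return pmf

def p_too_few_runs(r_obs, n, m):
    """One-sided exact P-value: probability of r_obs or fewer runs."""
    return sum(p for r, p in runs_pmf(n, m).items() if r <= r_obs)

def p_too_many_runs(r_obs, n, m):
    """One-sided exact P-value: probability of r_obs or more runs."""
    return sum(p for r, p in runs_pmf(n, m).items() if r >= r_obs)

# With n = m = 5 only 2 of the 252 arrangements give r = 2
print(p_too_few_runs(2, 5, 5))
```

The two one-sided tails are kept separate because the alternative of interest (clustering or alternation) decides which tail is relevant.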

Uses of one-sample runs test

For a dichotomous variable
  1. To assess whether the order of a given sequence of observations on a dichotomous variable is random or not. The classical example is the order of men and women in a queue - but the test is equally applicable to the order of emergence of male and female insects or the order in which a farmer brings infected and uninfected cows to be checked for disease.
For a measurement variable
  1. To assess whether a given sequence of observations on a measurement variable is random or not. One approach to this is to determine the median of a sequence of observations and then label all items as + if above the median and − if below the median. This is sometimes known as the runs above and below the median test.
  2. To assess goodness of fit of data to a (mathematical) model. Differences between observed and expected values (residuals) are given a + if the observed value exceeds the predicted value and a − if it is less than the predicted value. Fewer runs than expected indicate positive serial correlation between residuals (the model deviates systematically from the data), whilst more runs than expected indicate negative serial correlation. This is commonly used as one of the tests of goodness of fit for nonlinear regression.
  3. To assess whether a sequence of observations on a measurement or ordinal variable shows a trend over time. Increases from one period to another are labelled as + whilst decreases are labelled as −. A consistent increase over time would give all +s - hence too few runs indicate a trend. Too many runs would indicate short period cyclical fluctuations.
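The two ways of reducing a measurement series to + and − described in this list can be sketched as follows. The function names and data are hypothetical. Dropping values equal to the median, and periods with no change, is an assumption made here for simplicity, since the text does not say how such values should be handled.

```python
from statistics import median

def signs_about_median(values):
    """Label each value + if above the median and - if below it.

    Assumption: values exactly equal to the median are dropped.
    """
    med = median(values)
    return ["+" if v > med else "-" for v in values if v != med]

def signs_of_changes(values):
    """Label each step + for an increase and - for a decrease.

    Assumption: steps with no change are dropped.
    """
    return ["+" if b > a else "-" for a, b in zip(values, values[1:]) if b != a]

def count_runs(labels):
    """Number of runs of adjacent identical labels."""
    labels = list(labels)
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

# Hypothetical series that rises and then falls
series = [3, 5, 8, 9, 7, 4, 2, 1]
above_below = signs_about_median(series)  # median is 4.5
up_down = signs_of_changes(series)
print("".join(above_below), count_runs(above_below))  # -++++--- 3
print("".join(up_down), count_runs(up_down))          # +++---- 2
```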



Two-sample Wald-Wolfowitz test

The same test can be applied to the two-sample situation, in which case it is known as the Wald-Wolfowitz test. It functions as an overall test of difference between two independent samples. In other words, the alternative hypothesis is that the distributions of the two groups differ in some way - whether in location, dispersion, skew or kurtosis.

Data from the two samples are combined into a single sample and arranged in ascending rank order. Each observation is then coded as 1 if it came from sample 1 and 0 if it came from sample 2. The test proceeds as in the one-sample case: the number of runs (r) is counted and the result compared to the expected value under the null hypothesis - namely that of independence.
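The pooling and coding step might look like the following Python sketch. The function name and the data are hypothetical, and the function simply refuses data with ties between the samples, since in that case the ordering is not unique.

```python
def wald_wolfowitz_runs(sample1, sample2):
    """Pool two samples, sort ascending, code by sample of origin, count runs.

    Returns (codes, r), where codes is the 1/0 sequence in rank order.
    Assumption: no value occurs in both samples.
    """
    if set(sample1) & set(sample2):
        raise ValueError("ties between samples: the ordering is not unique")
    pooled = [(x, 1) for x in sample1] + [(x, 0) for x in sample2]
    pooled.sort(key=lambda pair: pair[0])  # ascending by value
    codes = [code for _, code in pooled]
    r = 1 + sum(1 for a, b in zip(codes, codes[1:]) if a != b)
    return codes, r

# Hypothetical data with no ties between the samples
codes, r = wald_wolfowitz_runs([10, 14, 18, 20, 34], [12, 13, 17, 19, 22])
print(codes)  # [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]
print(r)      # 9
```

The resulting r, together with the two sample sizes as n and m, is then assessed exactly as in the one-sample test.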

Ideally there should be no ties in the data used for the Wald-Wolfowitz test. In practice there is no problem with ties within a group, but if ties occur between members of the different groups then there is no unique sequence of observations. For example the data sets A: 10,14,17,19,34 and B: 12,13,17,19,22 can give four possible sequences, with two possible values for r (7 or 9). The solution to this is to list every possible combination, and calculate the test statistic for each one. If all test statistics are significant at the chosen level, then one can reject the null hypothesis. If only some are significant, then Siegel (1956) suggests that the average of the P-values is taken. In general though, the test should not be used if there are more than one or two ties.
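The example with data sets A and B can be checked by listing every ordering of the tied values, as in this Python sketch. The function name is hypothetical, and the approach (enumerating the distinct label orders within each block of tied values) is just one straightforward way to do it.

```python
from itertools import groupby, permutations, product

def tie_resolved_runs(sample_a, sample_b):
    """List every sequence obtainable by reordering tied values, with its r.

    Returns a list of (sequence, r) pairs.
    """
    pooled = sorted([(x, "A") for x in sample_a] + [(x, "B") for x in sample_b])
    blocks = []
    for _, group in groupby(pooled, key=lambda pair: pair[0]):
        labels = [label for _, label in group]
        # distinct orderings of the labels that share this value
        blocks.append(sorted(set(permutations(labels))))
    results = []
    for choice in product(*blocks):
        seq = [label for block in choice for label in block]
        r = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
        results.append(("".join(seq), r))
    return results

A = [10, 14, 17, 19, 34]
B = [12, 13, 17, 19, 22]
for seq, r in tie_resolved_runs(A, B):
    print(seq, r)
# ABBAABABBA 7
# ABBAABBABA 7
# ABBABAABBA 7
# ABBABABABA 9
```

Three of the four sequences give r = 7 and one gives r = 9. The number of sequences multiplies with each additional tied value, which is one practical reason to avoid the test when ties are common.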




Assumptions

Since it is a test of randomness, for once we do not have to assume random sampling to use this test. Nor are there any distributional assumptions. But the following assumptions are made:

  • For use of the normal approximation, both m and n should be greater than 10.
  • For the one-sample test, the variable under study must be dichotomous, or must be collapsible to a dichotomous variable.
  • For the two-sample test, the two samples should be mutually independent, and there should be few or no ties. In practice, this only applies to ties between samples, not within samples.

Various weaknesses and drawbacks of the runs test have been identified:

  • In its usual form, the test is not sensitive to departures from randomness for run lengths of two.
  • The runs test has the wrong type I error rate if used to evaluate the independence of errors in time-series regression models.
  • The two-sample test is not robust to slight differences between data sets such as might arise through rounding errors.

This test is conditional upon the observed frequency of each type of observation - it makes no predictions regarding what might happen if other frequencies were observable. It also ignores the possibility of any sort of classification error - random or systematic. In order to test the null hypothesis, all observations are assumed to be equally likely to appear in any order. If the nature of your observations makes this outcome unlikely, that confounding factor will bias this test's inference.

Notice also that this test does not distinguish where runs occur in a sequence - nor the length of those runs - nor how much the run-lengths vary. Last but not least, where data are collapsed to a binary classification, this test makes no allowance for how the breakpoint was decided upon - nor how far the original values deviate from that breakpoint.