Runs tests: One-sample runs test & the Wald-Wolfowitz test
One-sample runs test
The one-sample runs test assesses whether a sequence of observations on a dichotomous variable can be considered random. It is based on the number of runs occurring within the sample. For this test a run is defined as a sequence of adjacent equal observations. More precisely, it is a succession of identical observations which are preceded and followed by different observations or by no observation at all.
For large samples, the difference between the observed and expected number of runs, divided by its standard error, has an approximately standard normal distribution. Computations are carried out as follows:
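The large-sample calculation can be sketched in Python as below. This is a minimal illustration rather than the procedure of any particular package. It assumes the usual textbook expressions for the mean and variance of the number of runs given n1 observations of one type and n2 of the other, and it applies no continuity correction. The function names `count_runs` and `runs_test_z` are chosen here for illustration only.

```python
import math

def count_runs(seq):
    """Number of runs: maximal blocks of adjacent equal observations."""
    if len(seq) == 0:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def runs_test_z(seq):
    """Normal approximation for the one-sample runs test (no continuity correction)."""
    values = sorted(set(seq))
    if len(values) != 2:
        raise ValueError("sequence must contain exactly two distinct values")
    n1 = sum(1 for x in seq if x == values[0])
    n2 = len(seq) - n1
    n = n1 + n2
    r = count_runs(seq)
    # Expected number of runs under randomness: 2*n1*n2/n + 1
    expected = 2 * n1 * n2 / n + 1
    # Variance of the number of runs: 2*n1*n2*(2*n1*n2 - n) / (n^2 * (n - 1))
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - expected) / math.sqrt(variance)
    # Two-sided P-value from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return r, expected, variance, z, p_two_sided

# Made-up sequence used purely to show the call; it is far too short
# for the normal approximation to be trusted.
example = [1, 1, 0, 0, 0, 1, 0, 0, 1, 1]
r, expected, variance, z, p = runs_test_z(example)
```

A negative z indicates fewer runs than expected (clustering), and a positive z indicates more runs than expected (alternation).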
For small samples an exact permutation test can be carried out based upon the distribution of all possible numbers of runs in a sequence of a fixed number of binary digits. This exact test is provided in some packages (for example StatXact). Alternatively the number of runs is compared with that given in the appropriate tables (see Table F in Siegel
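For readers without access to tables or a package offering the exact test, the null distribution of the number of runs can be computed directly by counting arrangements. The sketch below assumes the standard combinatorial result for the number of sequences of n1 and n2 observations that contain exactly r runs. The names `runs_pmf` and `exact_tail_probs` are illustrative, and the code is not taken from StatXact or any other package.

```python
from math import comb

def runs_pmf(n1, n2):
    """Exact null distribution of the number of runs, given n1 and n2 (both >= 1)."""
    if n1 < 1 or n2 < 1:
        raise ValueError("both types of observation must occur at least once")
    total = comb(n1 + n2, n1)  # number of equally likely arrangements
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # k runs of each type; the sequence can start with either type
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # k + 1 runs of one type and k runs of the other
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if ways:
            pmf[r] = ways / total
    return pmf

def exact_tail_probs(r_obs, n1, n2):
    """Return (P(R <= r_obs), P(R >= r_obs)) under the null hypothesis."""
    pmf = runs_pmf(n1, n2)
    lower = sum(p for r, p in pmf.items() if r <= r_obs)
    upper = sum(p for r, p in pmf.items() if r >= r_obs)
    return lower, upper
```

The lower tail is the relevant one when too few runs (clustering) is the alternative of interest, and the upper tail when too many runs (alternation) is suspected.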
Uses of one-sample runs test
For a dichotomous variable
Two-sample Wald-Wolfowitz test
The same test can be applied to the two-sample situation, in which case it is known as the Wald-Wolfowitz test. It functions as an overall test of difference between two independent samples. In other words, the alternative hypothesis is that the distributions of the two groups differ in some way, whether in location, dispersion, skew or kurtosis.
Data from the two samples are combined into a single sample and arranged in ascending rank order. Data are then coded as 1 for sample 1 and 0 for sample 2. The test proceeds as in the one sample
Ideally there should be no ties in the data used for the Wald-Wolfowitz test. In practice, ties within a group cause no problem, but if ties occur between members of different groups then there is no unique sequence of observations. For example, the data sets A: 10, 14, 17, 19, 34 and B: 12, 13, 17, 19, 22 can give four possible sequences, with two possible values for the number of runs r (7 or 9). The solution is to list every possible sequence and calculate the test statistic for each one. If all test statistics are significant at the chosen level, then one can reject the null hypothesis. If only some are significant, then Siegel
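The pooling, ranking and coding steps, together with the enumeration of tied orderings, can be sketched as follows. This is an illustrative sketch only: the function name `possible_run_counts` and the A/B labels are assumptions made here, and the brute-force enumeration is practical only when there are few cross-group ties.

```python
from itertools import groupby, permutations, product

def count_runs(labels):
    """Number of runs of identical adjacent labels."""
    return sum(1 for _ in groupby(labels))

def possible_run_counts(sample_a, sample_b):
    """Run counts for every distinct ordering of tied observations."""
    # Pool the two samples, tag each value with its group, and sort by value
    pooled = sorted([(x, "A") for x in sample_a] + [(x, "B") for x in sample_b])
    blocks = []
    for _, group in groupby(pooled, key=lambda pair: pair[0]):
        labels = [label for _, label in group]
        # Each block of equal values may appear in any distinct label order;
        # ties within one group yield a single ordering
        blocks.append(sorted(set(permutations(labels))))
    results = []
    for choice in product(*blocks):
        sequence = [label for block in choice for label in block]
        results.append(("".join(sequence), count_runs(sequence)))
    return results

# The data sets from the text: two cross-group ties (17 and 19)
a = [10, 14, 17, 19, 34]
b = [12, 13, 17, 19, 22]
for sequence, r in possible_run_counts(a, b):
    print(sequence, r)
# Four orderings are listed; three give r = 7 and one gives r = 9
```

Each run count can then be passed to whichever version of the one-sample calculation is appropriate, with n1 and n2 equal to the two sample sizes.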
Assumptions
Since it is a test of randomness, for once we do not have to assume random sampling to use this test. Nor are there any distributional assumptions. However, the following assumptions are made:
Various weaknesses and drawbacks of the runs test have been identified:
This test is conditional upon the observed frequency of each type of observation: it makes no predictions regarding what might happen if other frequencies were observable. It also ignores the possibility of any sort of classification error, whether random or systematic. In order to test the null hypothesis, all observations are assumed to be equally likely to appear in any order. If the nature of your observations makes this outcome unlikely, that confounding factor will bias this test's
Notice also that this test takes no account of where runs occur in a sequence, how long those runs are, or how much the run lengths vary. Last but not least, where data are collapsed to a binary classification, this test makes no allowance for how the breakpoint was chosen, nor for how far the original values deviate from that breakpoint.