InfluentialPoints.com
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Kolmogorov-Smirnov test: one- & two-sample, and related tests

#### Worked example I

We base our first example on some data on sole horn moisture content from a study by Higuchi & Nagahata (2001) on cows with and without laminitis. We will assume for now that we wish to test this observed distribution against a reference normal distribution of mean 35.0 and standard deviation 2.0.

The observed cumulative relative frequencies (S(Y)) are obtained by dividing each observation's rank (r) by the number of observations (n). Expected cumulative relative frequencies (Fo(Y)) for these quantiles are given by the area under the reference normal curve up to each value; they are readily obtained from R. Two sets of differences are then calculated: between each observed value and its expected value (S(Y)i - Fo(Y)i), and between the observed value for the previous variate and the same expected value (S(Y)i-1 - Fo(Y)i). The largest absolute difference across both sets is d.

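Moisture content of sole horn

| Obs # | % moisture (Y) | Observed (S(Y)) | Expected (Fo(Y)) | S(Y)i - F(Y)i | S(Y)i-1 - F(Y)i |
|---|---|---|---|---|---|
| 1 | 32.2 | 0.0833 | 0.0808 | 0.0025 | -0.0808 |
| 2 | 32.3 | 0.1667 | 0.0885 | 0.0782 | -0.0052 |
| 3 | 33.1 | 0.2500 | 0.1712 | 0.0788 | -0.0044 |
| 4 | 33.2 | 0.3333 | 0.1841 | 0.1492 | 0.0659 |
| 5 | 33.3 | 0.4167 | 0.1977 | 0.2190 | 0.1357 |
| 6 | 34.5 | 0.5000 | 0.4013 | 0.0987 | 0.0154 |
| 7 | 35.2 | 0.5833 | 0.5398 | 0.0435 | -0.0398 |
| 8 | 35.3 | 0.6667 | 0.5596 | 0.1070 | 0.0237 |
| 9 | 36.5 | 0.7500 | 0.7734 | -0.0234 | -0.1067 |
| 10 | 36.8 | 0.8333 | 0.8159 | 0.0174 | -0.0659 |
| 11 | 37.0 | 0.9167 | 0.8413 | 0.0753 | -0.0080 |
| 12 | 37.6 | 1.0000 | 0.9032 | 0.0968 | 0.0134 |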
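The calculation in the table can be sketched in a few lines of code. The source works in R; what follows is only an illustrative Python sketch using the standard library, and the function names (`normal_cdf`, `ks_one_sample_d`) are our own labels rather than anything from the original analysis.

```python
import math

# Sole horn moisture values (%) as listed in the table
moisture = [32.2, 32.3, 33.1, 33.2, 33.3, 34.5,
            35.2, 35.3, 36.5, 36.8, 37.0, 37.6]


def normal_cdf(y, mean, sd):
    """Area under the normal curve to the left of y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def ks_one_sample_d(values, mean, sd):
    """Largest absolute gap between observed and expected cumulative frequencies."""
    ys = sorted(values)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        expected = normal_cdf(y, mean, sd)
        at_step = i / n - expected             # S(Y)i   - Fo(Y)i
        before_step = (i - 1) / n - expected   # S(Y)i-1 - Fo(Y)i
        d = max(d, abs(at_step), abs(before_step))
    return d


# Reference distribution assumed in the text: mean 35.0, standard deviation 2.0
d_reference = ks_one_sample_d(moisture, mean=35.0, sd=2.0)
print(round(d_reference, 4))
```

The largest gap occurs at the fifth observation (33.3), matching the 0.2190 entry in the table.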

The process may be easier to follow on the first graph below. The red curve shows the expected cumulative relative frequencies from a normal distribution. Blue points show the observed cumulative relative frequencies. Red points show points on the cumulative normal curve equivalent to observed cumulative relative frequencies. Green points lie immediately before each step-up.

{Fig. 1}


A more efficient approach is shown in the second figure above. A correction factor (0.5/n) is subtracted from each observed cumulative relative frequency. Only one difference then has to be calculated for each observed cumulative relative frequency. Once the largest absolute difference is identified, the correction factor is added back on again to give d.
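The correction-factor shortcut can be checked numerically. The sketch below (Python, standard library only; all function names are illustrative) computes d both ways for the moisture data against the assumed reference distribution (mean 35.0, standard deviation 2.0) and confirms that the two routes agree.

```python
import math

moisture = [32.2, 32.3, 33.1, 33.2, 33.3, 34.5,
            35.2, 35.3, 36.5, 36.8, 37.0, 37.6]


def normal_cdf(y, mean, sd):
    """Area under the normal curve to the left of y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def d_two_differences(values, mean, sd):
    """d from both sets of differences (at the step and just before it)."""
    ys = sorted(values)
    n = len(ys)
    gaps = []
    for i, y in enumerate(ys, start=1):
        expected = normal_cdf(y, mean, sd)
        gaps.append(abs(i / n - expected))
        gaps.append(abs((i - 1) / n - expected))
    return max(gaps)


def d_with_correction(values, mean, sd):
    """d from a single set of differences after subtracting 0.5/n."""
    ys = sorted(values)
    n = len(ys)
    correction = 0.5 / n
    largest = max(abs((i / n - correction) - normal_cdf(y, mean, sd))
                  for i, y in enumerate(ys, start=1))
    return largest + correction   # add the correction factor back on


d_long = d_two_differences(moisture, 35.0, 2.0)
d_short = d_with_correction(moisture, 35.0, 2.0)
print(round(d_long, 4), round(d_short, 4))
```

The two agree because, for any gap a and half-step h > 0, the larger of |a + h| and |a - h| equals |a| + h.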

We said at the start of this worked example that we were testing the observed distribution against a fully defined normal distribution (μ=35,σ=2). Usually, however, one is more interested in an omnibus test of normality - using the sample mean and standard deviation as estimates of the population parameters.

The Kolmogorov-Smirnov test should not be used to test such a hypothesis - but we will do it here in R in order to see why it is inappropriate. In this example the mean is 34.754 and the standard deviation is 1.92472.


The P-value we obtain is 0.7026, which gives no indication of a significant deviation from normality. Let us now use three specialized tests of normality which allow for the fact that one is estimating parameters from the sample.

• Lilliefors test of normality

The maximum difference (D) is estimated in exactly the same way as previously.

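The test statistic is the same as for the Kolmogorov-Smirnov test with estimated parameters (D = 0.191), but the P-value is much lower at 0.2628, because the Lilliefors test allows for the parameters having been estimated from the sample.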
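To see why estimating the parameters changes the P-value, one can simulate the null distribution of D when the mean and standard deviation are re-estimated from every sample. The Python sketch below does this by Monte Carlo; note that this is a swapped-in illustration, not the method R uses to obtain 0.2628, and the number of simulations and the seed are arbitrary choices of ours.

```python
import math
import random
import statistics

moisture = [32.2, 32.3, 33.1, 33.2, 33.3, 34.5,
            35.2, 35.3, 36.5, 36.8, 37.0, 37.6]


def normal_cdf(y, mean, sd):
    """Area under the normal curve to the left of y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def d_estimated_parameters(values):
    """D against a normal curve fitted with the sample mean and SD."""
    ys = sorted(values)
    n = len(ys)
    mean = statistics.mean(ys)
    sd = statistics.stdev(ys)          # n - 1 denominator
    d = 0.0
    for i, y in enumerate(ys, start=1):
        expected = normal_cdf(y, mean, sd)
        d = max(d, abs(i / n - expected), abs((i - 1) / n - expected))
    return d


d_observed = d_estimated_parameters(moisture)

# Simulate normal samples of the same size, refitting mean and SD each time
rng = random.Random(1)                 # arbitrary seed, for repeatability
n_sim = 5000                           # arbitrary number of simulations
exceed = 0
for _ in range(n_sim):
    simulated = [rng.gauss(0.0, 1.0) for _ in range(len(moisture))]
    if d_estimated_parameters(simulated) >= d_observed:
        exceed += 1
p_simulated = exceed / n_sim

print(round(d_observed, 3), p_simulated)
```

The simulated P-value will vary a little with the seed, but it should fall far below the 0.7026 reported by the Kolmogorov-Smirnov test, since fitted curves sit closer to their own samples than a fixed reference curve does.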

• Cramér-von Mises test of normality

This test uses a different test statistic from Kolmogorov-Smirnov and Lilliefors.

The test statistic (W) is 0.0591, with a P-value of 0.3606.
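As a rough check on the statistic, the usual computing formula for the Cramér-von Mises statistic, W = 1/(12n) + Σ (F(Y)i - (2i - 1)/(2n))², can be applied to the moisture data with the sample mean and standard deviation plugged in. The Python sketch below assumes that this is the form of the statistic reported above; it does not attempt to reproduce the P-value.

```python
import math
import statistics

moisture = [32.2, 32.3, 33.1, 33.2, 33.3, 34.5,
            35.2, 35.3, 36.5, 36.8, 37.0, 37.6]


def normal_cdf(y, mean, sd):
    """Area under the normal curve to the left of y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def cramer_von_mises_w(values):
    """W statistic against a normal curve fitted with the sample mean and SD."""
    ys = sorted(values)
    n = len(ys)
    mean = statistics.mean(ys)
    sd = statistics.stdev(ys)
    total = 1.0 / (12.0 * n)
    for i, y in enumerate(ys, start=1):
        midpoint = (2 * i - 1) / (2.0 * n)   # centre of the i-th step
        total += (normal_cdf(y, mean, sd) - midpoint) ** 2
    return total


w_statistic = cramer_von_mises_w(moisture)
print(round(w_statistic, 4))
```

Unlike D, which depends only on the single largest gap, W sums squared gaps over every observation.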

• Anderson-Darling test of normality

This test uses another different test statistic which gives more weight to the tails of the distribution. It is reputedly the most powerful of this family of tests.

The test statistic (A) is 0.3891, with a P-value of 0.3263.
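The extra weight given to the tails comes from the logarithms in the usual computing formula, A = -n - (1/n) Σ (2i - 1)[ln F(Y)i + ln(1 - F(Y)n+1-i)]. The Python sketch below applies it to the moisture data with the sample mean and standard deviation plugged in; we assume this uncorrected form is the statistic reported above, and the P-value is not reproduced.

```python
import math
import statistics

moisture = [32.2, 32.3, 33.1, 33.2, 33.3, 34.5,
            35.2, 35.3, 36.5, 36.8, 37.0, 37.6]


def normal_cdf(y, mean, sd):
    """Area under the normal curve to the left of y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def anderson_darling_a(values):
    """A statistic against a normal curve fitted with the sample mean and SD."""
    ys = sorted(values)
    n = len(ys)
    mean = statistics.mean(ys)
    sd = statistics.stdev(ys)
    f = [normal_cdf(y, mean, sd) for y in ys]
    total = 0.0
    for i in range(1, n + 1):
        lower_tail = math.log(f[i - 1])        # ln F(Y)i
        upper_tail = math.log(1.0 - f[n - i])  # ln(1 - F(Y)n+1-i)
        total += (2 * i - 1) * (lower_tail + upper_tail)
    return -n - total / n


a_statistic = anderson_darling_a(moisture)
print(round(a_statistic, 4))
```

Because ln F and ln(1 - F) grow rapidly in magnitude as F approaches 0 or 1, discrepancies in the extreme observations contribute most to the total.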

#### Conclusions

Whilst the P-value from the Kolmogorov-Smirnov test (0.7026) is not valid for the reasons stated, any of the other three tests could justifiably be used depending on which aspect of the distribution one is most interested in. None of them indicates a significant deviation from normality - although with such a small sample the deviation would have to be very marked to be detected.

Postscript: When there are several (appropriate) tests to choose from, it is very important to select the test a priori, and not just choose the one that gives the desired result. If you want to give more weight to the tails of the distribution, then select the Anderson-Darling test. If you want to give more weight to the centre of the distribution, then select the Lilliefors test.

#### Worked example II

We use the same data on the effect of drug treatment on the length of time from treatment to lambing that we have used previously. An equal-variance t-test on the log transformed data gave a P-value of 0.00986, whilst an unequal-variance t-test on the raw data gave a non-significant P-value of 0.0823. A Wald-Wolfowitz test also gave a non-significant P-value.

Time (hours) from treatment to lambing

| Untreated (U) | Treated (T) |
|---|---|
| 45, 87, 123, 120, 70 | 51, 71, 42, 37, 51, 78, 51, 49, 56, 47, 58 |
| Mean = 89.0 | Mean = 53.7 |

The observed cumulative relative frequencies (S1(Y) and S2(Y)) are obtained for each sample by dividing rank (r) by the number of observations (n) in that sample. Differences are then obtained between the two sets of observed values (S1(Y)i - S2(Y)i). The largest absolute difference is d.

{Fig. 2}

The maximum difference here is between the smallest of the five values in sample 1 (45, S = 1/5 = 0.2) and the ninth ranked value in sample 2 (58, S = 9/11 = 0.8182). Hence d = 0.6182. This is also the value given by R.
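The two-sample statistic can be reproduced by evaluating both step functions at every observed value and taking the largest gap. The following is a minimal Python sketch (standard library only; `ecdf` and `ks_two_sample_d` are illustrative names of ours, not R functions).

```python
# Time (hours) from treatment to lambing
untreated = [45, 87, 123, 120, 70]
treated = [51, 71, 42, 37, 51, 78, 51, 49, 56, 47, 58]


def ecdf(sample, y):
    """Proportion of the sample that is less than or equal to y."""
    return sum(1 for value in sample if value <= y) / len(sample)


def ks_two_sample_d(sample_1, sample_2):
    """Largest absolute gap between the two cumulative relative frequencies."""
    pooled = sorted(set(sample_1) | set(sample_2))
    return max(abs(ecdf(sample_1, y) - ecdf(sample_2, y)) for y in pooled)


d_two_sample = ks_two_sample_d(untreated, treated)
print(round(d_two_sample, 4))
```

The gap peaks at 58 hours, where 9 of the 11 treated values but only 1 of the 5 untreated values have been passed (9/11 - 1/5 = 34/55).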


However, we do have a problem with the presence of ties - three observations all have the same reading (51). R makes it clear that it cannot compute correct p-values when ties are present (although it does give a P-value anyway of 0.145).

Since in this case observations were rounded to the nearest hour, one possible way round this problem would be to jitter observations with the same readings - giving them randomly chosen values between 50.6 and 51.4. In this particular example, jittering does not affect the value of the test statistic, and enables R to give a (defensible) P-value of 0.125.
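The jittering step might look like the sketch below, written in Python with the standard library. The helper `jitter_ties` and the seed are our own assumptions; the ±0.4 hour window follows the range quoted above. It replaces each tied reading with a random value in that window and confirms that the test statistic is unchanged. It does not compute a P-value.

```python
import random
from collections import Counter

untreated = [45, 87, 123, 120, 70]
treated = [51, 71, 42, 37, 51, 78, 51, 49, 56, 47, 58]


def ecdf(sample, y):
    """Proportion of the sample that is less than or equal to y."""
    return sum(1 for value in sample if value <= y) / len(sample)


def ks_two_sample_d(sample_1, sample_2):
    """Largest absolute gap between the two cumulative relative frequencies."""
    pooled = sorted(set(sample_1) | set(sample_2))
    return max(abs(ecdf(sample_1, y) - ecdf(sample_2, y)) for y in pooled)


def jitter_ties(sample, tied_values, rng, half_width=0.4):
    """Spread tied readings randomly within +/- half_width of the recorded value."""
    return [value + rng.uniform(-half_width, half_width)
            if value in tied_values else value
            for value in sample]


# Readings that occur more than once across both samples (here only 51)
counts = Counter(untreated + treated)
tied_values = {value for value, count in counts.items() if count > 1}

rng = random.Random(0)                 # arbitrary seed, for repeatability
untreated_j = jitter_ties(untreated, tied_values, rng)
treated_j = jitter_ties(treated, tied_values, rng)

d_before = ks_two_sample_d(untreated, treated)
d_after = ks_two_sample_d(untreated_j, treated_j)
print(sorted(tied_values), round(d_before, 4), round(d_after, 4))
```

The statistic is unaffected here because the jittered values stay between 49 and 56, a range containing no untreated observations and lying below the point (58) where the largest gap occurs.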

In other words, much like the unequal variance t-test and the Wald-Wolfowitz test, it suggests there is no significant difference in time to lambing between treated and untreated sheep. This reflects the lack of power of the Kolmogorov-Smirnov test to detect differences in distributions between two small samples.