InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

Kolmogorov-Smirnov and related tests: Use & misuse
(one- and two-sample tests, normality, estimated parameters, Lilliefors test, discrete distributions)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

These tests provide a means of comparing distributions, whether two sample distributions or a sample distribution with a theoretical distribution. The distributions are compared in their cumulative form as empirical distribution functions. The test statistic developed by Kolmogorov and Smirnov to compare distributions is simply the maximum vertical distance between the two functions. Kolmogorov-Smirnov tests have the advantages that (a) the distribution of the statistic does not depend on the cumulative distribution function being tested and (b) the test is exact. They have the disadvantage that they are more sensitive to deviations near the centre of the distribution than at the tails.

Both the one- and two-sample Kolmogorov-Smirnov and related tests are widely used in all disciplines. Unfortunately, the one-sample Kolmogorov-Smirnov test is commonly misused to test normality when the parameters of the normal distribution are estimated from the sample rather than specified a priori. The result is that the test is far too conservative, and distributions that are clearly not normal are wrongly classified as normal. This practice is perhaps reinforced by a sometimes unconcealed desire to demonstrate normality so that subsequent parametric tests can be carried out. The situation is not helped by various software packages being unclear about which test is being used. The correct test for normality when the parameters of the normal distribution are estimated from the sample is the Lilliefors test.
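The conservatism described above can be checked with a small simulation. The sketch below is illustrative only and uses nothing beyond the Python standard library. It computes the one-sample statistic as the maximum vertical distance between the empirical distribution function and a normal distribution fitted to the same sample, then looks up the P-value as if the distribution had been fully specified in advance. The function names (`ks_one_sample_d`, `kolmogorov_asymptotic_p`, `rejection_rate`) are hypothetical, and the P-value uses Kolmogorov's asymptotic series as a stand-in for tabulated values, so it is only approximate for small samples.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_one_sample_d(sample, cdf):
    """Maximum vertical distance between the sample EDF and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The EDF jumps from i/n to (i+1)/n at x, so check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def kolmogorov_asymptotic_p(d, n, terms=100):
    """Asymptotic two-sided P-value for a FULLY SPECIFIED null distribution.

    Uses the limiting series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2).
    This is an approximation (assumption: n is not tiny), not an exact table.
    """
    lam2 = n * d * d
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam2)
    return min(1.0, max(0.0, 2.0 * total))


def rejection_rate(draw, n=20, trials=2000, alpha=0.05, seed=1):
    """Share of samples declared non-normal when the mean and SD are
    estimated from the same sample (the misuse discussed in the text)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        fitted = NormalDist(fmean(sample), stdev(sample))
        d = ks_one_sample_d(sample, fitted.cdf)
        if kolmogorov_asymptotic_p(d, n) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # Truly normal data: a valid 5% test should reject about 5% of the time.
    print(rejection_rate(lambda rng: rng.gauss(10.0, 2.0)))
    # Skewed (exponential) data, for comparison.
    print(rejection_rate(lambda rng: rng.expovariate(1.0)))
```

For normal data the rejection rate should come out far below the nominal 5%, which is what 'too conservative' means in practice. Running the same function on skewed data gives a rough idea of how seldom the misapplied test flags a clearly non-normal sample.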
When it comes to goodness of fit to discrete distributions, the test can be adapted to give the correct P-value, and various packages provide software to test goodness of fit to the Poisson distribution and the Zipf distribution. However, there is no Lilliefors equivalent for these distributions, so again parameters cannot be estimated from the sample. A second major problem arises from testing discrete variables against continuous distribution functions. We give a well-known example where a Kolmogorov-Smirnov test of final digits of P-values (a discrete variable) suggested that they deviated from the expected (continuous) uniform distribution. The test, however, gave the wrong P-value because, with many ties, the test is far too liberal.

A more basic error that we find with all goodness-of-fit tests is misinterpretation of a small P-value to indicate a 'good fit'. In fact, of course, it means the opposite, but researchers are so imbued with the need for significance that they forget that, with goodness-of-fit tests, a significant result means a deviation from the 'null' distribution.

With the two-sample test, the question usually is: what is it one wants to compare? A Kolmogorov-Smirnov test compares the overall distributions rather than specifically locations or dispersions. By and large we have found the test is used correctly in this respect. But there is the same problem as with the one-sample test over the interpretation of non-significant P-values. In some cases authors seem to think that they have proved the null hypothesis, and that two distributions are therefore 'the same'. This may appear rather pedantic, but it is important. The Kolmogorov-Smirnov test has rather little power to detect departures from the null hypothesis when comparing distributions, and for small sample sizes, the two distributions would need to be completely different for this test to show a significant difference.
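The point about small samples can be made concrete with an exact permutation calculation. The sketch below, again standard-library Python with hypothetical function names, computes the two-sample statistic as the maximum vertical distance between the two empirical distribution functions. It then enumerates every way of splitting the pooled observations into two groups of the original sizes to obtain an exact two-sided P-value. It assumes small samples, since the number of splits grows very quickly.

```python
from bisect import bisect_right
from itertools import combinations


def ks_two_sample_d(a, b):
    """Maximum vertical distance between the EDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def exact_permutation_p(a, b):
    """Exact two-sided P-value: the share of all regroupings of the pooled
    data whose statistic is at least as large as the observed one."""
    pooled = list(a) + list(b)
    observed = ks_two_sample_d(a, b)
    indices = range(len(pooled))
    hits = 0
    total = 0
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if ks_two_sample_d(group_a, group_b) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


if __name__ == "__main__":
    # Completely separated samples, three observations each: P = 2/20.
    print(exact_permutation_p([1, 2, 3], [4, 5, 6]))
    # Completely separated samples, four observations each: P = 2/70.
    print(exact_permutation_p([1, 2, 3, 4], [5, 6, 7, 8]))
```

With three observations per group, even samples that do not overlap at all give P = 0.1, so no result can be significant at the 5% level. With four per group, only complete (or near complete) separation can reach significance. A non-significant result from samples of this size therefore says very little about whether the two distributions are 'the same'.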
What the statisticians say

Conover (1999) devotes a full chapter to statistics of the Kolmogorov-Smirnov type, with full details on estimation of confidence intervals, but recent developments on improving the power of the test are not covered. Sokal & Rohlf (1995) give an up-to-date account of the Kolmogorov-Smirnov tests, including the recent two-stage d-adjustment. Sprent (1998) covers both the one- and two-sample tests in Chapter 6. Siegel (1956) introduces the Kolmogorov-Smirnov tests, but does not of course consider the (later) tests by Lilliefors and Anderson-Darling.

Khamis et al. (1992, 2000) propose a modification of the test which improves its power for small to moderate size samples. Harter et al. (1984) show that you can allow for differences between observed and expected frequencies before and after each step of the cumulative distribution by subtracting 0.5 from each observed frequency. Lilliefors (1967) showed that the Kolmogorov-Smirnov one-sample test is too conservative if expected frequencies are calculated using parameters estimated from the sample: commonly tabulated (and software) values are only valid for a fully defined distribution. Anderson & Darling (1952) proposed the Anderson-Darling test, and Stephens (1974) modified it for use when the distribution is not completely specified.

Wikipedia provides information on the Kolmogorov-Smirnov test, the Lilliefors test, the Anderson-Darling test and the Cramér-von Mises test.
