"It has long been an axiom of mine that the little things are infinitely the most important"
P-values of tests
When the outcomes of statistical tests are reported, they are commonly, but not uniformly, accompanied by a P-value. For example, "Flagwaggit's test was significant, P < 0.01", or "the odds-ratio was not significant, P = 0.51", or "chi-square was not significant". In the last case, by current convention, the P-value is assumed to have been greater than 0.05.
At first sight these statements would seem to have nothing to do with the Pth quantiles of samples, or population quantiles, discussed in the quantiles More Information page.
When calculated for a sample, the value of a quantile is either descriptive - or, if the sample was taken at random from some larger 'population' of values, a sample quantile (such as the median, or the upper quartile) provides an estimate of the corresponding quantile of that population.
The essential difference between the P-values of a sample and the P-values that accompany statistical tests is that the latter refer to the estimated distribution of the statistic under test. Thus a chi-square test compares the observed value of its statistic to a chi-square distribution, and Flagwaggit's test compares Flagwaggit's statistic to its distribution - and (nearly always) both of those distributions will be estimated from the data at hand, in order to test some hypothesis.
In other words, the P-value of a test is an estimate of the corresponding P-value of whatever population of values the statistic under test was assumed to represent - assuming the hypothesis under test was correct, or approximately so. Specifically, these P-values tell you what proportion of results would yield statistics whose quantiles are as extreme, or more extreme, than the observed value of that test's statistic. For a conventional 1-tailed test, the 'critical value' (P = 0.05) is usually the 95% quantile. A 2-tailed test has 2 critical values, the 2.5% and 97.5% quantiles, which enclose 95% of the statistic's distribution.
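The relation between a test's P-value and the quantiles of its statistic's distribution can be sketched in a few lines of Python. Here the null distribution is simulated rather than taken from theory; the choice of statistic (a squared standard normal deviate, i.e. chi-square with 1 degree of freedom), the number of simulations, and the observed value are all illustrative assumptions, not from the text:

```python
import random

random.seed(1)

# Simulate the null distribution of a chi-square-like statistic with 1 df:
# the square of a standard normal deviate. (In a real test this distribution
# comes from theory, or is estimated from the data at hand.)
null = sorted(random.gauss(0, 1) ** 2 for _ in range(100_000))

# One-tailed critical value: the 95% quantile of the null distribution.
critical = null[int(0.95 * len(null))]  # the theoretical value is about 3.84

# P-value of an observed statistic: the proportion of the null distribution
# as extreme as, or more extreme than, the observed value.
observed = 5.2
p = sum(1 for v in null if v >= observed) / len(null)

print(f"critical value (P = 0.05): {critical:.2f}")
print(f"P-value of observed {observed}: {p:.4f}")
```

With enough simulated values the 95% quantile settles close to the textbook critical value of 3.84, which is why a statistic beyond its critical value and a P-value below 0.05 are two ways of saying the same thing.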
If this sounds like gibberish, do not worry for now. We explore the detailed reasoning, assumptions, properties and problems of statistical tests in later units.
Therefore, whilst thinking about how to calculate quantiles may seem wholly academic and without merit, not understanding their properties can have serious and very practical consequences. Moreover, because the problem and its solutions are controversial, many statistics textbooks prefer to ignore them. We do not.
One approach, instead of applying conventional P-values, is to use what are known as mid-P-values. To understand the difference between these two sorts of P-value, consider a value t which is a member of a collection, T, of N such values, and suppose that R of those values are as extreme as, or more extreme than, t. The conventional P-value is then R/N, whereas the mid-P-value counts only half of the values that exactly equal t.
Now imagine that T contains many values - say a million. If every value is different, the proportion that exactly equal t cannot be more than 1/1000000. In that case, if 5% of T are as extreme as, or more extreme than, t, then R/N = 0.05, and the mid-P-value, which discounts half of the single value equal to t, is R/N - 0.5/N = 0.0499995. However, as we shall see in later units, a million values is an unusually tiny population for a statistic being tested. So when T is infinitely large, as for instance is the chi-squared distribution, to all practical purposes conventional and mid-P-values are identical.
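The counting behind these two P-values can be made concrete with a toy collection of values (the numbers below are made up purely for illustration). Writing R for the number of values as extreme as, or more extreme than, t, and e for the number exactly equal to t:

```python
# Hypothetical discrete collection T of N statistic values, and an observed t.
T = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]  # illustrative made-up values
t = 3

N = len(T)
R = sum(1 for v in T if v >= t)  # as extreme as, or more extreme than, t
e = T.count(t)                   # exactly equal to t

conventional_p = R / N           # counts every value tied with t
mid_p = (R - e / 2) / N          # counts only half of the ties

print(conventional_p)  # 0.4
print(mid_p)           # 0.3
```

When no value is tied with t other than t itself, e = 1 and the two P-values differ by just 0.5/N, as in the million-value example above.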
If, however, T has a strongly discrete distribution, the proportion of T equal to t may not be negligible. In that situation, among those who are strict about the 5% boundary between significance and nonsignificance, the difference between conventional and mid-P-values can be noticeable. Since the 5% criterion is a legal requirement in some fields of medicine, and lawyers get rich from such discrepancies, this is no small matter.
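For a concrete illustration of such a discrete case, consider a hypothetical one-tailed sign test (our example, not one from the text): 14 of 20 paired differences are positive, so under the null hypothesis the count of positive signs follows a Binomial(20, 1/2) distribution, which has only 21 possible values. The conventional P-value falls on one side of the 5% boundary and the mid-P-value on the other:

```python
from math import comb

# Hypothetical sign test: 14 of n = 20 paired differences are positive.
# Under the null hypothesis each sign is positive with probability 1/2.
n, x = 20, 14
pmf = [comb(n, k) / 2**n for k in range(n + 1)]  # Binomial(20, 0.5) pmf

conventional_p = sum(pmf[x:])            # P(X >= x): ties fully counted
mid_p = sum(pmf[x + 1:]) + pmf[x] / 2    # only half the probability at x

print(f"conventional P = {conventional_p:.4f}")  # 0.0577: 'not significant'
print(f"mid-P          = {mid_p:.4f}")           # 0.0392: 'significant'
```

The same data are declared nonsignificant by the conventional P-value and significant by the mid-P-value, which is exactly the sort of discrepancy the paragraph above warns about.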