Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important"
Confidence interval of a proportion: Use & misuse
(normal approximation interval, Wilson score, Clopper-Pearson, mid-P, assumptions, independence)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
Confidence intervals are attached to proportions in all disciplines, whether it is the prevalence of a disease (= proportion of people infected) or the proportion of individuals cured with a given treatment. Hence there is a rich and varied literature on ways to calculate the interval. In recent years, recommendations by statisticians on the best method to use have changed radically. Up till the late 1990s, the simple normal approximation (Wald) interval reigned supreme for all but small samples and extreme proportions. Now many statisticians say the normal approximation interval for a proportion should never be used, and should be replaced by either the adjusted Wald interval, the Wilson score interval, the exact Clopper-Pearson interval or the exact mid-P interval.
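To make the difference between these intervals concrete, here is a minimal Python sketch of the simple Wald, adjusted Wald and Wilson score intervals. The function names are our own, the adjusted Wald version assumes the Agresti-Coull form (adding z²/2 successes and z²/2 failures), and none of the limits are truncated to lie between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def z_value(conf=0.95):
    """Two-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def wald_interval(x, n, conf=0.95):
    """Simple normal approximation (Wald) interval; limits are not truncated."""
    z = z_value(conf)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def adjusted_wald_interval(x, n, conf=0.95):
    """Adjusted Wald interval, assuming the Agresti-Coull form:
    add z^2/2 successes and z^2/2 failures, then apply the Wald formula.
    The limits may still need truncating to [0, 1]."""
    z = z_value(conf)
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


def wilson_interval(x, n, conf=0.95):
    """Wilson score interval (no continuity correction)."""
    z = z_value(conf)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical data: 1 success in 20 trials
    for f in (wald_interval, adjusted_wald_interval, wilson_interval):
        print(f.__name__, f(1, 20))
```

For the hypothetical 1 success in 20 trials, the Wald lower limit falls below zero, whereas both Wilson limits stay between 0 and 1.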
Of course, which interval is actually used in practice has changed much more gradually given the prevalence of old textbooks (and even older lecturers), so the simple normal approximation interval is still widely (mis)used. Admittedly the biggest problems arise when the traditional conditions for it are not met (in other words pqn < 5). In such cases one can get upper and lower limits with nonsensical values, such as less than zero or greater than one. This is especially common in medical research on diagnostic tests, and in wildlife research for studies of resource utilization in relation to availability. One problem with exact intervals is that different authors have different ideas about what is exact and what is not exact. It is therefore essential to specify precisely how such intervals were estimated.
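The point about specifying how an 'exact' interval was estimated can be illustrated with a rough Python sketch that computes both the conventional (Clopper-Pearson) and the mid-P interval by solving the binomial tail equations with bisection. The function names, the bisection approach and the treatment of x = 0 and x = n (limits set to 0 and 1 respectively) are our own assumptions, not a prescription from any of the authors cited here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(x, n, p, mid_p=False):
    """P(X >= x); with mid_p=True only half of P(X = x) is counted."""
    tail = sum(binom_pmf(k, n, p) for k in range(x, n + 1))
    return tail - 0.5 * binom_pmf(x, n, p) if mid_p else tail


def lower_tail(x, n, p, mid_p=False):
    """P(X <= x); with mid_p=True only half of P(X = x) is counted."""
    tail = sum(binom_pmf(k, n, p) for k in range(0, x + 1))
    return tail - 0.5 * binom_pmf(x, n, p) if mid_p else tail


def solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(x, n, conf=0.95, mid_p=False):
    """Conventional-P (Clopper-Pearson) interval, or mid-P if mid_p=True."""
    alpha = 1 - conf
    # Lower limit: the p at which the upper tail equals alpha/2
    if x == 0:
        lower = 0.0
    else:
        lower = solve_increasing(lambda p: upper_tail(x, n, p, mid_p), alpha / 2)
    # Upper limit: the p at which the lower tail equals alpha/2
    # (the lower tail decreases with p, so its negative is solved instead)
    if x == n:
        upper = 1.0
    else:
        upper = solve_increasing(lambda p: -lower_tail(x, n, p, mid_p), -alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 3 successes in 20 trials
    print("Clopper-Pearson:", exact_interval(3, 20))
    print("mid-P:          ", exact_interval(3, 20, mid_p=True))
```

Because the mid-P tail counts only half the probability of the observed result, the mid-P interval always lies inside the Clopper-Pearson interval for the same data, which is why a bare description of an interval as 'exact' is ambiguous.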
The other common misuse is to calculate an interval when the basic assumption for use of a binomial interval - namely that outcomes are independent - is not met. This assumption will not be met if samples are selected using convenience or haphazard sampling. At best such intervals are only a rough measure of the reliability of the proportions - at worst they are meaningless. One should also be cautious about interpreting confidence intervals when there is evidence of bias - such as selection bias or measurement error in the case of questionnaire data - since they give a false sense of confidence in the data. The treatment of cluster samples has improved in recent years, although it is still not uncommon to find intervals fitted which incorrectly assume simple random sampling. This applies whether sampling herds of cows or litters of raccoons, and usually results in the width of the interval being underestimated. The same point applies when pooling replicate experiments - or (worse) experiments where effectively there is only one replicate.
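As a rough illustration of how ignoring clustering understates the width of the interval, the Python sketch below compares the naive binomial standard error with one common alternative, the ratio-estimator standard error that treats each cluster (herd, litter) as the sampling unit. The data are invented, the function names are our own, and this is only one of several ways to allow for clustering; no finite population correction is applied.

```python
from math import sqrt


def naive_se(successes, sizes):
    """Binomial standard error, wrongly treating all individuals as independent."""
    n = sum(sizes)
    p = sum(successes) / n
    return sqrt(p * (1 - p) / n)


def cluster_se(successes, sizes):
    """Ratio-estimator standard error with clusters as the sampling units."""
    k = len(sizes)  # number of clusters
    n = sum(sizes)  # total number of individuals
    p = sum(successes) / n
    # Squared deviations of each cluster's count from its expected count
    ss = sum((x - p * m) ** 2 for x, m in zip(successes, sizes))
    return sqrt(k / (k - 1) * ss) / n


if __name__ == "__main__":
    # Hypothetical data: 6 herds of 10 animals, number positive in each herd
    positives = [0, 0, 1, 9, 10, 10]
    herd_sizes = [10, 10, 10, 10, 10, 10]
    print("naive SE:  ", naive_se(positives, herd_sizes))
    print("cluster SE:", cluster_se(positives, herd_sizes))
```

With these invented herds, where animals in the same herd tend to share the same status, the cluster-based standard error is roughly three times the naive one, so an interval assuming simple random sampling would be far too narrow.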
What the statisticians say
Armitage & Berry (2002) give the formula for the Clopper-Pearson exact interval in Chapter 4 along with a useful discussion of the problems of confidence intervals for discrete variables, and the advantages of mid-P values. Agresti (2002) covers statistical inference for binomial parameters in Chapter 1. Rothman & Greenland (1998) and Conover (1998) both provide more in-depth coverage of the topic, the former also covering likelihood intervals and mid-P values. Collett (1991) emphasises the exact interval whilst Fleiss (1981) advocates the continuity-corrected score interval. Bart (1998) provides a good account of the utility of confidence intervals for ecologists in Chapter 3, along with formulae for the Clopper-Pearson exact and simple normal approximation methods. Zar (1996) provides a brief account of the Clopper-Pearson exact method in Chapter 24, as well as the finite population correction, and methods to estimate sample size for a given confidence interval.
García-Pérez (2005) brings improved binomial confidence intervals to the attention of psychologists. Agresti & Gottard (2005) and Agresti (2001) advocate mid-P rather than conventional-P intervals. Reiczigel (2003) refutes criticism of exact intervals by Agresti and others, proposing instead use of Sterne's exact interval. Brown et al. (2001), Agresti & Caffo (2000) and Agresti & Coull (1998) all advocate use of the Wilson score interval for smaller n and adjusted Wald intervals for larger n. Newcombe (1998) and Vollset (1998) compare several methods for obtaining the confidence intervals of a proportion - and also strongly discourage use of the simple Wald interval. Harper & Reeves (1999) unwisely recommend simple Wald intervals to be attached to estimates of specificity and sensitivity. Daly (1998) describes the method we have used to obtain an interval for the incidence rate as the 'substitution method'. Berry & Armitage (1995) review the use of mid-P confidence intervals. Clopper & Pearson (1934) present the conventional-P exact interval for a binomial proportion.
Cai & Krishnamoorthy (2004), Blaker (2000) and Kabaila & Byrne (2000) all propose improved exact intervals for estimating parameters of various discrete distributions including the Poisson. Newman (1995) and Eypasch (1995) remind us of the work of Hanley & Lippman-Hand (1983) on the probability of adverse events that have not yet occurred. Cohen & Yang (1994) look at mid-P confidence intervals for the Poisson parameter. Garwood (1936) uses the Clopper-Pearson method to obtain an exact Poisson interval for the mean event rate.
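The zero-numerator problem addressed by Hanley & Lippman-Hand (1983) is commonly summarised as the 'rule of three': if no events are seen in n independent trials, the one-sided 95% upper limit for the true probability is approximately 3/n. A minimal Python sketch, using our own function names, compares that approximation with the exact one-sided binomial limit obtained by solving (1 - p)^n = 0.05.

```python
def zero_event_upper_limit(n, conf=0.95):
    """Exact one-sided upper limit for p when 0 events occur in n independent trials.

    Solves (1 - p)**n = 1 - conf for p.
    """
    return 1 - (1 - conf) ** (1 / n)


def rule_of_three(n):
    """Approximate one-sided 95% upper limit when 0 events occur in n trials."""
    return 3 / n


if __name__ == "__main__":
    # Hypothetical sample sizes
    for n in (20, 100, 1000):
        print(n, zero_event_upper_limit(n), rule_of_three(n))
```

The approximation is slightly conservative and improves as n increases; like the binomial intervals themselves, it relies on the trials being independent.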
Wikipedia provides sections on the binomial proportion confidence interval and on coverage probability. Boomsma recommends the Wilson score or adjusted Wald intervals, whilst Xiaomin He considers confidence intervals for the binomial proportion with zero frequency. There are numerous binomial confidence interval calculators available on the web. Probably the best is from Emory University since it includes the exact mid-P interval, whilst that from Jeff Sauro includes the Wald, adjusted Wald, conventional exact and Wilson score intervals.