Confidence intervals of proportions and rates
Definition and properties
Estimating the confidence interval of a proportion (or count) is a much more controversial operation than doing the same for a mean. This controversy stems from the fact that for many years textbooks have promoted the simple normal approximation binomial interval for all situations other than small samples and very small proportions. Such intervals are easy to understand and to calculate, but make unrealistic assumptions - in particular that the variance is independent of the mean. These intervals are now known to be deeply flawed, even when calculated from moderate proportions or surprisingly large samples, and many statisticians say they should not be used under any circumstances.
An alternative to the simple normal approximation interval is an 'exact' interval. An exact interval is best described as one that is derived directly from an appropriate model describing how the statistic in question might vary - in this case the binomial distribution. Such intervals are calculated in a different way from normal approximation intervals, leading to an alternative definition of a confidence interval.
In principle, the confidence interval of a proportion or count may be defined in 2 ways:
Unfortunately, in textbooks the term exact confidence interval has been interpreted in several, mutually contradictory, ways:
Whilst the first viewpoint is correct but by no means sufficient, the second is simplistic and incorrect. The third is almost impossible for a discrete statistic such as a proportion - unless P approaches 0.5 and the sample size (n) approaches infinity. The fourth definition, using conventional P-values, is what many statisticians would propose - albeit such a definition tends to yield unduly conservative intervals - and correspondingly biased inference. The last definition, which covers mid-P intervals, is much less biased, but only when considered over a range of parameters. It also leaves open what range of parameters, or what range of coverages, is reasonable. None of these definitions explicitly allows for how the interval ought to be located about the proportion. In fact the correct definition of exact is simply that exact intervals for a proportion use the binomial distribution to estimate probabilities - but with the implicit assumption that the binomial model is the correct model to describe how the sample was gathered.
We also need to clarify the meaning of mid-P intervals. There are both approximate mid-P intervals and exact mid-P intervals. Approximate mid-P intervals are those like the normal approximation and score intervals that do not have a continuity correction. They tend not to be too conservative - but can be much too liberal. Exact mid-P intervals come much closer to having on average a coverage which approaches 1−α - and have been proposed to be the new 'gold standard'.
We start below with the (now discredited) simple normal approximation, not because we advocate its use, but because it provides the basis for a new approximate method - the adjusted Wald interval. This is increasingly being recommended as a replacement for the simple normal interval. We then give the formulae for two score intervals. These take into account the dependence of the variance on the mean, but still assume a normal distribution. Lastly we consider the two main exact intervals - the conventional Clopper-Pearson interval (currently still regarded as the gold standard despite its conservatism) which can be obtained formulaically, and a mid-P exact interval that is readily obtained using R. We provide additional details on properties with each interval.
Normal approximation binomial intervals
Simple normal approximation (Wald interval)
For a large sample (n > 100) and a moderate proportion (0.3 < p < 0.7), the traditional approach was to use p and q (in place of P and Q) to estimate the standard error. The mean-variance relationship was then ignored, enabling the following simple formulation to be used:
Some authorities (for example Cochran
Simple normal intervals are commonly known as 'simple asymptotic' or 'Wald' intervals. Although popular and simple to calculate, they suffer from several important defects.
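As an illustration only, the simple normal approximation described above (p plus or minus a normal deviate times the square root of pq/n) can be sketched in Python as follows. The function name and the example counts are our own, not part of any package. The last two calls show two of the defects mentioned elsewhere on this page: a limit that overshoots the 0 to 1 range, and an interval of zero width.

```python
import math
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Simple normal approximation (Wald) interval for x successes in n trials."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided normal deviate
    se = math.sqrt(p * (1 - p) / n)               # p and q used in place of P and Q
    return p - z * se, p + z * se

print(wald_interval(15, 50))  # moderate proportion, hypothetical counts
print(wald_interval(1, 10))   # lower limit falls below zero (overshoot)
print(wald_interval(0, 10))   # both limits are zero (collapse)
```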
Continuity-corrected Wald interval
If the sample size lies between about 20 and 100, it was usual to apply a continuity correction - by adding a half divided by the sample size to the upper limit, and subtracting a half divided by the sample size from the lower limit. In other words, this correction widens the interval by 1/n in total. However, some statisticians argue that, although this correction makes no worthwhile difference for large samples, it may cause overcoverage for small samples unless p is close to 0.5. To prevent this, where intervals for proportions are estimated by testing the difference between a parameter and an observed proportion, it is recommended that this correction be omitted if their difference is less than the continuity correction.
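The continuity correction described above can be sketched as follows. This is a minimal illustration with a function name of our own choosing. It simply moves each limit of the simple normal interval outwards by 1/(2n).

```python
import math
from statistics import NormalDist

def wald_cc_interval(x, n, conf=0.95):
    """Wald interval with a continuity correction of 1/(2n) at each limit."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p * (1 - p) / n)
    cc = 0.5 / n                       # a half divided by the sample size
    return p - z * se - cc, p + z * se + cc

print(wald_cc_interval(15, 50))  # hypothetical counts
```

The corrected interval is wider than the uncorrected one by exactly 1/n.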
Adjusted (modified) Wald interval
The modified Wald interval has the same point estimator as the Wilson score interval (see below), and its formulation is simply an approximation to that interval. Not surprisingly it has similar properties to the Wilson score interval (and has even been claimed to have improved properties in some respects), and has been recommended by several authorities when n > 40. It is always preferable to the simple Wald interval, and is very simple to calculate. Note, however, that it does have some undesirable properties when p is close to zero or 1, and values of the interval must be truncated.
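A minimal sketch of an adjusted Wald interval follows. We assume here the form usually attributed to Agresti and Coull, in which z²/2 is added to the number of successes and z² to the sample size before the simple normal formula is applied. That form shares its point estimator with the Wilson score interval, but the function name and the choice of this particular form are our assumptions. The limits are truncated to the 0 to 1 range, as noted above.

```python
import math
from statistics import NormalDist

def adjusted_wald_interval(x, n, conf=0.95):
    """Adjusted (modified) Wald interval, assuming the Agresti-Coull form."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_adj = n + z ** 2                   # adjusted sample size
    p_adj = (x + z ** 2 / 2) / n_adj     # same point estimator as the Wilson interval
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    lower = max(0.0, p_adj - z * se)     # truncate limits that overshoot
    upper = min(1.0, p_adj + z * se)
    return lower, upper

print(adjusted_wald_interval(15, 50))  # hypothetical counts
print(adjusted_wald_interval(0, 10))   # lower limit truncated at zero
```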
Finite population correction
The formulations above all assume that the sample size is very small compared to the total population size. Sometimes this may not be the case as, for example, when carrying out ecological studies on endangered species or breeds. If the sample comprises more than 20% of the population, it is necessary to apply the finite population correction. This is done by multiplying the standard error by the correction factor, the square root of (1 minus the proportion of the population sampled).
This correction is usually applied to the Wald interval, so not surprisingly these intervals share the properties of simple normal
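The sketch below applies a finite population correction to the simple normal interval. It assumes the usual form of the correction, in which the standard error is multiplied by the square root of (1 - n/N), where N is the population size. The function name and the example figures are hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval_fpc(x, n, N, conf=0.95):
    """Wald interval with a finite population correction (population size N)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    fpc = math.sqrt(1 - n / N)              # assumed form: sqrt(1 - sampling fraction)
    se = math.sqrt(p * (1 - p) / n) * fpc   # corrected standard error
    return p - z * se, p + z * se

# Hypothetical example: 50 animals sampled from a population of 100
print(wald_interval_fpc(15, 50, 100))
```

When N is very large relative to n the correction factor approaches 1, and the interval approaches the uncorrected Wald interval.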
Score method binomial intervals
These are known as score methods because the central component of calculating the interval involves carrying out a score test. A score test is a particular type of parametric test that can be formulated in situations where the variability is difficult to estimate. Score methods are appropriate for any proportion providing n is large - or, more precisely, providing PQn is greater than five. They are equivalent to an unequal variance normal approximation test-inversion, without a t-correction. The limits are obtained by a quadratic method, not graphically.
Wilson score interval
Fleiss score interval
This is the same as the Wilson score interval but includes a continuity correction.
Given their rather complex formulation, these methods have been little used in the past - although in recent years they have come into favour as they are much more accurate than the simple normal approximation - providing the sample size is large. Although not immediately obvious from the formula, this improved accuracy arises because they allow for the fact that the variance (PQn) is not homogeneous.
For large n and non-extreme P the properties of Wilson score intervals approach those of mid-P exact intervals, and Fleiss intervals approach those of Clopper-Pearson intervals (see below). These score intervals do not collapse or overshoot, are located reasonably, and have good coverage properties; which of the two to use depends upon whether you prefer an average (Wilson) or a minimum (Fleiss) coverage of 95%. Fleiss intervals seldom have coverage below 95%.
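The two score intervals can be sketched as follows. We assume the standard closed-form solutions of the quadratic: the Wilson limits, and for the Fleiss interval the continuity-corrected Wilson limits in the form given by Newcombe. Function names are our own, and the formulae should be checked against your own reference before use.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval (no continuity correction)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def fleiss_interval(x, n, conf=0.95):
    """Score interval with continuity correction (assumed Newcombe form)."""
    p, q = x / n, 1 - x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    denom = 2 * (n + z ** 2)
    if x == 0:
        lower = 0.0                      # lower limit is zero when nothing is observed
    else:
        root = math.sqrt(z ** 2 - 2 - 1 / n + 4 * p * (n * q + 1))
        lower = max(0.0, (2 * n * p + z ** 2 - 1 - z * root) / denom)
    if x == n:
        upper = 1.0                      # upper limit is one when all are successes
    else:
        root = math.sqrt(z ** 2 + 2 - 1 / n + 4 * p * (n * q - 1))
        upper = min(1.0, (2 * n * p + z ** 2 + 1 + z * root) / denom)
    return lower, upper

print(wilson_interval(15, 50))   # hypothetical counts
print(fleiss_interval(15, 50))   # slightly wider, because of the correction
```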
Exact binomial intervals
Exact Clopper-Pearson interval (conventional P)
Exact binomial intervals were originally obtained by inversion of the equal-tail binomial test - as suggested by Clopper & Pearson (1934) using conventional P-values. Until fairly recently, this meant most people had to use tables generated by mathematicians, which gave the confidence intervals for all proportions for small sample sizes. These tables can be found in a number of statistical texts, for example table A4 in Conover
Alternatively there are formulaic methods derived from a mathematical relationship between the binomial distribution and various continuous distributions including the beta distribution and the F-distribution. Despite their usefulness, they are still only given in a few statistical textbooks. We give the formulation provided by Bart et al.
Exact Clopper-Pearson binomial intervals using conventional P-values have a minimum coverage close to 95% - and very good location. Be aware that, although this provides an exact interval, it overcovers - if variation is purely binomial its coverage is at least 95%, but its maximum coverage can be excessively high. However, exact intervals neither overshoot the 0 to 1 range nor are liable to produce intervals of zero width (collapse).
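As a rough sketch, the Clopper-Pearson limits can also be found numerically by inverting the equal-tail binomial test directly, rather than through the F-distribution formulation. The code below does this by bisection using only the Python standard library. This is a swapped-in numerical approach, not the formulaic method referred to above, and all function names are our own.

```python
import math

def binom_pmf(k, n, P):
    """Binomial probability of exactly k successes."""
    return math.comb(n, k) * P ** k * (1 - P) ** (n - k)

def binom_cdf(k, n, P):
    """Binomial probability of k or fewer successes."""
    return sum(binom_pmf(i, n, P) for i in range(k + 1))

def solve(f, lo, hi, iters=200):
    """Bisection root-finder; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact (conventional-P) interval by inverting the equal-tail binomial test."""
    alpha = 1 - conf
    # Lower limit: the P for which P(X >= x) equals alpha/2
    lower = 0.0 if x == 0 else solve(
        lambda P: 1 - binom_cdf(x - 1, n, P) - alpha / 2, 0.0, 1.0)
    # Upper limit: the P for which P(X <= x) equals alpha/2
    upper = 1.0 if x == n else solve(
        lambda P: binom_cdf(x, n, P) - alpha / 2, 0.0, 1.0)
    return lower, upper

print(clopper_pearson(15, 50))  # hypothetical counts
print(clopper_pearson(0, 10))   # no collapse: the upper limit is well above zero
```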
Exact mid-P interval
Although many statisticians still consider the Clopper-Pearson interval as providing the 'gold standard' for the binomial proportion interval, many now advocate that the mid-P exact interval should take over this role. However, bear in mind that, because we are dealing with discrete variables, no interval gives precisely the correct (nominal)
As far as we know, a mid-P exact interval can only be obtained by test inversion - not by any formulaic method. Although virtually any confidence interval can be obtained by test inversion, because it is relatively computer-intensive, this method is generally confined to intervals that cannot be obtained any other way - or for which there is no 'suitable approximation'. Test inversion intervals work under the definition that a confidence interval about an observed statistic encloses a range of parameters which, when tested, would not reject that observed statistic.
Mid-P exact binomial intervals also have good location properties, and provide close to a mean coverage of 95% - although there can be considerable variation where P approaches 0 or 1. Unfortunately, being less conventional, few packages calculate mid-P exact binomial intervals. We explain how test inversion is carried out in the worked example below.
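Test inversion for a mid-P exact interval can be sketched as below. Each limit is the parameter value at which the mid-P tail probability (the probability of a more extreme result plus half the probability of the observed result) equals α/2. We search for it by bisection. The function names are hypothetical, and this is an illustrative sketch rather than a validated routine.

```python
import math

def binom_pmf(k, n, P):
    """Binomial probability of exactly k successes."""
    return math.comb(n, k) * P ** k * (1 - P) ** (n - k)

def binom_cdf(k, n, P):
    """Binomial probability of k or fewer successes (zero if k < 0)."""
    return sum(binom_pmf(i, n, P) for i in range(k + 1))

def solve(f, lo, hi, iters=200):
    """Bisection root-finder; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mid_p_interval(x, n, conf=0.95):
    """Mid-P exact interval obtained by test inversion."""
    alpha = 1 - conf

    def upper_tail(P):   # P(X > x) + half of P(X = x)
        return 1 - binom_cdf(x, n, P) + 0.5 * binom_pmf(x, n, P)

    def lower_tail(P):   # P(X < x) + half of P(X = x)
        return binom_cdf(x - 1, n, P) + 0.5 * binom_pmf(x, n, P)

    lower = 0.0 if x == 0 else solve(lambda P: upper_tail(P) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if x == n else solve(lambda P: lower_tail(P) - alpha / 2, 0.0, 1.0)
    return lower, upper

print(mid_p_interval(15, 50))  # hypothetical counts
print(mid_p_interval(0, 10))
```

For zero successes the upper limit solves 0.5(1 - P)^n = α/2, which gives a shorter interval than the conventional-P limit obtained from (1 - P)^n = α/2.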
Poisson methods for counts and rates
Normal approximation Poisson interval for count
The normal approximation interval for a count (f) is obtained simply by using its square root as an estimate of the standard error:
Normal approximation Poisson intervals share many of the properties of simple normal intervals for a
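The normal approximation for a count, f plus or minus a normal deviate times the square root of f, can be sketched as follows. The function name and the example count are our own.

```python
import math
from statistics import NormalDist

def poisson_normal_interval(f, conf=0.95):
    """Normal approximation interval for a Poisson count f."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(f)            # the square root of the count estimates its standard error
    return f - z * se, f + z * se

print(poisson_normal_interval(100))  # hypothetical count
print(poisson_normal_interval(2))    # lower limit is negative for a small count
```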
Exact Poisson intervals for counts
Again, until relatively recently, the most frequently used method was to use tables generated by mathematicians which gave exact Poisson confidence intervals for all small counts. However, the exact confidence interval for a count (Y) is readily obtained from the relationship between the chi-square distribution and the Poisson distribution. The appropriate degrees of freedom must be calculated separately for the upper and lower limits (remember we use the same system as R, so χ²(0.025) is the chi-square quantile for a lower tail probability of 0.025; this is the opposite way to that in which statistical tables are usually done).
Like Clopper-Pearson intervals, this gives a conventional-P interval, and is therefore conservative.
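As a sketch using only the Python standard library, the exact (conventional-P) Poisson limits can be found by inverting the Poisson tail probabilities directly by bisection. This numerical route is our substitute for the chi-square formulation. It should give the same limits, which in chi-square terms are half the 0.025 quantile on 2Y degrees of freedom (lower) and half the 0.975 quantile on 2Y + 2 degrees of freedom (upper), with quantiles defined by lower tail probability as in R. Function names are hypothetical.

```python
import math

def pois_cdf(k, mu):
    """Poisson probability of k or fewer events with mean mu."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

def solve(f, lo, hi, iters=200):
    """Bisection root-finder; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_exact_interval(y, conf=0.95):
    """Exact conventional-P interval for the mean of a Poisson count y."""
    alpha = 1 - conf
    # Lower limit: the mean for which P(X >= y) equals alpha/2
    lower = 0.0 if y == 0 else solve(
        lambda mu: 1 - pois_cdf(y - 1, mu) - alpha / 2, 0.0, float(y))
    # Upper limit: the mean for which P(X <= y) equals alpha/2
    top = y + 1.0
    while pois_cdf(y, top) > alpha / 2:   # widen the bracket until it contains the root
        top *= 2
    upper = solve(lambda mu: pois_cdf(y, mu) - alpha / 2, float(y), top)
    return lower, upper

print(poisson_exact_interval(10))  # hypothetical count
print(poisson_exact_interval(0))
```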
Poisson intervals for incidence rates
An incidence rate is obtained by dividing the number of events by person-time-at-risk. A 95% CI for the incidence rate in a cohort study can therefore be obtained by treating the numerator as a Poisson variate, working out the upper and lower confidence limits for the number of events, and then dividing each by the person-time-at-risk which is regarded as fixed and measured without error. The same approach is sometimes used in descriptive studies substituting mid-year population size for person-time-at-risk. However, the interval is likely to be unreliable because mid-year population size is certainly not measured without error.
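The calculation for an incidence rate can be sketched as below. The confidence limits for the number of events are simply divided by the person-time-at-risk, which is treated as a fixed constant. For brevity the count limits in this illustration come from the normal approximation, and the event count and person-years are hypothetical. Exact Poisson limits for the count could be passed in instead.

```python
import math
from statistics import NormalDist

def rate_interval(count_limits, person_time):
    """Convert confidence limits for a count into limits for an incidence rate."""
    lower, upper = count_limits
    return lower / person_time, upper / person_time   # person-time regarded as fixed

# Hypothetical cohort: 100 events over 2500 person-years at risk
events, person_years = 100, 2500.0
z = NormalDist().inv_cdf(0.975)
count_limits = (events - z * math.sqrt(events), events + z * math.sqrt(events))

print(events / person_years)                      # point estimate of the rate
print(rate_interval(count_limits, person_years))  # 95% limits for the rate
```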
Note that if Poisson intervals are fitted to a proportion (rather than a rate) they are unnecessarily conservative, since the Poisson variance exceeds the binomial variance by a factor of 1/q, making the intervals wider by a factor of approximately 1/√q than those based on the binomial distribution.
Assumptions
The key assumption here is that outcomes are independent. In a survey this assumption is assumed to be met if a sample is obtained by simple random sampling. In other words members of the sample have been drawn independently with equal probabilities. In an experimental situation, units should be allocated randomly to treatment and observations must be independent. The latter can be difficult to ensure and we return to this point in the next
If you have used stratified random sampling to obtain your overall proportion, a normal approximation confidence interval is estimated in a similar way as for a single random sample, except that the standard error is weighted according to the proportion each stratum makes up of the total. If you have used cluster sampling to obtain your overall proportion, the approach is quite different. The standard error is estimated from the variability between sample proportions in the same way as the standard error of a mean is estimated. If the proportions lie outside the range 0.3 to 0.7 an arcsin transformation should be used. These methods are detailed in the More Information page on Sampling Methods in
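For cluster sampling, the approach described above (treating each cluster's proportion as a single observation and estimating the standard error as for a mean) might be sketched as follows. We assume clusters of equal size, so that an unweighted mean is reasonable, and we leave the critical value to the caller. With few clusters a t-value on (number of clusters - 1) degrees of freedom would normally be more appropriate than 1.96. The function name and the data are hypothetical, and no arcsine transformation is applied.

```python
import math
from statistics import mean, stdev

def cluster_proportion_interval(cluster_props, crit=1.96):
    """Interval for an overall proportion from the variability between cluster proportions."""
    k = len(cluster_props)                   # number of clusters sampled
    p_bar = mean(cluster_props)              # assumes equal-sized clusters
    se = stdev(cluster_props) / math.sqrt(k) # standard error as for a mean
    return p_bar - crit * se, p_bar + crit * se

# Hypothetical proportions observed in six clusters
props = [0.42, 0.55, 0.38, 0.61, 0.47, 0.50]
print(cluster_proportion_interval(props))
```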