"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Definition and properties

Estimating the confidence interval of a proportion (or count) is a much more controversial operation than doing the same for a mean. This controversy stems from the fact that for many years textbooks have promoted the simple normal approximation binomial interval for all situations other than small samples and very small proportions. Such intervals are easy to understand and to calculate, but make unrealistic assumptions - in particular that the variance is independent of the mean. These intervals are now known to be deeply flawed, even when calculated from moderate proportions or surprisingly large samples, and many statisticians say they should not be used under any circumstances.

An alternative to the simple normal approximation interval is an 'exact' interval. An exact interval is best described as one that is derived directly from an appropriate model describing how the statistic in question might vary - in this case the binomial distribution. Such intervals are calculated in a different way from normal approximation intervals, leading to an alternative definition of a confidence interval.

In principle, the confidence interval of a proportion or count may be defined in 2 ways:

  1. That range which, when attached to a sample statistic (p), should enclose the true parametric value on a given proportion (1−α) of occasions.
      The proportion of occasions that an interval is assumed to enclose the parameter is known as the interval's nominal coverage.

  2. That range of parameters (PL, PU) within which you would class your observed proportion as typical. In other words, if you constructed a null model about PL or PU a significance test would not reject p at an α significance level.
      This view of intervals, known as test-inversion or the tail method, underlies 'exact' intervals.
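
The first definition can be checked numerically. For a given true proportion P and sample size n, the actual coverage of any interval method is the sum of the binomial probabilities of those outcomes whose interval encloses P. The following is a minimal Python sketch of that calculation; the function names are ours, and the simple normal approximation interval is used purely as an example method.

```python
from math import comb, sqrt

def wald_interval(f, n, z=1.96):
    """Simple normal approximation interval for f successes in n trials."""
    p = f / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def actual_coverage(interval, n, P):
    """Probability that the interval from a binomial(n, P) sample encloses P."""
    total = 0.0
    for f in range(n + 1):
        lower, upper = interval(f, n)
        if lower <= P <= upper:
            # add the binomial probability of observing exactly f successes
            total += comb(n, f) * P**f * (1 - P)**(n - f)
    return total

# Nominal coverage is 0.95; the actual coverage for n = 20, P = 0.1 is lower
coverage = actual_coverage(wald_interval, n=20, P=0.1)
print(round(coverage, 3))
```

Any other interval function taking (f, n) and returning a (lower, upper) pair can be passed in place of wald_interval.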

Unfortunately, in textbooks the term exact confidence interval has been interpreted in several, mutually contradictory, ways:

  1. That exact intervals do not use the normal approximation.
  2. That 'exact' refers to the interval width.
  3. That an exact interval has precisely a 1−α coverage, irrespective of P or n.
  4. That an exact interval has at least 1−α coverage, irrespective of P or n.
  5. That an exact interval has, on average a coverage which approaches 1−α.

Whilst the first viewpoint is correct but by no means sufficient, the second is simplistic and incorrect. The third is almost impossible for a discrete statistic such as a proportion - unless P approaches 0.5 and the sample size (n) approaches infinity. The fourth definition, using conventional P-values, is what many statisticians would propose - albeit such a definition tends to yield unduly conservative intervals - and correspondingly biased inference. The last definition, which covers mid-P intervals, is much less biased, but only when considered over a range of parameters. It also leaves open what range of parameters, or what range of coverages, is reasonable. None of these definitions explicitly allows for how the interval ought to be located about the proportion. In fact the correct definition of exact is simply that exact intervals for a proportion use the binomial distribution to estimate probabilities - but with the implicit assumption that the binomial model is the correct model to describe how the sample was gathered.

We also need to clarify the meaning of mid-P intervals. There are both approximate mid-P intervals and exact mid-P intervals. Approximate mid-P intervals are those like the normal approximation and score intervals that do not have a continuity correction. They tend not to be too conservative - but can be much too liberal. Exact mid-P intervals come much closer to having on average a coverage which approaches 1−α - and have been proposed to be the new 'gold standard'.

We start below with the (now discredited) simple normal approximation, not because we advocate its use, but because it provides the basis for a new approximate method - the adjusted Wald interval. This is increasingly being recommended as a replacement for the simple normal interval. We then give the formulae for two score intervals. These take into account the dependence of the variance on the mean, but still assume a normal distribution. Lastly we consider the two main exact intervals - the conventional Clopper-Pearson interval (currently still regarded as the gold standard despite its conservatism) which can be obtained formulaically, and a mid-P exact interval that is readily obtained using R. We provide additional details on properties with each interval.



Normal approximation binomial intervals

Simple normal approximation (Wald interval)

For a large sample (n >100) and a moderate proportion (0.3 < p < 0.7), the traditional approach was to use p and q (in place of P and Q) to estimate the standard error. The mean variance relationship was then ignored, enabling the following simple formulation to be used:

Algebraically speaking -

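$$100(1-\alpha)\%\ \mathrm{CI}(p) = p \pm z\sqrt{\frac{pq}{n}}$$
$$95\%\ \mathrm{CI}(p) = p \pm 1.96\sqrt{\frac{pq}{n}}$$
$$95\%\ \mathrm{CI}(p) = p \pm 1.96\sqrt{\frac{f(n-f)}{n^{3}}}$$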
  • α is the significance level; for a 95% confidence interval α = 0.05.
  • z is the (1 − α/2) percentile of the standard normal distribution; for a 95% confidence interval z(1 − α/2) = 1.96.
  • p and q are the proportions in the sample with and without the character of interest,
  • n is the sample size,
  • f is the number of individuals with the character of interest
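
As an illustration only, the simple normal approximation interval can be sketched in Python as follows. The function name wald_interval and the example counts are ours; z is taken from the standard normal distribution rather than fixed at 1.96.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(f, n, alpha=0.05):
    """Simple normal approximation (Wald) interval for f successes in n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a 95% interval
    p = f / n
    q = 1 - p
    half_width = z * sqrt(p * q / n)
    return p - half_width, p + half_width

print(wald_interval(40, 100))   # moderate proportion, large sample
print(wald_interval(1, 20))     # small proportion: the lower limit falls below 0
```

The second call shows the interval extending outside the 0 to 1 range, one of the defects discussed below.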

Some authorities (for example Cochran (1977)) recommend the use of t0.05, df = n − 1 in place of 1.96, and (n − 1) in place of n, in the first equation above on the basis that the true standard error of the proportion is unknown. In practice this is seldom done, because (a) this correction has little effect where n is greater than about 30 - and (b), if n is less than 30, or PQn < 5, use of the normal approximation has never been recommended.

Simple normal intervals are commonly known as 'simple asymptotic' or 'Wald' intervals. Although popular and simple to calculate, they suffer from several important defects.

  1. They are always too liberal. But, if P or n are small, or in unlucky combination, they can be extremely liberal. Textbook recommendations for validity are misleading.
  2. If p is close to 0 or 1, the calculated limits can be impossible - less than 0 or greater than 1. Truncating limits to 0 or 1 is also misleading. Again, when p=0, intervals are either zero width or infinite.
  3. Last but not least, it is misleading to attach symmetrical limits to a statistic with a skewed distribution.

Given their unreliability, we agree with Newcombe (1998) that, with or without a continuity correction, simple normal approximation intervals 'should no longer be acceptable for the scientific literature.'


Continuity-corrected Wald interval

If the sample size lies between about 20 and 100, it was usual to apply a continuity correction - by adding a half divided by the sample size to the upper limit, and subtracting a half divided by the sample size from the lower limit. In other words, this correction expands the interval by 1/n. However, some statisticians argue that, although this correction makes no worthwhile difference for large samples, it may cause overcoverage for small samples unless p is close to 0.5. To prevent this, where intervals for proportions are estimated by testing the difference between a parameter and an observed proportion, it is recommended that this correction be omitted if their difference is less than the continuity correction.

Algebraically speaking -

$$C_L = p - 1.96\sqrt{\frac{pq}{n}} - \frac{1}{2n}$$

$$C_U = p + 1.96\sqrt{\frac{pq}{n}} + \frac{1}{2n}$$

  • CL and CU are the Lower and Upper 95% Confidence limits
  • p and q are the proportions in the sample with and without the character of interest,
  • n is the sample size.
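
A minimal Python sketch of the continuity-corrected limits, assuming the correction is a half divided by the sample size as described in the text (the function name and example counts are ours):

```python
from math import sqrt
from statistics import NormalDist

def wald_cc_interval(f, n, alpha=0.05):
    """Wald interval with a continuity correction of 1/(2n) on each side."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = f / n
    q = 1 - p
    half_width = z * sqrt(p * q / n)
    correction = 1 / (2 * n)          # widens the whole interval by 1/n
    return p - half_width - correction, p + half_width + correction

print(wald_cc_interval(40, 100))
print(wald_cc_interval(0, 20))   # p = 0: width is 1/n rather than zero
```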

Whilst these intervals share most of the defects of 'uncorrected' intervals, these additional points should be made:

  1. When p=0 these intervals do not collapse to zero width. But if p is close to zero these intervals are much more likely to overshoot.
  2. Although the continuity correction increases coverage, it does not do so very much.

Adjusted (modified) Wald interval

The adjusted Wald interval was proposed by Agresti & Coull (1998). The estimate of the proportion is first modified to give the Wilson point estimator (pW) thus:

Algebraically speaking -

$$p_W = \frac{Y + z^{2}/2}{n + z^{2}} \cong \frac{Y + 2}{n + 4}$$
  • Y is the number of successes,
  • z is the (1 − α/2) percentile of the standard normal distribution where α is the significance level; for a 95% confidence interval z = 1.96 ≅ 2.
  • n is the sample size.

Algebraically speaking -

$$95\%\ \mathrm{CI}(p) = p_W \pm 1.96\sqrt{\frac{p_W(1 - p_W)}{n + 4}}$$
  • pW is the Wilson point estimator,
  • n is the sample size.
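
The adjusted Wald calculation can be sketched in Python as below. This assumes the Agresti & Coull form in which the adjusted sample size n + z² (about n + 4 for a 95% interval) appears under the square root, and it truncates the limits to the 0 to 1 range. The function name and example counts are ours.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_wald_interval(y, n, alpha=0.05):
    """Adjusted Wald interval for y successes in n trials (assumed A&C form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_adj = n + z**2                       # adjusted sample size, about n + 4
    p_w = (y + z**2 / 2) / n_adj           # Wilson point estimator
    half_width = z * sqrt(p_w * (1 - p_w) / n_adj)
    lower = max(0.0, p_w - half_width)     # truncate impossible limits
    upper = min(1.0, p_w + half_width)
    return lower, upper

print(adjusted_wald_interval(0, 20))   # lower limit is truncated to 0
```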

The modified Wald interval has the same point estimator as the Wilson score interval (see below), and its formulation is simply an approximation to that interval. Not surprisingly it has similar properties to the Wilson score interval (and has even been claimed to have improved properties in some respects), and has been recommended by several authorities when n > 40. It is always preferable to the simple Wald interval, and is very simple to calculate. Note, however, that it does have some undesirable properties when p is close to 0 or 1: the limits can fall outside the 0 to 1 range and must then be truncated.


Finite population correction

The formulations above all assume that the sample size is very small compared to the total population size. Sometimes this may not be the case as, for example, when carrying out ecological studies on endangered species or breeds. If the sample comprises more than 20% of the population, it is necessary to apply the finite population correction. This is done by multiplying the variance of the proportion by the correction factor (1 minus the proportion of the population sampled) - in other words, by multiplying the standard error by the square root of that factor.

Algebraically speaking -

$$95\%\ \mathrm{CI}(p) = p \pm 1.96\sqrt{\frac{pq}{n}\left(1 - \frac{n}{N}\right)}$$
  • p and q are the proportions in the sample with and without the character of interest,
  • n is the sample size and N is the total population size.
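
A short Python sketch of the finite population correction applied to the simple normal interval. It assumes the factor (1 − n/N) sits under the square root alongside pq/n; the function name and the example figures are ours.

```python
from math import sqrt
from statistics import NormalDist

def wald_fpc_interval(f, n, N, alpha=0.05):
    """Wald interval for f successes in a sample of n from a population of N."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = f / n
    q = 1 - p
    fpc = 1 - n / N                        # 1 minus the proportion sampled
    half_width = z * sqrt(p * q / n * fpc)
    return p - half_width, p + half_width

# 30 of 50 sampled animals carry the character; the population is 200
print(wald_fpc_interval(30, 50, 200))
```

With a quarter of the population sampled, the interval is about 13% narrower than the uncorrected one.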

This correction is usually applied to the Wald interval, so not surprisingly these intervals share the properties of simple normal intervals. However, there is no reason why the correction should not be applied to the adjusted Wald interval - which would bring a similar improvement in properties.



Score method binomial intervals

These are known as score methods because the central component of calculating the interval involves carrying out a score test. A score test is a particular type of parametric test that can be formulated in situations where the variability is difficult to estimate. Score methods are appropriate for any proportion providing n is large - or, more precisely, providing PQn is greater than five. They are equivalent to an unequal variance normal approximation test-inversion, without a t-correction. The limits are obtained by a quadratic method, not graphically.

Wilson score interval

Algebraically speaking -

$$C_L = \frac{a - b}{c} \qquad C_U = \frac{a + b}{c}$$

  • CL and CU are the lower and upper confidence limits,
  • a = p + z²/(2n),
  • b = z√[(pq + z²/(4n))/n],
  • c = 1 + z²/n,
  • p and q are the proportions in the sample with and without the character of interest,
  • n is the sample size,
  • z is the 1 − α/2 percentile of the standard normal distribution where α is the significance level; for a 95% confidence interval z = 1.96.
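
The Wilson score limits can be sketched in Python using the a, b and c terms defined above. The form of b, with pq + z²/(4n) divided by n under the square root, is our reading of the formulation; the function name and example counts are ours.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(f, n, alpha=0.05):
    """Wilson score interval for f successes in n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = f / n
    q = 1 - p
    a = p + z**2 / (2 * n)
    b = z * sqrt((p * q + z**2 / (4 * n)) / n)
    c = 1 + z**2 / n
    return (a - b) / c, (a + b) / c

print(wilson_interval(40, 100))
print(wilson_interval(0, 20))   # lower limit is 0 (up to rounding), no overshoot
```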

Note: We have had problems with some of the formulations given in the literature for this interval. This one (given by Sauro & Lewis (2005)) at least seems to give the same answer as R and several other 'calculators' on the web!!

Fleiss score interval

This is the same as the Wilson score interval but includes a continuity correction.

Algebraically speaking -

$$C_L = \frac{d - e}{h} \qquad C_U = \frac{f + g}{h}$$

  • CL and CU are the lower and upper 95% confidence limits
  • d = 2np + z² − 1, e = z√[z² − (2 + 1/n) + 4p(nq + 1)]
  • f = 2np + z² + 1, g = z√[z² + (2 − 1/n) + 4p(nq − 1)]
  • h = 2(n + z²)
  • p and q are the proportions in the sample with and without the character of interest,
  • n is the sample size,
  • z is the (1 − α/2) percentile of the standard normal distribution where α is the significance level; for a 95% confidence interval z = 1.96.
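
A Python sketch of the continuity-corrected score limits follows. It assumes the upper-limit square root contains + (2 − 1/n), which is what results from applying the Wilson quadratic to p + 1/(2n), and it sets the lower limit to 0 when p = 0 and the upper limit to 1 when p = 1. The function name and example counts are ours.

```python
from math import sqrt
from statistics import NormalDist

def fleiss_interval(count, n, alpha=0.05):
    """Score interval with continuity correction for count successes in n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = count / n
    q = 1 - p
    h = 2 * (n + z**2)
    if count == 0:
        lower = 0.0
    else:
        d = 2 * n * p + z**2 - 1
        e = z * sqrt(z**2 - (2 + 1 / n) + 4 * p * (n * q + 1))
        lower = (d - e) / h
    if count == n:
        upper = 1.0
    else:
        f = 2 * n * p + z**2 + 1
        g = z * sqrt(z**2 + (2 - 1 / n) + 4 * p * (n * q - 1))
        upper = (f + g) / h
    return lower, upper

print(fleiss_interval(40, 100))
```

For 40 successes in 100 trials these limits are slightly wider than the uncorrected Wilson limits, as expected of a continuity correction.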

Given their rather complex formulation, these methods have been little used in the past - although in recent years they have come into favour as they are much more accurate than the simple normal approximation - providing the sample size is large. Although not immediately obvious from the formula, this improved accuracy is because they allow for the fact that the variance (PQn) is not homogeneous.

For large n and non-extreme P the properties of Wilson score intervals approach those of mid-P exact intervals, and Fleiss intervals approach those of Clopper-Pearson intervals (see below). Depending upon whether you prefer an average or a minimum 95% coverage, these score intervals do not collapse or overshoot, are located reasonably, and have good coverage properties. Fleiss intervals seldom have coverage below 95%.



Exact binomial intervals

Exact Clopper-Pearson interval (conventional P)

Exact binomial intervals were originally obtained by inversion of the equal-tail binomial test - as suggested by Clopper & Pearson (1934) using conventional P-values. Until fairly recently, this meant most people had to use tables generated by mathematicians, which gave the confidence intervals for all proportions for small sample sizes. These tables can be found in a number of statistical texts, for example table A4 in Conover (1999).

Alternatively there are formulaic methods derived from a mathematical relationship between the binomial distribution and various continuous distributions including the beta distribution and the F-distribution. Despite their usefulness, they are still only given in a few statistical textbooks. We give the formulation provided by Bart et al. (1998). Apart from p, q and n, you also have to obtain the value of the appropriate quantile from the F-distribution.

Algebraically speaking -

$$C_L = \left[1 + \frac{(q + 1/n)\,F_{0.975,\,v_1,\,v_2}}{p}\right]^{-1}$$

    and F has v1 and v2 degrees of freedom: v1 = 2(nq + 1) and v2 = 2np

$$C_U = \left[1 + \frac{q}{(1/n + p)\,F_{0.975,\,v_3,\,v_4}}\right]^{-1}$$

    and F has v3 and v4 degrees of freedom, v3 = 2(np + 1) and v4 = 2nq
  • F0.975 is the quantile below which 0.975 of that F-distribution lies.
      N.B. If you are using published tables, look up F0.025 - in other words the quantile above which 0.025 of that F-distribution lies.
  • CL and CU are the lower and upper 95% confidence limits
  • p and q are the proportions in the sample with and without the character of interest,
  • n is the sample size,

Exact Clopper-Pearson binomial intervals using conventional P-values have a minimum coverage close to 95% - and very good location. Be aware that, although this provides an exact interval, it overcovers - if variation is purely binomial its coverage is at least 95%, but its maximum coverage can be excessively high. However, exact intervals neither overshoot the 0 to 1 range, nor are they liable to produce intervals of zero width (collapse).
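
The same conventional-P limits can also be obtained without F quantiles, by solving the two binomial tail equations directly: the lower limit is the P at which the probability of observing f or more equals α/2, and the upper limit is the P at which the probability of observing f or fewer equals α/2. The following Python sketch does this by bisection. It is an alternative route to the interval rather than the formulaic method given above, and the function names are ours.

```python
from math import comb

def binom_cdf(k, n, P):
    """P(X <= k) for a binomial(n, P) variate."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(k + 1))

def bisect(func, lo=0.0, hi=1.0, iterations=60):
    """Root of an increasing function that is negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(f, n, alpha=0.05):
    """Conventional-P exact interval for f successes in n trials."""
    # Lower limit: P(X >= f | P) rises with P and must equal alpha/2
    lower = 0.0 if f == 0 else bisect(
        lambda P: (1 - binom_cdf(f - 1, n, P)) - alpha / 2)
    # Upper limit: P(X <= f | P) falls with P and must equal alpha/2
    upper = 1.0 if f == n else bisect(
        lambda P: alpha / 2 - binom_cdf(f, n, P))
    return lower, upper

lower, upper = clopper_pearson(40, 100)
print(lower, upper)
print(clopper_pearson(0, 20))   # upper limit equals 1 - 0.025**(1/20)
```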


Exact mid-P interval

Although many statisticians still consider the Clopper-Pearson interval as providing the 'gold standard' for the binomial proportion interval, many now advocate that the mid-P exact interval should take over this role. However, bear in mind that, because we are dealing with discrete variables, no interval gives precisely the correct (nominal) coverage. Some methods nevertheless approach that coverage when examined over a range of parameters.

As far as we know, a mid-P exact interval can only be obtained by test inversion - not by any formulaic method. Although virtually any confidence interval can be obtained by test inversion, because it is relatively computer-intensive, this method is generally confined to intervals that cannot be obtained any other way - or for which there is no 'suitable approximation'. Test inversion intervals work under the definition that a confidence interval about an observed statistic encloses a range of parameters which, when tested, would not reject that observed statistic.

Mid-P exact binomial intervals also have good location properties, and provide close to a mean coverage of 95% - although there can be considerable variation where P approaches 0 or 1. Unfortunately, being less conventional, few packages calculate mid-P exact binomial intervals. We explain how test inversion is carried out in the worked example below.
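
As a rough sketch of the idea, test inversion for a mid-P interval can be coded in Python as follows. Each limit is the parameter value at which the one-tailed mid-P-value for the observed count equals α/2, where the mid-P-value counts only half the probability of the observed outcome. The function names and the bisection search are ours, and the limits are set to 0 when f = 0 and to 1 when f = n.

```python
from math import comb

def pmf(k, n, P):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * P**k * (1 - P)**(n - k)

def upper_tail_mid_p(f, n, P):
    """P(X > f) + 0.5 P(X = f): mid-P-value for an observation this high."""
    return sum(pmf(i, n, P) for i in range(f + 1, n + 1)) + 0.5 * pmf(f, n, P)

def lower_tail_mid_p(f, n, P):
    """P(X < f) + 0.5 P(X = f): mid-P-value for an observation this low."""
    return sum(pmf(i, n, P) for i in range(f)) + 0.5 * pmf(f, n, P)

def bisect(func, lo=0.0, hi=1.0, iterations=60):
    """Root of an increasing function that is negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mid_p_interval(f, n, alpha=0.05):
    """Exact mid-P interval by inverting the two one-tailed mid-P tests."""
    lower = 0.0 if f == 0 else bisect(
        lambda P: upper_tail_mid_p(f, n, P) - alpha / 2)
    upper = 1.0 if f == n else bisect(
        lambda P: alpha / 2 - lower_tail_mid_p(f, n, P))
    return lower, upper

lower, upper = mid_p_interval(40, 100)
print(lower, upper)
print(mid_p_interval(0, 20))   # upper limit equals 1 - 0.05**(1/20)
```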



Poisson methods for counts and rates

Normal approximation Poisson interval for count

The normal approximation interval for a count (f) is obtained simply by using its square root as an estimate of the standard error:

Algebraically speaking -

$$95\%\ \mathrm{CI}(f) = f \pm 1.96\sqrt{f}$$

However this estimate is biased and a better estimate, albeit incorporating a continuity correction, is given by:

$$C_L = \left(\frac{1.96}{2} - \sqrt{f}\right)^{2}$$
$$C_U = \left(\frac{1.96}{2} + \sqrt{f + 1}\right)^{2}$$

  • CL and CU are the lower and upper 95% confidence limits
  • f is the number of cases,

Normal approximation Poisson intervals share many of the properties of simple normal intervals for a proportion. Notice also, when applied to binomial data, their width is biased by approximately 1/√q - and therefore unnecessarily conservative.
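
Both normal approximation intervals for a count can be sketched in Python as below. The second function assumes the square-root form in which half of z is added to or subtracted from the square root of the count before squaring. The function names and the example count are ours.

```python
from math import sqrt
from statistics import NormalDist

def poisson_simple_interval(f, alpha=0.05):
    """Simple normal approximation interval for a count f."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(f)
    return f - half_width, f + half_width

def poisson_sqrt_interval(f, alpha=0.05):
    """Square-root form with continuity correction (assumed z/2 version)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = (z / 2 - sqrt(f)) ** 2
    upper = (z / 2 + sqrt(f + 1)) ** 2
    return lower, upper

print(poisson_simple_interval(10))
print(poisson_sqrt_interval(10))
```

Note that the lower limit of the square-root form is only meaningful when the square root of f exceeds z/2, that is for counts of one or more at the 95% level.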


Exact Poisson intervals for counts

Again, until relatively recently, the most frequently used method was to use tables generated by mathematicians which gave exact Poisson confidence intervals for all small counts. However, the exact confidence interval for a count (Y) is readily obtained from the relationship between the chi-square distribution and the Poisson distribution. The appropriate degrees of freedom must be calculated separately for the upper and lower limits. Remember that we use the same system as R, so χ²0.025 is the chi-square quantile below which 0.025 of the distribution lies (a lower tail probability of 0.025); this is the opposite way to that in which statistical tables are usually done.

Like Clopper-Pearson intervals, this gives a conventional-P interval, and is therefore conservative.

Algebraically speaking -

$$\text{Lower 95\% CI}(Y) = \frac{\chi^{2}_{0.025,\ df=v_1}}{2}$$
$$\text{Upper 95\% CI}(Y) = \frac{\chi^{2}_{0.975,\ df=v_2}}{2}$$
  • Y is the observed number of events,
  • v1 and v2 are degrees of freedom; v1= 2Y and v2 = 2(Y + 1).
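
The conventional-P exact Poisson limits can also be found without chi-square quantiles, by solving the Poisson tail equations directly: the lower limit is the mean at which the probability of Y or more events equals α/2, and the upper limit is the mean at which the probability of Y or fewer events equals α/2. The Python sketch below does this by bisection. The function names and the search bracket are ours, and the simple summation is only suitable for small to moderate counts.

```python
from math import exp

def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variate with mean mu."""
    term = exp(-mu)          # P(X = 0)
    total = 0.0
    for i in range(k + 1):
        total += term
        term *= mu / (i + 1)  # P(X = i + 1) from P(X = i)
    return total

def bisect(func, lo, hi, iterations=100):
    """Root of an increasing function that is negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_interval(y, alpha=0.05):
    """Conventional-P exact interval for an observed count y."""
    hi = 2 * y + 20           # assumed bracket, comfortably above the upper limit
    lower = 0.0 if y == 0 else bisect(
        lambda mu: (1 - poisson_cdf(y - 1, mu)) - alpha / 2, 0.0, hi)
    upper = bisect(lambda mu: alpha / 2 - poisson_cdf(y, mu), 0.0, hi)
    return lower, upper

print(exact_poisson_interval(10))
print(exact_poisson_interval(0))   # upper limit equals -ln(0.025)
```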


Poisson intervals for incidence rates

An incidence rate is obtained by dividing the number of events by person-time-at-risk. A 95% CI for the incidence rate in a cohort study can therefore be obtained by treating the numerator as a Poisson variate, working out the upper and lower confidence limits for the number of events, and then dividing each by the person-time-at-risk which is regarded as fixed and measured without error. The same approach is sometimes used in descriptive studies substituting mid-year population size for person-time-at-risk. However, the interval is likely to be unreliable because mid-year population size is certainly not measured without error.

Note that if Poisson intervals are fitted to a proportion (rather than a rate) they are unnecessarily conservative since they are wider by a factor of approximately 1/√q than those based on the binomial distribution.




The key assumption here is that outcomes are independent. In a survey this assumption is assumed to be met if a sample is obtained by simple random sampling. In other words members of the sample have been drawn independently with equal probabilities. In an experimental situation, units should be allocated randomly to treatment and observations must be independent. The latter can be difficult to ensure and we return to this point in the next unit.

If you have used stratified random sampling to obtain your overall proportion, a normal approximation confidence interval is estimated in a similar way as for a single random sample, except that the standard error is weighted according to the proportion each stratum makes up of the total. If you have used cluster sampling to obtain your overall proportion, the approach is quite different. The standard error is estimated from the variability between sample proportions in the same way as the standard error of a mean is estimated. If the proportions lie outside the range 0.3 to 0.7 an arcsin transformation should be used. These methods are detailed in the More Information page on Sampling Methods in Unit 7.

Use of any of the normal approximation methods assumes that the proportion is distributed normally. This is commonly (but wrongly) taken to be the case if pqn is greater than 5.

Cochran (1977) recommends more conservative criteria and gives the minimum values of n for when to use the normal approximation for proportions varying from 0.05 to 0.95, tabulated against p or 1−p, whichever is smaller (see table):

Newcombe (1998) recommends that the simple normal approximation should never be used - and that score or exact binomial limits should be calculated instead.
