Normal approximation binomial intervals
Simple normal approximation (Wald interval)
Continuity-corrected Wald interval
Adjusted (modified) Wald interval
Wilson score binomial interval
Exact binomial intervals
Exact Clopper-Pearson (conventional P)
We will test the exact methods by looking at the result if only 5 seedlings survived out of a total of 25. Hence p = 0.20. The simple normal approximation would be wholly inappropriate (npq = 25 × 0.2 × 0.8 = 4 < 5) and some of the other methods may have problems. We will estimate conventional exact and mid-P exact intervals, and compare these in R with those estimated by the other methods.
If we use the tables given by Conover (1999) we get 95% confidence limits of 0.068 to 0.407.
If we use the formulaic method, based on quantiles of the F-distribution (with y = 5 survivors out of n = 25), we get:

CL = [1 + F(0.975; 2(n−y+1), 2y) × (n−y+1)/y]^−1 = [1 + 3.247 × (0.8 + 1/25)/0.2]^−1 = 0.068

CU = [1 + (n−y) / ((y+1) × F(0.975; 2(y+1), 2(n−y)))]^−1 = [1 + 0.8 / ((1/25 + 0.2) × 2.2882)]^−1 = 0.407

where F(0.975; 42, 10) = 3.247 and F(0.975; 12, 40) = 2.2882.
Hence the range 0.068 to 0.407 should enclose the true proportion of seedlings surviving on at least 95% of occasions, assuming that range is calculated in the same fashion from randomly selected samples of the same population. These limits are identical to the confidence interval determined from the table values, and to that given by R (specifically we used the binom.exact function in the epitools library, which works by test inversion).
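The same conventional exact limits can also be reached numerically, without F-tables, by bisecting the binomial tail probabilities directly. Below is a minimal Python sketch (the function names and tolerance are our own; the text's calculations used R's epitools, not this code):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(y, n, alpha=0.05, tol=1e-9):
    """Conventional exact (Clopper-Pearson) interval by bisection.

    Lower limit: the P at which P(X >= y | P) = alpha/2.
    Upper limit: the P at which P(X <= y | P) = alpha/2.
    """
    def solve(below_root, lo, hi):
        # Bisection: below_root(P) is True while P is below the root.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if below_root(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if y == 0 else solve(
        lambda P: 1 - binom_cdf(y - 1, n, P) < alpha / 2, 0.0, 1.0)
    upper = 1.0 if y == n else solve(
        lambda P: binom_cdf(y, n, P) > alpha / 2, 0.0, 1.0)
    return lower, upper

lo, hi = clopper_pearson(5, 25)
print(round(lo, 3), round(hi, 3))  # ≈ 0.068 and 0.407, as above
```

For 5 survivors out of 25 this reproduces the tabulated and formulaic limits of 0.068 to 0.407.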
Mid-P exact by test inversion
Once again we will obtain limits for an observed proportion (p) of 0.2, where only 5 seedlings survived out of a total of (n=) 25. Unlike conventional exact binomial intervals, exact mid-P intervals can only be obtained by test-inversion.
Test-inversion intervals rely on the definition that a confidence interval about an observed statistic encloses the range of parameter values which, when tested against that observed statistic, would not be rejected.
For example, if the confidence interval of an observed odds-ratio does not enclose (the null parameter value of) one, that odds-ratio must be significantly different from (that parameter) one.
Although virtually any confidence interval can be obtained by test inversion, because it is relatively inconvenient this method is generally confined to intervals that cannot be obtained any other way - or for which there is no 'suitable approximation'.
In this case, the simplest solution is to perform exact 1-tailed tests of the observed proportion (p) for each of a predetermined series of test parameters (P) - then compare each value of P to the resulting mid-P-value. Since this method is unlikely to yield mid-P-values of exactly (α/2=) 0.025 or (1− α/2=) 0.975, the upper and lower limits (of P) are usually estimated by interpolation (either graphically, or arithmetically).
The P-value plots below show the exact mid-P-values obtained by tests of our observed p (= 0.2). Our top graph shows the result of 1000 1-sided tests of p, each against the null hypothesis that p arose from a random sample of n values from a population of ones and zeroes, of which a set proportion (P) of the values equal 1. So p was compared to each of those 1000 binomial null populations (where P = 0, 0.0005, 0.0015 ... 0.9995, P = 1). For each test the mid-P-value is the proportion of that binomial population that is less than p, plus 1/2 of the proportion which equals p. Thus, when P << p nearly all the null population is << p, and the test's mid-P-value approaches 1.
Our lower graphs show the lower and upper 95% confidence limits (CL & CU) estimated by simple linear interpolation. Each point corresponds to one test result. Since we did these interpolations arithmetically, rather than graphically, these three P-value plots are only provided to illustrate the principle. Indeed, a more 'efficient' method would be to find the limits by successive approximation - at the expense of devising an efficient 'search' algorithm, and some more-complicated programming.
[Figure: three P-value plots - mid-P-value plotted against P (top), with lower panels showing the interpolations at CL and CU]
That point aside, as we have noted elsewhere, P-value plots do provide rather more information than a simple confidence interval. For instance, you can obtain any 100 (1−α)% confidence interval that you want from the same plot.
In this case the 95% interval is 0.077 to 0.389 and, being less conservative, this range is narrower (0.389 − 0.077 = 0.312) than that given by Clopper-Pearson (0.407 − 0.068 = 0.339).
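For readers who prefer successive approximation to interpolation, these mid-P limits can be found by bisecting the exact mid-P-value directly. A minimal Python sketch (the function names are illustrative only; the text's plots and interpolations were produced in R):

```python
from math import comb

def mid_p_interval(y, n, alpha=0.05, tol=1e-9):
    """Mid-P exact interval for y successes in n trials, by bisection."""
    def pmf(k, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def upper_tail_mid_p(p):
        # P(X > y) + 0.5 * P(X = y): increases with p
        return sum(pmf(k, p) for k in range(y + 1, n + 1)) + 0.5 * pmf(y, p)

    def lower_tail_mid_p(p):
        # P(X < y) + 0.5 * P(X = y): decreases with p
        return sum(pmf(k, p) for k in range(y)) + 0.5 * pmf(y, p)

    def solve(f, increasing):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if (f(mid) < alpha / 2) == increasing:
                lo = mid    # mid is still below the root
            else:
                hi = mid
        return (lo + hi) / 2

    # lower limit: where the upper-tail mid-P-value rises to alpha/2;
    # upper limit: where the lower-tail mid-P-value falls to alpha/2
    return solve(upper_tail_mid_p, True), solve(lower_tail_mid_p, False)

lo, hi = mid_p_interval(5, 25)
print(round(lo, 3), round(hi, 3))  # ≈ 0.077 and 0.389
```

For 5 survivors out of 25 the bisection agrees with the interpolated limits of 0.077 to 0.389.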
Using R we compared the results of the normal approximation and score methods for this example. The simple Wald 95% confidence interval is 0.043 to 0.357. Note it is incorrectly shifted to the left. The adjusted Wald interval is 0.074 to 0.409, much closer to the mid-P interval. The Wilson score interval is similar at 0.089 to 0.391.
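These approximate intervals are simple enough to reproduce by hand or in a few lines of code. The Python sketch below assumes the 'adjusted Wald' variant used here recentres on p̃ = (y + 2)/(n + 4) while keeping n in the variance, since that reproduces the limits quoted above; other presentations (e.g. Agresti-Coull) divide the variance by n + 4 instead:

```python
from math import sqrt

z = 1.96          # two-sided 95% normal quantile
y, n = 5, 25      # survivors, total seedlings
p = y / n

# Simple Wald interval: p ± z * sqrt(p(1-p)/n)
se = sqrt(p * (1 - p) / n)
wald = (p - z * se, p + z * se)               # ≈ 0.043 to 0.357

# Adjusted Wald, recentred on (y+2)/(n+4) with n in the variance
# (assumed variant; it matches the 0.074 to 0.409 quoted above)
pt = (y + 2) / (n + 4)
se_adj = sqrt(pt * (1 - pt) / n)
adj = (pt - z * se_adj, pt + z * se_adj)      # ≈ 0.074 to 0.409

# Wilson score interval
centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
wilson = (centre - half, centre + half)       # ≈ 0.089 to 0.391
```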
Poisson methods for counts and rates
Normal approximation Poisson interval for counts & rates
We look at data provided by Memon et al. (1998) on the age-specific incidence of hip fracture in Kuwait, which we consider further in the examples. Here we will use the figures on the incidence of fracture in 50-59 year old Kuwaiti women to demonstrate estimation of the confidence interval. The number of fractures was (f =) 13 out of an estimated mid-year population of 46021. This gave an estimated rate of fracture per 100,000 of (100000 × 13/46021 =) 28.25.
Note that we are using estimated mid-year population size as the denominator in place of person-time-at-risk. Since this is not fixed and known without error (as it may be in a cohort study), the estimated confidence intervals will be excessively liberal because we have not taken the variability in the denominator into account.
The simple normal 95% confidence interval for the number of cases assuming a Poisson distribution of cases is given by:
95% CI(Y) = 13 ± 1.96 × √13 = 5.933 to 20.067
Converting these to rates per 100,000, we get a confidence interval of 12.89 to 43.60.
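This calculation is readily checked; a minimal Python sketch (the variable names are ours):

```python
from math import sqrt

f, pop, z = 13, 46021, 1.96   # fractures, mid-year population, 95% quantile

def rate(count):
    """Convert a count to a rate per 100,000 of the mid-year population."""
    return 100000 * count / pop

# Simple normal 95% interval for the count, assuming Poisson variation
lo, hi = f - z * sqrt(f), f + z * sqrt(f)   # ≈ 5.933 to 20.067
print(round(rate(lo), 2), round(rate(hi), 2))  # rates per 100,000
```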
However, this estimate includes no continuity correction. The formula given below includes a 'continuity correction' which ensures inference is uniformly conservative; in terms of inference, therefore, the correction is itself a source of bias. This continuity-corrected estimate gives us:
CL = (√f − z/2)² = (√13 − 0.98)² = 6.8935

CU = (√(f+1) + z/2)² = (√14 + 0.98)² = 22.294
Converting these to rates per 100,000, the confidence limits are 14.98 to 48.44.
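The continuity-corrected limits and their rate conversions can be checked the same way; a minimal Python sketch:

```python
from math import sqrt

f, pop, z = 13, 46021, 1.96   # fractures, mid-year population, 95% quantile

# Continuity-corrected normal approximation for a Poisson count
lo = (sqrt(f) - z / 2) ** 2       # ≈ 6.8935
hi = (sqrt(f + 1) + z / 2) ** 2   # ≈ 22.294

# Convert the count limits to rates per 100,000
print(round(100000 * lo / pop, 2), round(100000 * hi / pop, 2))
```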
Exact Poisson interval for counts & rates