Confidence intervals of proportions and rates
Simple normal approximation (Wald interval)
The proportion of randomly selected broom seedlings (Cytisus scoparius) surviving for one year post-germination was determined for 850 seedlings to be 0.400. The simple normal approximation 95% confidence interval is given by $p \pm z\sqrt{pq/n}$, where q = 1 − p and z = 1.96, which here gives 0.367 to 0.433.
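The Wald calculation is simple enough to check directly. This minimal Python sketch assumes z = 1.96 for a 95% interval. The function name `wald_interval` is hypothetical and does not come from any library.

```python
import math

def wald_interval(p, n, z=1.96):
    """Simple normal-approximation (Wald) interval: p +/- z*sqrt(pq/n).

    Assumption: z = 1.96 gives a 95% interval.
    """
    q = 1 - p
    half_width = z * math.sqrt(p * q / n)
    return p - half_width, p + half_width

# Broom seedlings: observed one-year survival of 0.400 among 850 seedlings
lower, upper = wald_interval(0.4, 850)
print(f"{lower:.3f} to {upper:.3f}")  # 0.367 to 0.433
```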
Continuity-corrected Wald interval
If our sample size of broom seedlings were only 50 rather than 850, then (under conventional inference) we would have to use the continuity correction. If 20 survive out of 50 (p = 0.4), the confidence limits are given by $p \pm \left(z\sqrt{pq/n} + \frac{1}{2n}\right)$, which here gives 0.254 to 0.546.
Note the much wider confidence interval resulting from the smaller sample size. It is also a little wider because of the use of the continuity correction (with no continuity correction the interval would have been 0.264 to 0.536).
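The continuity correction simply adds 1/(2n) to each side of the Wald half-width. The sketch below assumes z = 1.96 and uses a hypothetical function name. It computes the corrected interval and also the uncorrected 0.264 to 0.536 quoted above.

```python
import math

def wald_cc_interval(x, n, z=1.96):
    """Continuity-corrected Wald interval: p +/- (z*sqrt(pq/n) + 1/(2n)).

    Assumption: z = 1.96 gives a 95% interval.
    """
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    correction = 1 / (2 * n)
    return p - half_width - correction, p + half_width + correction

# 20 of 50 seedlings survive
cc_lower, cc_upper = wald_cc_interval(20, 50)
print(f"with correction:    {cc_lower:.3f} to {cc_upper:.3f}")  # 0.254 to 0.546

# The same interval without the 1/(2n) term, for comparison
p = 20 / 50
hw = 1.96 * math.sqrt(p * (1 - p) / 50)
print(f"without correction: {p - hw:.3f} to {p + hw:.3f}")  # 0.264 to 0.536
```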
Adjusted (modified) Wald interval
If you wish to use a normal approximation confidence interval when the sample size is greater than 40, use this one! As you can see below, it approximates the Wilson score interval.
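Wilson score interval
The Wilson score limits are $(a \pm b)/c$, where $a = p + z^2/2n$, $b = z\sqrt{pq/n + z^2/4n^2}$ and $c = 1 + z^2/n$. For 20 survivors out of 50:
$a = 0.4 + 1.96^2/100 = 0.4384146$
$b = 1.96\sqrt{0.4 \times 0.6/50 + 1.96^2/10000} = 0.1411$
$c = 1 + 1.96^2/50 = 1.076829$
giving a 95% interval of 0.276 to 0.538.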
Note this 95% CI is similar to that provided by the adjusted Wald interval (0.2716 to 0.5441).
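Here is a minimal sketch of the Wilson score calculation, built from the a, b and c terms. It takes z from Python's `statistics.NormalDist` (about 1.959964). That choice is an assumption, but it reproduces the a and c values quoted above to the digits shown. The helper name `wilson_terms` is hypothetical.

```python
import math
from statistics import NormalDist

def wilson_terms(x, n, conf=0.95):
    """Return the a, b, c terms of the Wilson score interval (a +/- b) / c."""
    # Assumption: z is the exact standard normal quantile (about 1.959964 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    a = p + z**2 / (2 * n)
    b = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    c = 1 + z**2 / n
    return a, b, c

a, b, c = wilson_terms(20, 50)
lower, upper = (a - b) / c, (a + b) / c
print(f"a = {a:.7f}, c = {c:.6f}")    # a = 0.4384146, c = 1.076829
print(f"{lower:.3f} to {upper:.3f}")  # 0.276 to 0.538
```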
Exact binomial intervals
We will test the exact methods by looking at the result if only 5 seedlings survived out of a total of 25, so p = 0.20. The simple normal approximation would be wholly inappropriate (npq = 25 × 0.2 × 0.8 = 4, which is less than 5), and some of the other methods may also have problems. We will estimate conventional exact and mid-P exact intervals, and compare these in R with those estimated by the other methods.
If we use the tables given by Conover (1999) we get 95% confidence limits of 0.068 to 0.407.
The range 0.068 to 0.407 should therefore enclose the true proportion of seedlings surviving on at least 95% of occasions, assuming the range is calculated in the same way from randomly selected samples of the same population. These limits are the same as the confidence interval determined from the table values, and the same as those given by R (specifically, we used the binom.exact function of the epitools library, which obtains them by test inversion).
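Clopper-Pearson limits can also be found by test inversion. The lower limit is the P for which P(X ≥ x) = α/2, and the upper limit is the P for which P(X ≤ x) = α/2. The sketch below finds each root by bisection, using only the Python standard library. It illustrates the principle and is not the epitools implementation. All helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, P):
    """P(X <= k) for X ~ Binomial(n, P)."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(k + 1))

def bisect(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of f on [lo, hi], assuming f changes sign exactly once there."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    # Lower limit: P at which P(X >= x) = alpha/2 (zero if x = 0)
    lower = 0.0 if x == 0 else bisect(lambda P: 1 - binom_cdf(x - 1, n, P) - alpha / 2)
    # Upper limit: P at which P(X <= x) = alpha/2 (one if x = n)
    upper = 1.0 if x == n else bisect(lambda P: binom_cdf(x, n, P) - alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(5, 25)
print(f"{lower:.3f} to {upper:.3f}")  # 0.068 to 0.407
```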
Mid-P exact interval by test inversion
Once again we will obtain limits for an observed proportion (p) of 0.2, where only 5 seedlings survived out of a total of (n=) 25. Unlike conventional exact binomial intervals, exact mid-P intervals can only be obtained by test inversion.
Test inversion intervals rest on the definition that a confidence interval about an observed statistic encloses the range of parameter values that, when tested against that observed statistic, would not be rejected.
In this case, the simplest solution is to perform exact one-tailed tests of the observed proportion (p) against each of a predetermined series of test parameters (P), and then compare each value of P with the resulting mid-P-value. Since this method is unlikely to yield mid-P-values of exactly (α/2=) 0.025 or (1 − α/2=) 0.975, the upper and lower limits (of P) are usually estimated by interpolation, either graphically or arithmetically.
The P-value plots below show the exact mid-P-values obtained by testing our observed p (= 0.2). Our top graph shows the result of 1000 one-sided tests of p, each against the null hypothesis that p arose from a random sample of n values from a population of ones and zeroes, of which a set proportion (P) equal 1. So p was compared with each of those 1000 binomial null populations (where P = 0, 0.0005, 0.0015 ... 0.9995, 1). For each test, the mid-P-value is the proportion of that binomial population that is less than p, plus half the proportion that equals p. Thus, when P << p, nearly all of the null population is less than p, and the test's mid-P-value approaches 1.
Our lower graphs show the lower and upper 95% confidence limits (CL and CU) estimated by simple linear interpolation; each point corresponds to one test result. Since we did these interpolations arithmetically rather than graphically, these three P-value plots are provided only to illustrate the principle. Indeed, a more 'efficient' method would be to find the limits by successive approximation, at the cost of devising an efficient 'search' algorithm and some more complicated programming.
That point aside, as we have noted elsewhere, P-value plots do provide rather more information than a simple confidence interval. For instance, you can obtain any
In this case the 95% mid-P interval is 0.077 to 0.389. Being less conservative, this range is narrower (0.389 − 0.077 = 0.312) than that given by Clopper-Pearson (0.407 − 0.068 = 0.339).
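The grid-and-interpolation procedure described above can be sketched in a few lines. The sketch evaluates the mid-P-value P(X < x) + ½P(X = x) over the same series of test parameters (P = 0, 0.0005, 0.0015 ... 0.9995, 1). It then linearly interpolates the points where the curve crosses 0.975 and 0.025. The helper names are hypothetical. A root-finding search such as bisection could replace the grid if efficiency mattered.

```python
from math import comb

def midp_value(x, n, P):
    """Lower-tail mid-P-value: P(X < x) + 0.5 * P(X = x), X ~ Binomial(n, P)."""
    pmf = [comb(n, i) * P**i * (1 - P)**(n - i) for i in range(x + 1)]
    return sum(pmf[:x]) + 0.5 * pmf[x]

def crossing(grid, values, target):
    """Linearly interpolate the P at which the decreasing mid-P curve passes target."""
    for i in range(len(grid) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 >= target > v1:
            P0, P1 = grid[i], grid[i + 1]
            return P0 + (v0 - target) * (P1 - P0) / (v0 - v1)
    raise ValueError("target is not bracketed by the grid")

x, n, alpha = 5, 25, 0.05

# Test parameters P = 0, 0.0005, 0.0015, ..., 0.9995, 1
grid = [0.0] + [(i + 0.5) / 1000 for i in range(1000)] + [1.0]
values = [midp_value(x, n, P) for P in grid]

lower = crossing(grid, values, 1 - alpha / 2)  # where mid-P falls through 0.975
upper = crossing(grid, values, alpha / 2)      # where mid-P falls through 0.025
print(f"{lower:.3f} to {upper:.3f}")  # 0.077 to 0.389
```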
Using R, we compared the results of the normal approximation and score methods for this example. The simple Wald 95% confidence interval is 0.043 to 0.357; note that it is incorrectly shifted to the left. The adjusted Wald interval is 0.074 to 0.409, much closer to the mid-P interval. The Wilson score interval is similar, at 0.089 to 0.391.
Poisson methods for counts and rates
We look at data provided by Memon et al.
Note that we are using estimated mid-year population size as the denominator in place of person-time-at-risk. Since this is not fixed and known without error (as it may be in a cohort study), the estimated confidence intervals will be excessively liberal because we have not taken the variability in the denominator into account.
However, this estimate has not been bias-corrected. The formula given below includes a 'continuity correction', which ensures that inference is uniformly conservative; in terms of inference, therefore, it is itself a source of bias. This 'bias corrected'
Converting these to rates per 100,000, the confidence limits are 14.91 to 48.44.
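Exact Poisson interval for counts and rates
We use the same data as above on the age-specific incidence of hip fracture in Kuwait. The number of fractures was 13 out of an estimated mid-year population of 46021. The (conventional P) exact interval is given by $\tfrac{1}{2}\chi^2_{\alpha/2,\,2x}$ to $\tfrac{1}{2}\chi^2_{1-\alpha/2,\,2(x+1)}$, where x = 13 is the observed count and $\chi^2_{\gamma,\,\nu}$ is the γ quantile of the chi-square distribution with ν degrees of freedom.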
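The exact Poisson limits can be found by test inversion on the Poisson distribution. The lower limit is the mean μ for which P(X ≥ x) = α/2, and the upper limit is the μ for which P(X ≤ x) = α/2. This minimal sketch uses bisection and only the Python standard library. The helper names are hypothetical. The mid-year population is treated as a fixed denominator, which carries the caveat noted above.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f changes sign exactly once there."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson(x, alpha=0.05):
    hi = x + 10 * math.sqrt(x) + 10  # generous upper bound for the search
    # Lower limit: mean at which P(X >= x) = alpha/2 (zero if x = 0)
    lower = 0.0 if x == 0 else bisect(lambda mu: 1 - poisson_cdf(x - 1, mu) - alpha / 2, 0.0, hi)
    # Upper limit: mean at which P(X <= x) = alpha/2
    upper = bisect(lambda mu: poisson_cdf(x, mu) - alpha / 2, 0.0, hi)
    return lower, upper

fractures, population = 13, 46021
lower, upper = exact_poisson(fractures)
print(f"counts: {lower:.2f} to {upper:.2f}")  # counts: 6.92 to 22.23

# Rates per 100,000, treating the mid-year population as a fixed denominator
rate_lower = lower / population * 100_000
rate_upper = upper / population * 100_000
print(f"per 100,000: {rate_lower:.2f} to {rate_upper:.2f}")
```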