 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

## Confidence intervals of p - using R

### Binomial confidence interval functions

1. Simple (large-sample) normal approximation Wald interval
2. Adjusted Wald interval
3. Wilson score binomial interval
4. Exact binomial, Clopper-Pearson (conventional P) interval
All of these can be calculated, as 95% confidence limits, as follows:
Note: these instructions assume you have already installed the epitools package and loaded it with library(epitools). The output below is for y = 5 successes out of n = 25.

```
> # Wald interval (no c.c.)
> binom.approx(y, n)
  x  n proportion      lower     upper conf.level
1 5 25        0.2 0.04320288 0.3567971       0.95
>
> # Adjusted Wald interval
> binom.approx(y+1.96, n+1.96^2)
     x       n proportion      lower     upper conf.level
1 6.96 28.8416  0.2413181 0.08516046 0.3974757       0.95
>
> # Wilson score binomial interval
> binom.wilson(y, n)
  x  n proportion      lower     upper conf.level
1 5 25        0.2 0.08860585 0.3913095       0.95
>
> # Conventional exact binomial interval
> binom.exact(y, n)
  x  n proportion      lower     upper conf.level
1 5 25        0.2 0.06831146 0.4070374       0.95
```

Note:
• binom.approx calculates an interval using the normal approximation to the binomial distribution. The intervals are not continuity corrected, which is why they are sometimes described as 'equivalent to mid-P' intervals. To obtain continuity-corrected intervals, use the textbook formula, as in our worked example.

• Given that 1.96 ≅ 2, you can approximate the adjusted Wald 95% interval as binom.approx(y+2, n+2^2), that is binom.approx(y+2, n+4).

But WHY you should wish to use 2 instead of 1.96, or qnorm(.975), is hard to imagine.

• binom.wilson calculates the Wilson score binomial interval. These intervals are not continuity corrected.

• binom.exact gives the conventional exact binomial interval obtained by test inversion using the binomial test.
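If epitools is not available, you can check the four binomial intervals above with base R alone. The sketch below assumes the same data (y = 5, n = 25) and uses z = qnorm(0.975).

The continuity-corrected Wald line uses the common textbook form p ± (z·√(p(1-p)/n) + 1/(2n)). That form is an assumption and may differ in detail from our worked example.

The adjusted Wald line mirrors the epitools call above, which adds 1.96 to y. The usual Agresti-Coull adjustment adds z²/2 ≈ 1.92 instead, so the two differ slightly.

```r
# Base-R versions of the binomial intervals (sketch; data as in the example above)
y <- 5
n <- 25
z <- qnorm(0.975)                         # about 1.96 for 95% limits
p <- y / n

# 1. Wald interval, no continuity correction
wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# 1a. Wald interval with the usual textbook continuity correction, 1/(2n)
wald.cc <- p + c(-1, 1) * (z * sqrt(p * (1 - p) / n) + 1 / (2 * n))

# 2. Adjusted Wald, mirroring binom.approx(y + 1.96, n + 1.96^2)
ya <- y + 1.96
na <- n + 1.96^2
pa <- ya / na
adj.wald <- pa + c(-1, 1) * z * sqrt(pa * (1 - pa) / na)

# 3. Wilson score interval
centre <- (y + z^2 / 2) / (n + z^2)
half <- z * sqrt(n * p * (1 - p) + z^2 / 4) / (n + z^2)
wilson <- centre + c(-1, 1) * half

# 4. Clopper-Pearson exact interval from beta quantiles
exact <- c(qbeta(0.025, y, n - y + 1), qbeta(0.975, y + 1, n - y))

# base R's binom.test() reports the same Clopper-Pearson interval
exact.bt <- as.numeric(binom.test(y, n)$conf.int)

round(rbind(wald, wald.cc, adj.wald, wilson, exact, exact.bt), 6)
```

Apart from wald.cc, each row should agree with the corresponding epitools output above.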

### Mid-P exact by test inversion

The following code instructs R to estimate mid-P exact binomial confidence limits by test inversion, using a predetermined series of values of its test parameter (P).

Gave us:

```
> CL # Lower conf. limit
0.07718503
> CU # Upper conf. limit
0.3890471
```

Note:
• To keep this code quick and simple, these mid-P-values come from 1-sided tests. Using 2-sided tests would produce the same result, but only because we are using mid-P-values.

• The simplest way to increase the accuracy of these intervals is to increase R from 1000 to perhaps 100000.
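The page's own test-inversion code is not reproduced here. Below is a minimal sketch of the same idea. It assumes y = 5, n = 25, R = 1000 and an evenly spaced grid of test parameters, P = (1:(R-1))/R.

Each limit is taken as the first grid value at which the relevant 1-sided mid-P-value exceeds α/2. The limits are therefore only accurate to about 1/R, which is why a larger R helps.

```r
# Mid-P exact binomial confidence limits by test inversion over a fixed grid (sketch)
y <- 5
n <- 25
R <- 1000                                  # number of test parameters
P <- (1:(R - 1)) / R                       # predetermined values of P, strictly between 0 and 1
alpha <- 0.05

# 1-sided mid-P-values for all values of P at once (pbinom/dbinom are vectorised)
mid.lower <- pbinom(y - 1, n, P) + 0.5 * dbinom(y, n, P)  # Pr(X < y) + Pr(X = y)/2
mid.upper <- 1 - mid.lower                                 # Pr(X > y) + Pr(X = y)/2

# Lower limit: smallest P not rejected by the upper-tailed test
CL <- min(P[mid.upper > alpha / 2])
# Upper limit: largest P not rejected by the lower-tailed test
CU <- max(P[mid.lower > alpha / 2])

CL   # Lower conf. limit
CU   # Upper conf. limit
```

Because mid-P-values from the two tails sum to 1, one call to pbinom and one to dbinom serve both tests.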

### P-value plots

Since P-value plots are commonly used as analogues of (2-sided) confidence limits, they are usually plotted using P-values from 2-sided tests, even when it is computationally easier to obtain those P-values from 1-sided tests. Then again, for various reasons you may prefer to rearrange plots of 1-sided P-values so that P-values greater than 0.5 are plotted as 1-P-value. Plots rearranged in this way are identical to those of 2-sided tests, provided those tests use mid-P-values or the statistic's distribution is smooth.

The code below produces a P-value plot, using mid-P-values from 1-sided exact binomial tests of the observed proportion p (=f/n), and a predefined set of evenly spaced values for these tests' null-distribution parameter (P).

Gave us:

[Figure: P-value plot of rearranged 1-sided mid-P-values]

Note:
• On this plot the 95% confidence limits are where these P-values intersect the α/2 line, which you could draw using abline(h=0.05/2).

• Prior to rearrangement, each P-value is the probability of observing a proportion less than the observed value of p, plus half the probability of observing that value.

• For tests of discrete statistics using conventional P-values, the upper-tailed P-value does not equal one minus the lower-tailed P-value (both tails include the probability of the observed value), so the two must be obtained separately.
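Here is a sketch of such a plot. It assumes the same data and the same evenly spaced grid of P used earlier; this is not necessarily the original code.

It computes the lower-tailed mid-P-value for every P at once. It then rearranges values above 0.5 as 1-P-value using pmin(), and draws the α/2 line mentioned above.

```r
# P-value plot from 1-sided mid-P exact binomial tests (sketch)
y <- 5
n <- 25
p <- y / n                                  # observed proportion
R <- 1000
P <- (1:(R - 1)) / R                        # evenly spaced test parameters

# Lower-tailed mid-P-value under each P: Pr(X < y) + Pr(X = y)/2
pv <- pbinom(y - 1, n, P) + 0.5 * dbinom(y, n, P)
# Rearrange: P-values above 0.5 are plotted as 1 - P-value
pv.plot <- pmin(pv, 1 - pv)

plot(P, pv.plot, type = "l", ylim = c(0, 0.5),
     xlab = "P (test parameter)", ylab = "mid-P-value")
abline(h = 0.05 / 2, lty = 2)               # alpha/2 line: crossings are the 95% limits
abline(v = p, lty = 3)                      # observed proportion
```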

The following instructions give a P-value plot of conventional P-values from 2-tailed tests, again using exact binomial tests.

This code sets up its own test function and assumes you have already set the number of test parameters (R).

N.B. These conventional P-values have a maximum which is above 0.5 (near where p=P), whereas mid-P-values peak at 0.5 (again near where p=P).
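Here is a sketch of a conventional P-value plot under stated assumptions. tails.conv() is a hypothetical test function, not the page's own.

Each plotted point is the smaller of the two conventional 1-tailed P-values, Pr(X ≤ y) and Pr(X ≥ y), each computed separately. That equals half the 2-tailed P-value under the doubling rule (before capping at 1), so the α/2 line again marks the 95% limits. Its crossings should approximate the Clopper-Pearson limits that binom.test() reports. Because both tails include Pr(X = y), the curve can rise above 0.5.

```r
# P-value plot of conventional exact binomial P-values (sketch)
y <- 5
n <- 25
R <- 1000
P <- (1:(R - 1)) / R

# Hypothetical test function: smaller conventional tail for each P
tails.conv <- function(y, n, P) {
  lower <- pbinom(y, n, P)                          # Pr(X <= y | P)
  upper <- pbinom(y - 1, n, P, lower.tail = FALSE)  # Pr(X >= y | P), obtained separately
  pmin(lower, upper)                                # = half the doubled 2-tailed P-value
}

pv.c <- tails.conv(y, n, P)
plot(P, pv.c, type = "l", xlab = "P (test parameter)",
     ylab = "conventional P-value / 2")
abline(h = 0.05 / 2, lty = 2)

# Crossings of the alpha/2 line versus the Clopper-Pearson limits
range(P[pv.c > 0.05 / 2])
binom.test(y, n)$conf.int
```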

• You could, of course, obtain a P-value plot using mid-P-values of 2-tailed tests, as is done by the code below.

• The binomial test functions shown above work quickly because they can deal with many test parameters at once. This may be impossible for more complicated tests, in which case you may have to do the tests one by one. The following instructions use an 'iterative loop' to produce P-values to plot, using the mid-P exact binomial test function (test.mpv, given above), again with a predefined set of parameters.

In R, this is relatively slow. But if you want to estimate confidence limits by successive approximation (iteratively), each test parameter depends upon the outcome of the previous tests, so the tests cannot all be done at once.
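Below is a sketch of the one-test-at-a-time approach. midp2.one() is a hypothetical stand-in for test.mpv, whose code is not reproduced here. It returns the 2-tailed mid-P-value for a single P by doubling the smaller 1-tailed mid-P-value.

The last lines illustrate successive approximation with base R's uniroot(). It chooses each new value of P from the results of the previous tests, and solves for the two values of P where the 2-tailed mid-P-value equals 0.05.

```r
# 2-tailed mid-P exact binomial tests, one test parameter per loop iteration (sketch)
y <- 5
n <- 25
R <- 1000
P <- (1:(R - 1)) / R

# Hypothetical stand-in for test.mpv: 2-tailed mid-P-value for a single P0
midp2.one <- function(y, n, P0) {
  lower <- pbinom(y - 1, n, P0) + 0.5 * dbinom(y, n, P0)  # Pr(X < y) + Pr(X = y)/2
  2 * min(lower, 1 - lower)                               # double the smaller tail
}

pv2 <- numeric(length(P))
for (i in seq_along(P)) {
  pv2[i] <- midp2.one(y, n, P[i])            # one test per iteration
}

plot(P, pv2 / 2, type = "l", xlab = "P (test parameter)", ylab = "mid-P-value / 2")
abline(h = 0.05 / 2, lty = 2)

# Successive approximation: uniroot() picks each new P from earlier test results
g.root <- function(P0) midp2.one(y, n, P0) - 0.05
CL <- uniroot(g.root, c(1e-8, y / n), tol = 1e-10)$root
CU <- uniroot(g.root, c(y / n, 1 - 1e-8), tol = 1e-10)$root
c(CL, CU)
```

Dividing the 2-tailed values by 2 puts them on the same α/2 scale as the rearranged 1-sided plot. With mid-P-values, the two plots coincide.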

### Poisson confidence interval functions

1. Normal approximation Poisson interval for counts & rates
2. Exact Poisson interval for counts & rates
Either of these can be calculated, as 95% confidence limits, as follows:
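Note: these instructions assume you have already installed the epitools package and loaded it with library(epitools). The output below is for y = 13 events in n = 46021 units of person-time.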

```
> # Approximate Poisson interval (no c.c.)
> pois.approx(y, n)
   x    pt         rate        lower        upper conf.level
1 13 46021 0.0002824797 0.0001289248 0.0004360347       0.95
>
> # Exact Poisson interval
> pois.exact(y, n)
   x    pt         rate        lower        upper conf.level
1 13 46021 0.0002824797 0.0001504085 0.0004830486       0.95
```

Note:
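• pois.approx gives the same results as this formula, where f is the observed count and n the person-time:
95% CI = (f ± 1.96 × √f)/n
Multiplying these limits by 100000 expresses them as a rate per 100000 person-time units.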

The following code gives 'bias-corrected' approximate limits for f:

• The following code produces a mid-P exact Poisson P-value plot for f, plus 95% confidence limits, using test inversion.
Notice that, because the test parameter (F) can vary from zero to infinity, and test parameters are set beforehand (rather than iteratively), we set the range of F using the 99.5% exact conventional limit formula.
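Here is a minimal base-R sketch for the Poisson example (f = 13, n = 46021 person-time units). It assumes the conventional exact limits for a count are taken from chi-squared quantiles, qchisq(α/2, 2f)/2 and qchisq(1 - α/2, 2f + 2)/2, which base R's poisson.test() also reports.

The sketch first checks the normal-approximation formula. It then builds a mid-P P-value plot over R = 1000 evenly spaced values of F spanning the 99.5% conventional limits; the grid size is an assumption. Finally, it reads off the 95% mid-P limits where the curve crosses α/2.

```r
# Poisson intervals and a mid-P P-value plot by test inversion (sketch)
f <- 13                                   # observed count
n <- 46021                                # person-time
z <- qnorm(0.975)

# Normal approximation, no continuity correction, as a rate per person-time unit
approx.rate <- (f + c(-1, 1) * z * sqrt(f)) / n

# Conventional exact limits for the count, from chi-squared quantiles
exact.F <- function(f, conf = 0.95) {
  a <- 1 - conf
  c(if (f == 0) 0 else qchisq(a / 2, 2 * f) / 2, qchisq(1 - a / 2, 2 * f + 2) / 2)
}
exact.rate <- exact.F(f) / n
exact.pt <- as.numeric(poisson.test(f, T = n)$conf.int)  # same interval from base R

# Predetermined test parameters spanning the 99.5% conventional limits
R <- 1000
rng <- exact.F(f, conf = 0.995)
Fv <- seq(rng[1], rng[2], length.out = R)

# Lower-tailed mid-P-values, rearranged so values above 0.5 become 1 - P-value
mid.lower <- ppois(f - 1, Fv) + 0.5 * dpois(f, Fv)   # Pr(X < f) + Pr(X = f)/2
pv <- pmin(mid.lower, 1 - mid.lower)

plot(Fv, pv, type = "l", xlab = "F (test parameter)", ylab = "mid-P-value")
abline(h = 0.05 / 2, lty = 2)

CL <- min(Fv[pv > 0.05 / 2])             # mid-P lower limit for f
CU <- max(Fv[pv > 0.05 / 2])             # mid-P upper limit for f
c(CL, CU) / n                            # the same limits as rates
```

The mid-P limits should fall inside the conventional exact limits. The 99.5% range is simply a safe, finite span that contains both.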