"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Bootstrap confidence intervals: Use & misuse

(percentile-t, accelerated bias-correction, statistical independence, number of replications)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results can mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

Bootstrap confidence intervals are becoming more widely used as the software becomes available - but they still tend to be the exception rather than the rule, even in situations where they really should be used. For example, asymptotic (normal approximation) intervals are known to be unreliable for commonly used measures such as the attributable risk proportion, and even for simple means of skewed data.
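To illustrate that last point, here is a minimal sketch (in Python, with a hypothetical log-normal sample standing in for, say, skewed count data) comparing a simple percentile bootstrap interval for a mean against the asymptotic normal approximation interval; the sample, seed and number of replications are our own choices, not taken from any study discussed here.

```python
import numpy as np

# Hypothetical skewed sample (log-normal), e.g. parasite counts per host.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.5, size=40)

n_boot = 9999  # number of bootstrap replications
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

# Simple percentile interval: 2.5th and 97.5th percentiles of the
# bootstrap distribution of the mean.
pct_lo, pct_hi = np.percentile(boot_means, [2.5, 97.5])

# Asymptotic (normal approximation) interval for comparison.
se = sample.std(ddof=1) / np.sqrt(sample.size)
asy_lo, asy_hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se

print(f"percentile: ({pct_lo:.2f}, {pct_hi:.2f})")
print(f"asymptotic: ({asy_lo:.2f}, {asy_hi:.2f})")
```

With data this skewed, the percentile interval is asymmetric about the sample mean, whereas the asymptotic interval is forced to be symmetric - and its lower limit can even fall below zero for data that cannot be negative.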

One important issue in the literature is that many authors simply state that they attached bootstrap confidence intervals, without any further elaboration. It is essential to state clearly the type of bootstrap interval used - whether bootstrapping was used to estimate the standard error, which was then inserted into a normal approximation interval, or whether some sort of percentile interval (often corrected in some way) was estimated. It is true that the terminology of bootstrapping is by no means universally agreed - as an example of this we give a procedure described as 'non-parametric' despite being based on the Poisson distribution, albeit overdispersed by using the variance estimate as the Poisson parameter. One should therefore always cite a reference indicating who first used your particular type of bootstrapped interval.
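The 'corrected in some way' intervals mentioned above include the accelerated bias-corrected (BCa) interval. As a rough sketch of what such a correction involves, here is a hand-rolled BCa interval following Efron's construction - the function name, data and all tuning choices are ours, purely for illustration; in practice one would normally use an established implementation (such as R's boot package, mentioned below).

```python
import numpy as np
from statistics import NormalDist

def bca_interval(x, stat=np.mean, n_boot=4999, level=0.95, seed=0):
    """Sketch of an accelerated bias-corrected (BCa) percentile interval."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    theta_hat = stat(x)

    boot = np.array([stat(rng.choice(x, size=x.size, replace=True))
                     for _ in range(n_boot)])

    nd = NormalDist()
    # Bias correction: where theta_hat sits in the bootstrap distribution
    # (clipped so the inverse normal CDF is defined).
    prop = float(np.clip(np.mean(boot < theta_hat), 1e-6, 1 - 1e-6))
    z0 = nd.inv_cdf(prop)

    # Acceleration, estimated from jackknife (leave-one-out) statistics.
    jack = np.array([stat(np.delete(x, i)) for i in range(x.size)])
    d = jack.mean() - jack
    a = (d ** 3).sum() / (6.0 * (d ** 2).sum() ** 1.5)

    alpha = (1.0 - level) / 2.0
    limits = []
    for p in (alpha, 1.0 - alpha):
        z = nd.inv_cdf(p)
        adj = nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))
        limits.append(np.percentile(boot, 100.0 * adj))
    return tuple(limits)

# Hypothetical right-skewed sample for demonstration.
data = np.random.default_rng(7).exponential(scale=2.0, size=30)
lo, hi = bca_interval(data)
print(f"95% BCa interval for the mean: ({lo:.2f}, {hi:.2f})")
```

Note how the correction simply shifts which percentiles of the bootstrap distribution are read off - which is precisely why a bare statement that 'bootstrap confidence intervals were attached' leaves the reader unable to tell which interval was reported.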

It is also essential to describe exactly how the bootstrapping was done and the number of replications involved. Whilst (some) bootstrap confidence intervals enable one to ignore normality assumptions, they do not get round the need for statistical independence of observations unless this is specifically accounted for in the bootstrapping method. We give one medical example where observations that were clearly not statistically independent were pooled, both for calculation of the normal approximation confidence interval and (apparently) for bootstrapping. The two resulting intervals were similar - and both were wrong! We also give a veterinary example on faecal egg count reduction where it was unclear exactly what was bootstrapped, and where pairing was apparently carried out on an arbitrary basis.
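One standard way of accounting for such dependence is a cluster (block) bootstrap: resample whole clusters with replacement, rather than pooling the individual observations. The sketch below uses entirely hypothetical data - egg counts for animals nested within herds, with the herd structure, seed and replication count all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: 8 animals in each of 12 herds. Animals within a herd
# share a herd-level mean, so they are not independent observations.
herds = {
    h: rng.poisson(lam=lam, size=8)
    for h, lam in enumerate(rng.gamma(shape=2.0, scale=10.0, size=12))
}

n_boot = 4999  # number of bootstrap replications - always report this
herd_ids = list(herds)
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Resample herds, not animals, keeping each herd's animals together.
    chosen = rng.choice(herd_ids, size=len(herd_ids), replace=True)
    boot_means[b] = np.concatenate([herds[int(h)] for h in chosen]).mean()

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% cluster-bootstrap interval: ({lo:.2f}, {hi:.2f})")
```

Resampling the pooled animals instead would treat the 96 correlated counts as 96 independent observations, typically giving an interval that is much too narrow - the error made in the medical example above.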

There is a worrying tendency, especially in ecological papers, simply to change the probability level of the confidence interval if the intervals are felt to be too wide. Whilst there is nothing 'God-given' (or even 'Statistician-given') about 95% intervals, at least they provide a general point of reference. Changing to 80% intervals merely draws attention to the variability of the data! Another issue we encountered is that it is not always clear whether researchers are drawing the right inferences from a confidence interval. For example, comparing the value of a statistic obtained in one study with the 95% confidence interval attached to the mean value of that statistic from 30+ studies would seem to be misleading...


What the statisticians say

Davison & Hinkley (2006) and Efron & Tibshirani (1993) remain the standard texts on bootstrapping. Other useful texts include Zieffler et al. (2011), Chernick (1999) and Manly (1997). The topic is not well covered in most general statistics texts, but some information for medical applications is given in Kirkwood & Sterne (2003), for veterinarians (very briefly) in Thrusfield (2005), and for ecologists in McGarigal (2000). Haddon (2000) provides more extensive treatment for fisheries scientists.

Hesterberg (2008a) argues that it is time to retire the n > 30 rule for adoption of parametric methods and to use only bootstrap methods. Hesterberg (2008b) reviews bootstrap methods. Davison & Kuonen (2005) provide a brief introduction to the bootstrap with applications in R. Wood (2005, 2004) and Grunkemeier & Wu (2004) provide accessible introductions to bootstrap confidence intervals. Greenland (2004) describes the advantages of interval estimation by simulation, whilst Carpenter & Bithell (2000) provide a more formal guide to bootstrap confidence intervals for medical statisticians. Platt et al. (2000) advocate bootstrap confidence intervals for the sensitivity of a quantitative diagnostic test. Llorca & Delgado-Rodriguez (2000) compare several procedures to estimate the confidence interval for attributable risk proportion in case-control studies. Crowley (1992) reviews resampling methods for data analysis in ecology and evolution. Meyer et al. (1986) compare jackknife versus bootstrap methods for estimating uncertainty in population growth rates.

Hesterberg (2001) and Hesterberg (1999) look at bootstrap tilting for when the sampling distribution depends on a parameter of interest. Carpenter (1999) looks at the performance of test inversion bootstrap confidence intervals. Killian (1999) looks at finite-sample properties of percentile and percentile-t bootstrap confidence intervals. Hall (1997) provides much of the theoretical background to bootstrap confidence intervals. DiCiccio & Efron (1996) review bootstrap confidence intervals. Hall (1988a, 1988b) concludes that percentile-t and accelerated bias-correction are two of the more promising of existing techniques. Efron & Gong (1983) take a leisurely look at the bootstrap, the jackknife, and cross-validation.

Wikipedia provides a section on bootstrapping. Rich Herrington describes how to use the smoothed bootstrap for small data sets. A.J. Canty gives an informative account of resampling and bootstrap confidence intervals using the boot package in R.