 
Bootstrap confidence intervals
On this page:
Definition & Properties
General assumptions
Types of bootstrap interval
Normal, Standard, confidence limits
Simple Percentile, Efron's, Quantile-based, Approximate
Backwards, Basic, Hall's, second percentile, Hybrid bootstrap
Bias Corrected percentile
Accelerated Bias Corrected percentile, ABC, BCa
Studentized, bootstrap t intervals
Smoothed, jittered bootstrapping
Test-inversion bootstrap
Definition and Properties
'Bootstrapping' describes a process which aims to estimate how a statistic's value will vary when it is calculated from random samples of an infinite population. To achieve this a model population is constructed from your sample of observations, and resampled so as to mimic how those observations were obtained. There are many ways of going about this process, and modifications thereof, but they share a common underlying logic.
Bootstrapping, or resampling data with replacement, is justified to the extent that it replicates how those data were obtained. If that assumption is unrealistic, its results should be treated accordingly.
Bootstrapping is primarily used to attach confidence limits to awkward statistics, but can be used in a variety of other situations. For example, bootstrapping enables you to attach limits to very ordinary statistics, such as a mean, where the central limit theorem cannot be assumed to be effective. Bootstrapping can also test hypotheses, estimate power, and yield P-value plots.
Apart from its day-to-day use, research statisticians also employ bootstrapping to improve their understanding of the behaviour of statistics calculated from samples of known populations. Until recently most theoretical work has concentrated upon statistics that behave like means, particularly those found in introductory stats textbooks. Since their theoretical framework is extensive, we expand upon it as a related topic.
Although bootstrapping does not work for all statistics, or in all situations, it is especially useful where the distribution of a statistic cannot be predicted mathematically, or where the mathematics are unwieldy, or where predictions are only valid for unduly large samples, or where they rely upon obviously unreasonable assumptions, such as observations representing normal populations.
Aside from the bootstrap statistic having to use the same formula as the (population) statistic being estimated, bootstrap confidence limits suffer many of the same problems as ordinary confidence intervals. Although there are problems if a statistic is distributed asymmetrically, heteroscedasticity is more difficult to allow for. If possible, therefore, it is best to apply a variance-stabilising transformation to your observations before bootstrapping.
General assumptions
 With a very few exceptions, which are specifically designed to do otherwise, bootstrap models assume each observation is randomly selected from its population. In other words, any observation is equally likely to be selected, and its selection is independent.
 The sample represents the population it was drawn from; in other words, it contains sufficient information, and its selection was unbiased.
 The population is infinite, or sufficiently large that the effect of taking a sample is negligible.
 Additional assumptions, such as linearity, smoothness, symmetry, homoscedasticity, and bias, depend upon the statistic, and your method of bootstrapping it.
 Bootstrapping does not assume your sample is the same as its population; unless you have sampled the entire population this is clearly impossible.
 By replicating the process by which your original sample was selected, bootstrapping aims to estimate how a statistic will vary, compared to the same statistic when calculated from the observations being sampled. In other words, bootstrap statistics are 'plug-in' estimators.
 Second-stage bootstrapping, which resamples each bootstrap sample, aims to estimate the errors of the first-stage estimates.
 Some forms of bootstrapping modify the model population's parameters to meet specific hypotheses, assuming, of course, that both the modification and its effects are plausible.
 Some bootstrap procedures require additional distributional assumptions, of the data or of the resulting statistics.
Types of bootstrap interval
Owing to its potential, considerable research has been invested in bootstrapping and its applications, but confidence intervals are by far its most common use. Many methods of obtaining bootstrap confidence intervals have been devised, but relatively few of these have made their way into standard textbooks for biologists. Relatively few authors state which bootstrap confidence interval they have used but, in as far as it is possible to judge, the majority are either simple percentile or accelerated bias corrected percentile intervals.
This page (briefly) describes just nine methods, either because they are commonly mentioned, or because they provide some insight. The simplest methods, with the most restrictive assumptions, are listed first. We ignore bootstrap procedures that require you to evaluate all possible outcomes (the entire sampling distribution), or all tail-end outcomes, despite the fact they are loved by a certain creed of traditional statistician. The reasons for this omission are strictly practical:
 For anything other than extreme tail-end values, their approach to bootstrapping requires considerable computational and programming resources, even for small samples.
 Some packages cater for that felt need, and there seems little reason why you (or we) need duplicate their investment.
 A random (Monte Carlo) subset of the sampling distribution enables you to produce perfectly acceptable results for most practical purposes. This is comparatively quick, easy, informative, well-researched and increasingly popular.
Again, since it is patently impossible to do otherwise, this page confines its efforts to single samples, and only considers one or two simple statistics.
To avoid needless repetition let us assume that you wish to obtain 95% confidence intervals for a sample statistic, θ̂, which is a plug-in estimator of its parameter, Θ. Assuming you have a sample of n randomly-selected observations, in all the methods below each bootstrap resample contains n observations (selected randomly and with replacement) from that original sample. θ̂_{i}^{*} is the bootstrap statistic calculated from the ith bootstrap resample (using the same plug-in formula for your sample statistic as for its population parameter). Similarly, θ̂_{ij}^{**} is the jth second-stage bootstrap statistic, calculated from n observations (selected randomly and with replacement) from the ith bootstrap sample.
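The resampling scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_statistics`, the made-up observations, and the choice of the mean as the statistic are ours, not part of any particular package.

```python
import random
import statistics


def bootstrap_statistics(sample, stat, B=1000, seed=0):
    """Return B first-stage bootstrap statistics (the theta-hat-star values)."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    n = len(sample)
    # Each resample holds n observations drawn randomly, with replacement.
    return [stat(rng.choices(sample, k=n)) for _ in range(B)]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]   # made-up observations
theta_hat = statistics.fmean(sample)                # plug-in sample statistic
theta_star = bootstrap_statistics(sample, statistics.fmean)
```

Passing the statistic in as a function (`stat`) keeps the same plug-in formula for the sample statistic and for every bootstrap statistic.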
Since each method makes its own additional assumptions, let us summarise their general assumptions.
Assumptions common to bootstrap confidence limits:
 Your sample resembles the population it was drawn from sufficiently well that resampling it enables you to estimate how a sample statistic would vary; the same is true if you are quantifying the errors in your bootstrap statistics.
 You resample your sample in the same way as you sampled its population; the same is true if you resample resamples.
 Your population statistic, Θ, sample statistic, θ̂, and bootstrap statistic, θ̂^{*}, are calculated in the same way.
 2-tailed limits assume either that your statistic is distributed symmetrically, or is not too badly skewed, or that this fact has been allowed for, which may not be easy!

 Normal, or Standard confidence limits
Additional assumptions:
 Your statistic is unbiased.
 It is normally distributed (and thus homoscedastic).
 The standard errors of your bootstrap statistic and sample statistic are the same.

To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations.
 2. From your sample of n observations randomly select (with replacement) a bootstrap sample of n observations and (using the same formula as your sample statistic) calculate a bootstrap statistic, θ̂^{*}.
 3. Repeat step 2 to obtain (B=) 50 or 100 bootstrap statistics.
 4. Calculate the standard deviation of your bootstrap statistics, s_{θ̂*}.
 5. Estimate the confidence limits of Θ as θ̂ ± 1.96 s_{θ̂*}.
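The steps above can be sketched as follows. This is a minimal illustration, assuming the mean as the statistic and some made-up observations; `normal_bootstrap_ci` is a hypothetical name of our own.

```python
import random
import statistics


def normal_bootstrap_ci(sample, stat, B=100, z=1.96, seed=0):
    """Normal (standard) bootstrap limits: theta_hat +/- z * sd(theta_star)."""
    rng = random.Random(seed)            # seeded only for repeatability
    n = len(sample)
    theta_hat = stat(sample)
    theta_star = [stat(rng.choices(sample, k=n)) for _ in range(B)]
    se_star = statistics.stdev(theta_star)   # bootstrap estimate of the standard error
    return theta_hat - z * se_star, theta_hat + z * se_star


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]   # made-up observations
lower, upper = normal_bootstrap_ci(sample, statistics.fmean)
```

By construction these limits are always symmetric about the sample statistic, whatever the shape of the bootstrap distribution.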
Notice this method only employs bootstrapping to estimate the statistic's standard error (as the standard deviation of the bootstrap statistics), but this may be more robust than using a jackknife estimate of it. Normal bootstrap confidence intervals could be viewed as semi-parametric because they assume the statistic has a known (normal) distribution but do not assume this of the observations that statistic is calculated from. In most cases this assumption is more reasonable if the observations are approximately normal, or have been normal-transformed.
One advantage of this method is that it requires little effort to calculate, even without a computer, and is better than nothing for statistics for which you have no other way of obtaining confidence limits. Unfortunately, this method assumes your statistic is subject to the central limit theorem (asymptotically normal). It also assumes your sample is big enough for this to be a fair approximation to reality which, in many instances, requires extremely large samples. You can use quantile-quantile plots to assess how normal your bootstrap estimates are but, even if its other assumptions are met, this method will consistently underestimate the confidence interval, especially for small to moderate samples.
 Simple Percentile, or Efron's, or Quantile-based, or Approximate intervals
Additional assumptions:
 Your statistic is unbiased and homoscedastic.
 The bootstrap statistic can be transformed to a standard normal distribution.
 Its untransformed and detransformed distributions are the same.
 The standard errors of your bootstrap statistic and sample statistic are the same.

To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations.
 2. From your sample of n observations randomly select (with replacement) a bootstrap sample of n observations and (using the same formula as your sample statistic) calculate a bootstrap statistic, θ̂^{*}.
 3. Repeat step 2 to obtain (B=) 1000 to 5000 bootstrap statistics.
 4. Sort your bootstrap statistics into rank order.
 5. Estimate the confidence limits as the 2.5% and 97.5% quantiles of your bootstrap statistics.
Note:
 Some statisticians believe that, under conventional inference, P-values and percentile confidence limits should not be estimated by interpolation. Conventionally, the lower confidence limit is taken to be whichever bootstrap statistic is just less than its α/2 quantile. The upper confidence limit is whichever bootstrap statistic is just greater than the 1−α/2 quantile.
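As a hedged sketch of the percentile steps and the note above, the code below picks the limits by rank, without interpolation. The (B+1) ranking rule in `rank_limits` is one common convention and an assumption on our part; both function names and the data are made up for illustration.

```python
import math
import random
import statistics


def rank_limits(sorted_stats, alpha=0.05):
    """Pick limits by rank, rounding outwards rather than interpolating."""
    B = len(sorted_stats)
    # round(..., 9) guards against floating-point noise before floor/ceil.
    k_lo = math.floor(round((B + 1) * alpha / 2, 9))        # rank of lower limit
    k_hi = math.ceil(round((B + 1) * (1 - alpha / 2), 9))   # rank of upper limit
    k_lo = min(max(k_lo, 1), B)
    k_hi = min(max(k_hi, 1), B)
    return sorted_stats[k_lo - 1], sorted_stats[k_hi - 1]


def percentile_ci(sample, stat, B=1999, alpha=0.05, seed=0):
    """Simple percentile limits from B sorted bootstrap statistics."""
    rng = random.Random(seed)            # seeded only for repeatability
    n = len(sample)
    theta_star = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    return rank_limits(theta_star, alpha)


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]   # made-up observations
lower, upper = percentile_ci(sample, statistics.fmean)
```

Choosing B so that (B+1)α/2 is a whole number, for example B = 1999, means the limits fall exactly on ranked bootstrap statistics.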
Simple percentile intervals estimate the statistic's theoretical range, and are sometimes said to be nonparametric because they obtain the critical values by rank. Confusingly, a distribution of bootstrap statistics obtained in this way is often referred to as a 'basic' bootstrap, as are Hall's limits (described below).
This method is relatively simple to perform and interpret, and does not produce impossible or infinite intervals. It does not produce too unreasonable results if your statistic is positively skewed (heteroscedastic), and works moderately well for statistics that behave as means, when calculated from sufficiently large samples. Simple percentile limits do not assume your statistic is normal, but they do assume those statistics can be rescaled to make them so. Nor is there any assurance that its limits will be compatible with test-inversion limits when applied to the same population. In other words, in many situations percentile limits would be reversed. Also, for small samples, this method consistently undercovers.
 Backwards, or Basic, or Hall's (second percentile), or Hybrid bootstrap limits
Additional assumptions:
 Your statistic is unbiased and homoscedastic.
 The bootstrap statistic can be transformed to a standard normal distribution.
 The confidence limits are equivalent to test-inversion limits.
 The standard errors of your bootstrap statistic and sample statistic are the same.

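To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations, and obtain lower and upper limits in the same way as the percentile method (above).
 2. Transpose the upper and lower limits about θ̂. In other words, if the percentile limits are L^{*} and U^{*}, the lower limit becomes 2θ̂ − U^{*} and the upper limit becomes 2θ̂ − L^{*}.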
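A minimal sketch of this reflection, assuming the mean as the statistic and interpolated quantiles from the standard library; the name `basic_ci` and the data are hypothetical.

```python
import random
import statistics


def basic_ci(sample, stat, B=2000, seed=0):
    """Return (basic limits, percentile limits) so the two can be compared."""
    rng = random.Random(seed)            # seeded only for repeatability
    n = len(sample)
    theta_hat = stat(sample)
    theta_star = [stat(rng.choices(sample, k=n)) for _ in range(B)]
    cuts = statistics.quantiles(theta_star, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    pct_lo, pct_hi = cuts[0], cuts[-1]
    # Reflect the percentile limits about the sample statistic.
    basic = (2 * theta_hat - pct_hi, 2 * theta_hat - pct_lo)
    return basic, (pct_lo, pct_hi)


sample = [1.0, 1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 9.5]   # made-up, right-skewed
(basic_lo, basic_hi), (pct_lo, pct_hi) = basic_ci(sample, statistics.fmean)
```

With a skewed bootstrap distribution the longer tail of the basic interval lies on the opposite side of the sample statistic from that of the percentile interval.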
This method is relatively simple to perform, but can produce impossible limits, albeit not infinite ones. It works moderately well for statistics that behave as means, when calculated from sufficiently large samples, but can produce unreasonable results if your statistic is positively skewed. Again, these limits do not assume your statistic is normal, merely that a normalizing transformation is possible; once again, it is the statistic which is assumed to be rescalable, not the data in your sample. These limits are often compatible with similarly calculated test-inversion limits. Like simple percentile limits, this method consistently undercovers when calculated from small samples.
 Bias Corrected percentile limits
Additional assumptions:
 The bootstrap statistic can be transformed to a normal distribution.
 The normal-transformed statistic has a constant bias.
 Its untransformed and detransformed distributions are the same.
 The standard errors of your bootstrap statistic and sample statistic are the same.

To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations, and obtain bootstrap estimates of that statistic in the same way as the percentile method (above).
 2. Estimate the bias in your bootstrap estimates (p_{b}) as the proportion of bootstrap estimates that exceed your sample estimate. Thus p_{b} = p(θ̂^{*} > θ̂)
 3. Convert that proportional bias to standard normal deviates using the inverse (quantile) normal distribution function. Thus b = Φ^{−1}(p_{b})
 4. Subtract twice that bias from the standard normal 95% limits. Thus L = −1.96 − 2b and U = +1.96 − 2b
 5. Convert those corrected standard normal limits to proportions using the normal (probability) distribution function. Thus p_{L} = Φ(L) and p_{U} = Φ(U)
 6. Like Efron's, estimate the confidence limits as the corresponding (p_{L}th and p_{U}th) quantiles of your bootstrap statistics.
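The bias correction above might be sketched like this. The function names, the clamp that keeps Φ^{−1} defined, and the simple rank-based quantile lookup are our own assumptions; the formulae for b, L and U follow the steps as written.

```python
import math
import random
import statistics
from statistics import NormalDist


def bc_proportions(p_b, z=1.96):
    """Convert the proportional bias p_b into corrected tail proportions."""
    nd = NormalDist()
    b = nd.inv_cdf(p_b)                  # b = inverse-normal of p_b
    L, U = -z - 2 * b, z - 2 * b         # subtract twice the bias
    return nd.cdf(L), nd.cdf(U)          # p_L and p_U


def bc_ci(sample, stat, B=2000, seed=0):
    rng = random.Random(seed)            # seeded only for repeatability
    n = len(sample)
    theta_hat = stat(sample)
    theta_star = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    p_b = sum(t > theta_hat for t in theta_star) / B
    # Our own guard (not part of the method): keep p_b strictly inside (0, 1).
    p_b = min(max(p_b, 1 / (B + 1)), B / (B + 1))
    p_lo, p_hi = bc_proportions(p_b)

    def pick(p):                         # rank-based quantile, no interpolation
        return theta_star[min(B - 1, max(0, math.ceil(p * B) - 1))]

    return pick(p_lo), pick(p_hi)


sample = [1.0, 1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 9.5]   # made-up, right-skewed
lower, upper = bc_ci(sample, statistics.fmean)
```

When exactly half the bootstrap statistics exceed the sample statistic, b is zero and the limits reduce to simple percentile limits.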
This method is useful in attaching confidence limits to a statistic for which no analytical bias correction is available (in other words where there is no formula, or where that formula is unreliable). The problem this method addresses is that, although you can estimate the bias of a statistic, for example by jackknifing, if its bootstrap statistics are not normal you cannot assume the confidence limits will be similarly biased. In order to do so, this method assumes the same function (f) would transform both your sample statistic and bootstrap statistics to a normal distribution with a standard deviation of 1, whose locations differ by a bias, b.
 Accelerated Bias Corrected percentile limits, or ABC, or BCa
Additional assumptions:
 The bootstrap statistic can be transformed to a standard normal distribution.
 The bias of your normal-transformed statistic is directly related to its location.
 Its untransformed and detransformed distributions are the same.
 The standard errors of your bootstrap statistic and sample statistic are the same.

To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations, and obtain bootstrap estimates of that statistic in the same way as the percentile method (above).
 2. Estimate the bias in your bootstrap estimates (p_{b}) as the proportion of bootstrap estimates whose values are greater than your sample estimate. Thus p_{b} = p(θ̂^{*} > θ̂)
 3. Convert that proportional bias to standard normal deviates using the inverse (quantile) normal distribution function. Thus b = Φ^{−1}(p_{b})
 4. Estimate the 'acceleration constant' (a) using this jackknife estimate:
a ≈ Σ(θ̂_{.} − θ̂_{−j})^{3} / [6 (√Σ(θ̂_{.} − θ̂_{−j})^{2})^{3}]
where θ̂_{−j} is the statistic calculated with the jth observation omitted, and θ̂_{.} is the mean of those n jackknife statistics.
 5. Correct the standard normal 95% limits, so:
L = (−1.96−b)/(1−a*(−1.96−b)) − b and
U = (+1.96−b)/(1−a*(+1.96−b)) − b
 6. Convert those corrected standard normal limits to proportions using the normal (probability) distribution function. Thus p_{L} = Φ(L) and p_{U} = Φ(U)
 7. Estimate confidence limits as the corresponding (p_{L}th and p_{U}th) quantiles of your bootstrap statistics.
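A hedged sketch of the accelerated correction follows. It assumes the usual leave-one-out jackknife for the acceleration constant and uses the sign convention of the steps above (b from the proportion of bootstrap statistics exceeding the sample statistic). The function names, the clamp on p_b and the rank-based quantile lookup are our own.

```python
import math
import random
import statistics
from statistics import NormalDist


def acceleration(sample, stat):
    """Jackknife estimate of the acceleration constant a."""
    jack = [stat(sample[:j] + sample[j + 1:]) for j in range(len(sample))]
    centre = statistics.fmean(jack)               # mean of the jackknife statistics
    num = sum((centre - t) ** 3 for t in jack)
    den = 6 * math.sqrt(sum((centre - t) ** 2 for t in jack)) ** 3
    return num / den if den else 0.0


def bca_proportions(p_b, a, z=1.96):
    nd = NormalDist()
    b = nd.inv_cdf(p_b)
    L = (-z - b) / (1 - a * (-z - b)) - b
    U = (z - b) / (1 - a * (z - b)) - b
    return nd.cdf(L), nd.cdf(U)


def bca_ci(sample, stat, B=2000, seed=0):
    rng = random.Random(seed)                     # seeded only for repeatability
    n = len(sample)
    theta_hat = stat(sample)
    theta_star = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    p_b = sum(t > theta_hat for t in theta_star) / B
    p_b = min(max(p_b, 1 / (B + 1)), B / (B + 1))  # our guard: keep inv_cdf defined
    p_lo, p_hi = bca_proportions(p_b, acceleration(sample, stat))

    def pick(p):                                  # rank-based quantile
        return theta_star[min(B - 1, max(0, math.ceil(p * B) - 1))]

    return pick(p_lo), pick(p_hi)


sample = [1.0, 1.2, 1.5, 1.9, 2.4, 3.1, 4.8, 9.5]   # made-up, right-skewed
lower, upper = bca_ci(sample, statistics.fmean)
```

For a symmetric sample and a mean, the jackknife differences cancel and a is zero, so the result matches the bias corrected limits.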
If a is zero this model reduces to the constant bias model described above. Again, like bias corrected limits, this method assumes some function of your bootstrap estimates, f[θ̂^{*}], is normal. But it also assumes the bootstrap estimate's bias is a linear function of f[θ̂]. If those assumptions are correct then f[θ̂^{*}] is normal, with a mean of f[θ̂] + b{1+af[θ̂]} and a standard deviation of 1 + af[θ̂]. Here b is the bias, assuming that bias is unrelated to its location, whereas a describes how that bias varies with the value of your sample estimate (assuming its effect is directly proportional), and is recognisably a function of the jackknife skew.
 Studentized, or bootstrap t intervals
Additional assumptions:
 Your statistic is homoscedastic.
 The statistic is distributed smoothly, and is otherwise not too badly behaved.
 Its distribution is pivotal; in other words, constant for all population parameters.
 The standard error of your bootstrap statistic can be estimated by second-stage resampling.

To calculate conventional 95% 2-tailed limits:
 1. From your sample of n observations randomly select (with replacement) a bootstrap sample of n observations, and calculate a bootstrap statistic, θ̂^{*}.
  i. Resample your bootstrap sample and calculate a second-stage bootstrap statistic, θ̂^{**}.
  ii. Repeat step i to obtain (B_{1}=) 50 or 100 statistics and find their standard deviation, s*.
  iii. Calculate a bootstrap t statistic as θ̂^{*}/s*.
 2. Repeat these steps to obtain (B=) 1000 to 5000 values of θ̂^{*} and θ̂^{*}/s*.
 3. Find s, the standard deviation of θ̂^{*} (your first-stage bootstrap statistics), then calculate your statistic of interest as θ̂/s.
 4. Sort your bootstrap t statistics into rank order.
 5. Estimate the confidence limits as the 2.5% and 97.5% quantiles of those bootstrap t statistics.
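The two-stage resampling can be sketched as below. Note the assumption: rather than transcribing the steps above literally, this sketch uses the centred pivot (θ̂^{*} − θ̂)/s* and converts its quantiles back to the scale of the statistic, which is the form usually given for bootstrap t intervals. The function name and data are hypothetical, and B is kept small to limit run time.

```python
import random
import statistics


def bootstrap_t_ci(sample, stat, B=1000, B1=50, seed=0):
    """Studentized limits using second-stage resampling for each s*."""
    rng = random.Random(seed)                 # seeded only for repeatability
    n = len(sample)
    theta_hat = stat(sample)
    theta_star, t_star = [], []
    for _ in range(B):
        resample = rng.choices(sample, k=n)   # first-stage bootstrap sample
        th = stat(resample)
        # Second stage: resample the resample to estimate its standard error.
        second = [stat(rng.choices(resample, k=n)) for _ in range(B1)]
        s_star = statistics.stdev(second)
        if s_star == 0:                       # degenerate resample, skip it
            continue
        theta_star.append(th)
        t_star.append((th - theta_hat) / s_star)   # centred pivot (our assumption)
    s = statistics.stdev(theta_star)          # sd of first-stage statistics
    cuts = statistics.quantiles(t_star, n=40) # 2.5% ... 97.5% cut points
    # Upper t quantile gives the lower limit, and vice versa.
    return theta_hat - cuts[-1] * s, theta_hat - cuts[0] * s


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5, 4.4, 3.0]  # made-up
lower, upper = bootstrap_t_ci(sample, statistics.fmean, B=300, B1=30)
```

The nested loop makes the cost B × B1 evaluations of the statistic, which is why second-stage samples are usually kept to 50 or 100.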
Studentized bootstrap statistics are known as bootstrap t statistics because they aim to reduce coverage error in the same way that a t-statistic improves the precision when testing or attaching confidence limits to a mean. In effect a second stage of bootstrapping, resampling each bootstrap sample, is used to improve your estimate of how your sample statistic should vary. The standard error of each bootstrap statistic is estimated as the standard deviation of the same statistic calculated from its second-stage samples.
 Smoothed, or jittered bootstrapping
Additional assumptions:
 Your statistic is homoscedastic.
 The statistic can sensibly be smoothed, and is otherwise not too badly behaved.
 Other assumptions may apply

Note: smoothing can be applied in addition to any other nonparametric bootstrap technique. The steps, below, apply it to a simple percentile interval.
To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations.
 2. Decide upon an appropriate kernel for your smoothing function, such as normal.
 3. From your sample of n observations randomly select (with replacement) a bootstrap sample of n observations.
 4. Calculate the appropriate bandwidth for that bootstrap sample.
 5. Using the kernel and bandwidth, add a random error to each value of that bootstrap sample.
 6. Using the same formula as your sample statistic, calculate a bootstrap statistic, θ̂^{*}.
 7. Repeat steps 3 to 6 to obtain (B=) 1000 to 5000 bootstrap statistics.
 8. Sort those bootstrap statistics into rank order.
 9. Estimate the confidence limits as the 2.5% and 97.5% quantiles of your bootstrap statistics.
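A minimal sketch of these steps for a median, assuming a normal kernel and the normal-reference bandwidth h = 1.06 × sd × n^(−1/5). That bandwidth rule is our choice for illustration, since the steps do not prescribe one, and the function name and data are made up.

```python
import random
import statistics


def smoothed_percentile_ci(sample, stat, B=1000, seed=0):
    """Percentile limits from jittered (smoothed) bootstrap samples."""
    rng = random.Random(seed)                 # seeded only for repeatability
    n = len(sample)
    theta_star = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)
        # Bandwidth for this bootstrap sample (normal-reference rule, assumed).
        h = 1.06 * statistics.stdev(resample) * n ** (-0.2)
        # Normal kernel: add an independent N(0, h) error to each value.
        jittered = [x + rng.gauss(0.0, h) for x in resample]
        theta_star.append(stat(jittered))
    cuts = statistics.quantiles(theta_star, n=40)   # 2.5% ... 97.5% cut points
    return cuts[0], cuts[-1]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9,
          3.7, 2.5, 4.4, 3.0, 3.9, 2.2, 5.1]         # made-up observations
lower, upper = smoothed_percentile_ci(sample, statistics.median)
```

Without the jitter, a bootstrap median can only take values present in the sample, so its distribution is very lumpy; the added error spreads those values out.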
Smoothed bootstrapping helps allow for the fact that, although the cumulative distribution of your sample generally provides a good approximation of its population, its relative frequency distribution is a very much poorer model. Smoothed bootstrap estimates employ the same reasoning as Gaussian smoothed distribution plots, but add a random error to each observation in your bootstrap sample.
Smoothed bootstraps can both reduce coverage error and increase robustness, particularly for small samples, for simple percentile intervals, or for studentized bootstraps, and for statistics such as medians. Smoothed bootstrap estimators have both a higher variance and a smoother distribution than their unsmoothed equivalents. Smoothing can improve the coverage properties of simple percentile to that of bootstrap t. Variance stabilising transformations are more important than normalising ones, and a Gaussian smoothing kernel may not be the best choice if the transformed observations are markedly non-normal. Smoothing also noticeably increases the computational load and complexity of your analysis.
 Test-inversion bootstrap intervals
Additional assumptions:
 Your statistic is homoscedastic, or your model allows for this.
 Other assumptions may apply

Note: Test-inversion can (in principle) be applied to virtually any type of bootstrap model, parametric or nonparametric. For simplicity, the steps below assume θ̂ is a simple summary statistic of a single sample.
To calculate conventional 95% 2-tailed limits:
 1. Calculate the statistic of interest, θ̂, from your sample of n observations.
 2. Use your sample to construct a model population (Y_{null}) modified so the value of Θ is known (Y_{null}Θ)
 3. Using bootstrapping, estimate the distribution of θ̂^{*} then test θ̂ against that distribution and record the P-value.
 4. Repeat steps 2 and 3 using a series of values for Θ.
 5. Estimate which values of Θ yield P-values of 0.025 and 0.975.
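One way these steps might look for a mean is sketched below. It assumes a simple shift model (each null population is the sample shifted so that its mean equals the trial value of Θ), which is only sensible for homoscedastic, location-type statistics. The grid search, the function names and the data are our own choices; faster iterative algorithms exist.

```python
import random
import statistics


def p_value(sample, stat, theta0, index_sets):
    """One-sided P-value of the observed statistic when the parameter is theta0."""
    theta_hat = stat(sample)
    # Model population: shift every observation so its statistic equals theta0.
    null_pop = [x - theta_hat + theta0 for x in sample]
    theta_star = [stat([null_pop[i] for i in idx]) for idx in index_sets]
    return sum(t <= theta_hat for t in theta_star) / len(theta_star)


def inversion_ci(sample, stat, B=400, steps=60, seed=0):
    rng = random.Random(seed)                 # seeded only for repeatability
    n = len(sample)
    # Re-using the same resampling indices for every theta0 keeps P monotonic.
    index_sets = [[rng.randrange(n) for _ in range(n)] for _ in range(B)]
    theta_hat = stat(sample)
    spread = max(sample) - min(sample)
    # Trial parameter values: an evenly spaced grid around the sample statistic.
    grid = [theta_hat - spread + 2 * spread * k / steps for k in range(steps + 1)]
    accepted = [t0 for t0 in grid
                if 0.025 <= p_value(sample, stat, t0, index_sets) <= 0.975]
    return min(accepted), max(accepted)


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5, 4.4, 3.0]  # made-up
lower, upper = inversion_ci(sample, statistics.fmean)
```

The limits are the extreme trial values that are not rejected, so their resolution is set by the grid spacing.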
Test-inversion confidence limits can be obtained for asymmetrically distributed statistics, non-pivotal statistics and those with discrete or awkward distributions. They are also useful where parameter values are expected to bias the outcome, or to construct P-value plots, or to assess power.
Provided the statistic is homoscedastic, nonparametric test-inversion limits are estimated by constructing model populations through modifying the observations, for example to shift the statistic's mean location. To reduce the computational overhead several algorithms are available, which iteratively estimate the required confidence limits.
Test-inversion limits can produce more reliable limits than other methods, especially when applied to studentized or smoothed bootstraps, but their coverage accuracy is limited by the fact that their models tend to require estimation of (nuisance) parameters in addition to the one of interest. When, for example, the statistic is heteroscedastic, your model should allow for how your statistic's dispersion varies with location. Then again, if the statistic of interest is the difference between two proportions (p1−p2), you will need to change both P1 and P2. A further limitation arises from the difficulty of constructing suitable null populations.
Related topics:
Principles and properties of bootstrap estimators
How reliable are your estimates?
A parametric or nonparametric bootstrap?
Confidence limits by permutation