Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



In principle there are three different ways of obtaining and evaluating bootstrap estimates: nonparametric, parametric, and semiparametric. In practice, because nonparametric intervals make parametric assumptions, this division is rather arbitrary. Whilst these terms may provide some insight, they are not a very useful classification. Nevertheless, since 'nonparametric' intervals are so popular, let us consider them first.


  1. Nonparametric bootstrap confidence intervals
  • Equal tail confidence limits

    Efron's Simple Percentile Confidence Intervals, although arithmetically straightforward and relatively simple to interpret, are surprisingly controversial. To understand the strengths, limitations, and extensions of the common-or-garden nonparametric bootstrap we should first summarize the basic reasoning.

    1. For simplicity, assume you have a set of (n) observations from which you have calculated some statistic (Θ̂), for which you have no formula to estimate a standard error, but to which you wish to attach (ordinary 2-tailed 95%) confidence limits.
    2. You cannot reasonably assume your sample (or your sample statistic) represents a known frequency distribution, but can assume it adequately reflects the wider population from which it was drawn.
    3. Using your sample as a model of its population, you take B samples of n observations (with replacement), from which you calculate B (plug-in) bootstrap estimates of your sample statistic. - This is sometimes referred to as a basic bootstrap.
    4. Assuming these bootstrap statistics vary in a similar fashion to your sample statistic, when similarly obtained, then the most typical 95% of those bootstrap statistics would have 95% confidence limits which enclose the population parameter, Θ - of which your sample statistic is the best available estimate. - Conversely, 95% of your bootstrap estimates should fall within the 95% confidence limits about your sample statistic.
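The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the data values, the function name `percentile_interval`, the choice of the mean as the statistic, and the order-statistic convention used to pick the limits are all assumptions made for the example.

```python
import random
import statistics

def percentile_interval(sample, stat=statistics.fmean, B=2000, alpha=0.05, seed=1):
    """Simple percentile limits from B plug-in bootstrap estimates."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Step 3: draw B samples of n observations, with replacement, and
    # recompute the statistic for each one.
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    # Step 4: the most typical (1 - alpha) of the bootstrap estimates are
    # those left after trimming alpha/2 from each tail.
    lower = boot[round((alpha / 2) * B)]
    upper = boot[round((1 - alpha / 2) * B) - 1]
    return lower, upper

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 4.4, 3.1, 7.2, 2.5, 3.9, 4.8, 3.3]
lower, upper = percentile_interval(sample)
print(f"95% simple percentile limits for the mean: {lower:.2f} to {upper:.2f}")
```

Any other plug-in statistic, such as `statistics.median`, can be passed as `stat` without changing the rest of the code.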

    As a rough-and-ready first approximation this interval estimate might seem acceptable enough. Unfortunately, it conceals a number of crucial assumptions - not all of which are reasonable.

    1. Ordinary 2-sided confidence limits assume your statistic is distributed smoothly and symmetrically.
    2. They also assume the statistic is unbiased and homoscedastic.
    3. Because normality is an asymptotic quality, intervals calculated from finite samples are too narrow - and consistently undercover.

    Since these assumptions raise a number of issues, let us consider them in more detail.

    1. The unknown normalizing function

      Because they were designed for a parametric normal universe, ordinary 2-sided 95% confidence limits assume your sample statistics are distributed normally - or that a normalising transformation has been applied. The normalising transformation can either be applied to the observations or to the statistic itself - in which case f[Θ̂*] is normal - and f is the normalizing function. Critically, it is assumed that f (whatever it might be) does not alter the rank of your estimates. Confidence intervals which are estimated under those assumptions are described as being transformation respecting.

      For instance, the graph set below shows the cumulative distributions of (B =) 100 bootstrap means, with and without an appropriate normalizing transformation. The observed estimate (Θ̂) is tinted violet, but the highest and lowest 5% of these estimates are orange. The grey rectangle encloses the central 90% of bootstrap estimates, shown in green - the estimated 90% nonparametric confidence interval. Curves tinted blue are cumulative normal distributions, fitted to our transformed and untransformed bootstrap estimates.

{Fig. 1}

      Controversially, you do not have to actually find the appropriate normalizing transformation to obtain these confidence intervals - provided it can be argued that a suitable re-scaling function might plausibly exist. All too often it is simply not considered - but in a number of situations, such as where the statistic's distribution is highly skewed or discrete, this assumption is clearly violated. This causes problems for users of various maximum likelihood estimators (including L-estimators), means of small samples of heavily tied populations, proportions of small samples, and quantiles - including the median and maximum. In practice therefore, if you can find a normalising transformation, it is best to use it.

      A less obvious consequence of this line of reasoning is revealed when we evaluate confidence limits by test inversion. In that situation it turns out that, if the other assumptions for simple percentile intervals are met (particularly homoscedasticity), any skew in the statistic has the opposite effect upon the confidence limits. In consequence Hall described simple percentile limits as 'backwards', and suggested they ought to be reversed. Subsequently he combined this point with our earlier one regarding coverage - producing this memorable quote:

      "Using the percentile method critical point amounts to looking up the wrong [statistical] tables backwards."
      P. Hall (1988)
      Theoretical comparison of bootstrap confidence intervals, Annals of Statistics 16, 927-953

      In consequence, simple percentile bootstrap confidence limits are also described as backwards limits, nonparametric bootstrap limits, basic bootstrap limits or, all too often, just as bootstrap confidence limits. To distinguish them from studentized bootstrap limits, Hall described his reversed limits as hybrid; other authors describe them as Hall's percentile limits, or as basic bootstrap limits, or simply as bootstrap confidence limits. None of this is very useful in understanding which method has been, or ought to have been, used. Notice however that, where errors are proportional to the estimator's value, you can end up with simple percentile limits the right way round. Perhaps more importantly, whilst simple percentile and Hall's limits can have similar coverage, the latter can also enclose impossible values - such as proportions below zero.

      • The simple percentile bootstrap gives correct intervals when the statistic is symmetrical and unbiased; it is transformation respecting, and does not suggest impossible parameter values.

      • Where there is no normalising transformation the simple and backwards intervals can seriously undercover when the statistic is skewed.

      • Studentized bootstrap intervals tend to be conservative - in other words 2-sided intervals are too wide.
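The difference between simple percentile limits and Hall's reversed limits, including the way the latter can enclose impossible values, can be sketched as follows. The function name and the 1-in-10 data are hypothetical, and the reflection about the estimate (2 × estimate − quantile) is the usual way of writing Hall's limits rather than anything specific to this page.

```python
import random
import statistics

def simple_and_reversed_limits(sample, stat=statistics.fmean, B=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(sample)
    estimate = stat(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    q_lo = boot[round((alpha / 2) * B)]
    q_hi = boot[round((1 - alpha / 2) * B) - 1]
    simple = (q_lo, q_hi)
    # Reflect the bootstrap quantiles about the estimate: the upper
    # quantile now sets the lower limit, and vice versa.
    reversed_limits = (2 * estimate - q_hi, 2 * estimate - q_lo)
    return simple, reversed_limits

# Hypothetical small sample of a proportion: 1 success in 10 trials.
trials = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
simple, reversed_limits = simple_and_reversed_limits(trials)
print("simple percentile limits:", simple)
print("reversed (Hall) limits:  ", reversed_limits)
```

With so skewed and discrete a statistic the simple limits stay within the range 0 to 1, whereas the reversed lower limit falls below zero.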


    2. Bias correction

      Simple percentile limits assume that there is no consistent error in calculating estimates, and that any error is unrelated to the estimate's value - in other words that your statistic is unbiased and homoscedastic.

      For a uniformly biased standard normal estimator, with a bias of b, to correct the upper and lower bootstrap limits you merely subtract 2b. This simple correction cannot be applied if your bootstrap estimates do not have a standard normal distribution, because you are merely assuming the requisite normalising function is possible. However, provided that a normalising function is plausible, it is possible to work out what proportion of the bootstrap estimates correspond to b standard normal deviates - yielding bias corrected (BC) percentile confidence limits.
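A minimal sketch of the bias corrected (BC) percentile calculation follows, assuming only that a normalising function is plausible. The function name, the data, and the choice of `statistics.pvariance` (the variance with divisor n, which is biased downwards) are assumptions for illustration; the bias, in standard normal deviates, is written `z0`.

```python
import random
import statistics
from statistics import NormalDist

def bias_corrected_interval(sample, stat, B=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(sample)
    estimate = stat(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    std_normal = NormalDist()
    # z0 is the bias expressed in standard normal deviates: the normal
    # quantile of the proportion of bootstrap estimates below the estimate.
    below = sum(b < estimate for b in boot) / B
    below = min(max(below, 1 / B), 1 - 1 / B)  # keep inv_cdf finite
    z0 = std_normal.inv_cdf(below)
    limits = []
    for p in (alpha / 2, 1 - alpha / 2):
        # Shift the nominal tail probability by 2 * z0 on the normal scale,
        # then read off the corresponding bootstrap estimate.
        adjusted = std_normal.cdf(2 * z0 + std_normal.inv_cdf(p))
        limits.append(boot[min(B - 1, int(adjusted * B))])
    return limits[0], limits[1], z0

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 4.4, 3.1, 7.2, 2.5, 3.9, 4.8, 3.3]
bc_lower, bc_upper, z0 = bias_corrected_interval(sample, statistics.pvariance)
print(f"z0 = {z0:.3f}, BC limits: {bc_lower:.2f} to {bc_upper:.2f}")
```

When `z0` is zero the adjusted probabilities are just α/2 and 1 − α/2, so the result reduces to the simple percentile limits.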

      Simple percentile and bias corrected bootstraps assume the sampling distribution is homoscedastic. Very often, for example with lognormal errors, the variance is proportional to the mean. In accelerated bias corrected (ABC) percentile confidence limits the 'acceleration' constant, a, is a parametric attempt to compensate for this fact. Ignoring their rather shaky theoretical foundations, the biggest problem with ABC limits is they require second stage sampling and some unwieldy calculations - whereas a number of alternative methods only require the latter.

      • Simple percentile, BC and ABC intervals all work by applying parametric assumptions to a nonparametric model, and have been criticised as such.
      • The need for ABC-type corrections can be avoided by applying a variance stabilising transformation - without worrying about whether it normalises your estimator.


    3. The problem of pivotalness

      Where the statistic of interest behaves as some kind of mean, and the sample size is very large, simple percentile 95% confidence limits do not differ very greatly from their theoretical value. For smaller samples, even where the bootstraps are normally distributed, simple percentile limits have an O[n^(−1/2)] coverage error, equivalent to using a normal distribution to evaluate a t-distributed statistic. Provided your statistic is approximately normal (and unbiased) this coverage error can be reduced to O[n^(−1)] by studentizing each bootstrap estimate.

      In a nonparametric setting the standard error of the ith bootstrap statistic is estimated from the standard deviation of, perhaps 50 or 100, second stage bootstrap estimates - obtained by resampling the ith bootstrap sample. Happily, when a bootstrap t statistic is assumed to be t-distributed, you do not need to estimate the number of degrees of freedom. This is because the confidence limits are located nonparametrically, from the relative rank of the bootstrap t statistics, which the degrees of freedom do not affect.
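The two-stage resampling described above might be sketched like this. The function name, the data, the use of 500 first-stage and 50 second-stage resamples, and the decision to estimate the overall standard error from the first-stage estimates are assumptions for the example, not the only way of doing it.

```python
import random
import statistics

def bootstrap_t_interval(sample, stat=statistics.fmean, B=500, B2=50, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(sample)
    estimate = stat(sample)
    boot_stats, t_values = [], []
    for _ in range(B):
        resample = rng.choices(sample, k=n)
        theta_star = stat(resample)
        # Second stage: resample the i-th bootstrap sample to estimate
        # the standard error of the i-th bootstrap statistic.
        inner = [stat(rng.choices(resample, k=n)) for _ in range(B2)]
        se_star = statistics.stdev(inner)
        if se_star == 0:
            continue  # degenerate resample; its t value would be infinite
        boot_stats.append(theta_star)
        t_values.append((theta_star - estimate) / se_star)
    se_hat = statistics.stdev(boot_stats)
    t_values.sort()
    m = len(t_values)
    # Limits are located from the ranks of the t values, so no degrees
    # of freedom are needed.
    t_lo = t_values[round((alpha / 2) * m)]
    t_hi = t_values[round((1 - alpha / 2) * m) - 1]
    # The upper t quantile sets the lower limit, and vice versa.
    return estimate - t_hi * se_hat, estimate - t_lo * se_hat

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 4.4, 3.1, 7.2, 2.5, 3.9, 4.8, 3.3]
t_lower, t_upper = bootstrap_t_interval(sample)
print(f"95% bootstrap t limits for the mean: {t_lower:.2f} to {t_upper:.2f}")
```

Skipping resamples with a zero standard error is one crude way of handling the 'unlucky selections' that would otherwise give an infinite t value.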

      However, although the studentized t bootstrap may be theoretically better, in practice it has several important problems.

      1. Confidence limits may enclose impossible parameter values.
      2. Unlucky selections of sample values can produce infinitely long, or zero length, intervals.
      3. When calculated from moderate samples of real data, on average, bootstrap t intervals are too wide - and conservative.
      4. For estimators that approach pivotalness very slowly, simple percentile limits can have a smaller coverage error.

      Notice also that, in many small-sample situations, stabilising the variance does not reduce the undercoverage of studentized intervals.


  • Alternative 2-sided intervals

    Despite their theoretical justification, studentized bootstrap estimators have a number of practical and theoretical problems - particularly when used for small samples where the estimates are skewed. One problem is, where interval lengths have a skewed distribution, the mean interval length is unrelated to coverage error. However in many situations, studentized bootstrap estimates have a conservative coverage and an unreasonably large mean interval. One way of addressing one or both of these problems is to modify the criteria by which critical points for confidence intervals are defined.

    • Equal tail intervals

      The usual way of selecting critical points for a 2-sided confidence interval (such as Efron's) is for both tails to employ the same α/2 probability. In other words, P[L > Θ] = P[U < Θ] = α/2.

    Hall has suggested two alternatives:

    1. Shortest intervals

      Instead of having α/2 in each tail, you select critical points to obtain the shortest interval for a combined tail-end probability of α. In other words, whilst P[L > Θ] + P[U < Θ] = α, you arrange the limits such that the interval (U − L) is as small as possible.

      In some situations this produces a similar 2-sided coverage error, but shorter intervals than the equal-tailed method. For a unimodal statistic the shortest interval is equivalent to a likelihood-based confidence interval. But, where the distribution is highly skewed, the shortest interval is a one-sided interval.
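Given a set of bootstrap estimates, the shortest interval can be found by sliding a window over the ordered values, as in this sketch. The function names and the artificial, right-skewed 'bootstrap estimates' are hypothetical and only serve to make the comparison with equal tail limits easy to check by hand.

```python
def shortest_interval(boot, alpha=0.05):
    ordered = sorted(boot)
    B = len(ordered)
    k = B - int(alpha * B)  # number of estimates the interval must enclose
    # Slide a window of k consecutive ordered estimates; keep the narrowest.
    start = min(range(B - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

def equal_tail_interval(boot, alpha=0.05):
    ordered = sorted(boot)
    B = len(ordered)
    return ordered[round((alpha / 2) * B)], ordered[round((1 - alpha / 2) * B) - 1]

# Hypothetical right-skewed set of estimates, for illustration only.
boot = [x ** 2 for x in range(100)]
short = shortest_interval(boot, alpha=0.1)
equal = equal_tail_interval(boot, alpha=0.1)
print("shortest:  ", short, "length", short[1] - short[0])
print("equal tail:", equal, "length", equal[1] - equal[0])
```

In this deliberately skewed example the shortest interval begins at the smallest estimate, so all of α is spent in the upper tail and the interval is effectively one-sided.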

    2. Symmetrical intervals

      The critical values for these also have a combined tail-end probability of α, but their upper and lower confidence limits lie the same distance, c, from the estimate, so Θ̂ − L = U − Θ̂ = c. In other words, P[|Θ̂ − Θ| > c] = α.

      When applied to highly asymmetric distributions, symmetrical intervals may have both shorter length and smaller coverage error than equal tail intervals. Although these are much less computationally-intensive than studentized intervals, they are not very popular as yet. Moreover, if the statistic is pivotal and smooth, symmetric percentile-t bootstrap intervals may have a coverage error of as little as O[n^(−2)]. Again, if the statistic is too skewed this interval either reduces to a 1-sided interval, or encloses impossible parameter values.
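A symmetrical interval can be sketched by taking c as a quantile of the absolute distances between the bootstrap estimates and the observed estimate. The function name and both sets of artificial estimates are assumptions for illustration; the bootstrap distances stand in for the unknown distances between the estimate and Θ.

```python
def symmetric_interval(estimate, boot, alpha=0.05):
    # Absolute distances of the bootstrap estimates from the observed estimate.
    distances = sorted(abs(b - estimate) for b in boot)
    B = len(distances)
    # c is the smallest distance exceeded by no more than alpha * B of them.
    c = distances[B - int(alpha * B) - 1]
    return estimate - c, estimate + c

# Hypothetical symmetrical set of estimates centred on zero.
balanced = symmetric_interval(0, list(range(-50, 51)), alpha=0.1)

# Hypothetical right-skewed set of non-negative estimates.
skewed = symmetric_interval(2500, [x ** 2 for x in range(100)], alpha=0.1)

print("symmetrical estimates:", balanced)
print("skewed estimates:     ", skewed)
```

In the skewed case every estimate is zero or more, yet the lower limit is negative, which illustrates how a symmetrical interval can enclose impossible parameter values.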

      If however, the estimators are distributed symmetrically, the methods listed above yield the same result. The graph set below compares the results of applying these three criteria to a moderately skewed distribution, where Θ̂ is unbiased.

{Fig. 2}

      One danger in using these alternatives is your readers may assume they are estimators of the same interval, I. They can also give misleading results where the estimates are more skewed - or are strongly stepped.


  2. Parametric bootstrapping
    • Whereas nonparametric bootstraps make no assumptions about how your observations are distributed, and resample your original sample, parametric bootstraps resample a known distribution function, whose parameters are estimated from your sample.

    • These bootstrap estimates are either used to attach confidence limits nonparametrically - or a second parametric model is fitted using parameters estimated from the distribution of the bootstrap estimates, from which confidence limits are obtained analytically.
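As a rough sketch of the first of these options, the code below fits a normal distribution to the sample, resamples that fitted distribution, and attaches limits to the median nonparametrically. The normal model, the function name, and the data are assumptions chosen for illustration; as discussed below, choosing the model is the awkward part.

```python
import random
import statistics

def parametric_bootstrap_interval(sample, stat=statistics.median, B=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(sample)
    # Estimate the parameters of the assumed (normal) distribution from the sample.
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    # Resample the fitted distribution, not the original observations.
    boot = sorted(
        stat([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(B)
    )
    # Attach limits nonparametrically, from the ordered bootstrap estimates.
    return boot[round((alpha / 2) * B)], boot[round((1 - alpha / 2) * B) - 1]

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 4.4, 3.1, 7.2, 2.5, 3.9, 4.8, 3.3]
p_lower, p_upper = parametric_bootstrap_interval(sample)
print(f"95% parametric bootstrap limits for the median: {p_lower:.2f} to {p_upper:.2f}")
```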

    The advantages and disadvantages of this approach, compared to nonparametric bootstrapping, can be summarised as follows.

    • In the nonparametric bootstrap, samples are drawn from a discrete set of n observations. This can be a serious disadvantage in small sample sizes because spurious fine structure in the original sample, but absent from the population sampled, may be faithfully reproduced in the simulated data.

      Another concern is that because small samples have only a few values, covering a restricted range, nonparametric bootstrap samples underestimate the amount of variation in the population you originally sampled. As a result, statisticians generally see samples of 10 or fewer as too small for reliable nonparametric bootstrapping.

      Small samples convey little reliable information about the higher moments of their population distribution function - in which case, a relatively simple function may be adequate.

    • Although parametric bootstrapping provides more power than the nonparametric bootstrap, it does so on the basis of an inherently arbitrary choice of model. Whilst the cumulative distribution of even quite small samples deviates little from that of their population, it can be far from easy to select the most appropriate mathematical function a priori.

      Maximum likelihood estimators are commonly used for parametric bootstrapping despite the fact that this criterion is nearly always based upon their large sample behaviour.

      Choosing an appropriate parametric error structure for a statistic based upon small samples can be awkward to justify. Bootstrap t statistics present an additional problem, partly because of problems in estimating standard errors analytically, partly because of difficulties in working out a suitable number of degrees of freedom for your pivot's (presumed, but often large-sample-based) distribution.

    So although parametric bootstrapping can be relatively straightforward to perform, and may be used to construct confidence intervals for the sample median of small samples, the bootstrap and estimator distribution functions are often very different. In addition, confidence limits may enclose invalid parameter values, and the coverage error is no better than nonparametric intervals.

    Confusingly, whilst the parametric bootstrap is sometimes described as a basic bootstrap, resampling residuals is sometimes referred to as being 'semi parametric' - which is also used to describe test-inversion and smoothed sample bootstraps. Resampling residuals is most popularly used to obtain bootstrap confidence intervals for regression coefficients, for example in nonparametric regression.


  3. Smoothed bootstrap intervals

    Nonparametric bootstrapping does not work well for discrete estimators such as the median. One way of addressing the problems of spurious fine structure and underestimated variation, that avoids fitting an arbitrarily-chosen distribution, is to smooth the sampling distribution resampled by bootstrapping. (For bootstrap t statistics the second-stage samples are smoothed, in the same way.) We explored Gaussian smoothing for displaying frequency distributions in Unit 3, but let us summarise the key issues:

    Gaussian smoothing using mean probability densities is equivalent to 'jittering' a set of observations to avoid ties so, in a bootstrap setting, a small random error is added to each observation upon resampling.

    A normal distribution is among the most popular smoothing error functions - and is referred to as a Gaussian Kernel. The difference between Gaussian smoothing and fitting a Gaussian distribution is, like a moving average, the error function is centred upon each individual value of Y (so the mean error is zero). Again like a moving average, the critical parameter determining the degree of smoothing is the window size - or bandwidth. In the case of a Gaussian smoothing function, the bandwidth is the standard deviation of the smoothing error function.

    Although other kernels are available, such as the Epanechnikov, for most applications the choice of bandwidth is more important. If a Gaussian kernel is used, a 'plug in' estimate of the smoothing parameter, h, can be calculated from each bootstrap sample - where h = 1.06 σ n^(−1/5), σ being the standard deviation of that sample. The graph below shows the results of applying Gaussian smoothing to (n=) 10 observations, using 3 different values of h.

{Fig. 3}

    Gaussian smoothing obviously works best for unimodal normal distributions. For heavily skewed, or extremely kurtotic distributions, this plug-in estimate oversmooths - yielding an overly symmetrical, unimodal, distribution. Conversely, if h is very small the distribution is undersmoothed, and each observation is replaced by its own private normal mode.
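The jittering described above can be sketched as follows, using the plug-in bandwidth h = 1.06 σ n^(−1/5) computed from each bootstrap sample. The function names and data are hypothetical, and the limits are simple percentile limits for the median, chosen here only because the median is a discrete estimator.

```python
import random
import statistics

def plug_in_bandwidth(values):
    # h = 1.06 * sigma * n^(-1/5), sigma being the standard deviation of the values.
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)

def smoothed_resample(sample, rng):
    resample = rng.choices(sample, k=len(sample))
    h = plug_in_bandwidth(resample)
    # Add a small normal error, centred on zero, to each resampled observation.
    return [x + rng.gauss(0.0, h) for x in resample]

def smoothed_percentile_interval(sample, stat=statistics.median, B=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    boot = sorted(stat(smoothed_resample(sample, rng)) for _ in range(B))
    return boot[round((alpha / 2) * B)], boot[round((1 - alpha / 2) * B) - 1]

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 2.8, 4.4, 3.1, 7.2, 2.5, 3.9, 4.8, 3.3]
s_lower, s_upper = smoothed_percentile_interval(sample)
print(f"95% smoothed bootstrap limits for the median: {s_lower:.2f} to {s_upper:.2f}")
```

Because the added error is continuous, the resampled values are no longer tied, so the bootstrap medians are not confined to a handful of discrete values.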

    Smoothed bootstrapping has 3 important effects:

    • If an appropriate smoothing function is chosen, coverage error can be reduced by up to one order.

    • Unlike a moving average, a Gaussian smoothing function extends, rather than truncates the tails.

    • Smoothing increases the variance of bootstrap statistics - but this can be compensated for by using shortest confidence intervals.

    One disadvantage of this type of smoothing is it can produce impossible values, such as those falling to the left of the Y-axis above. Also, aside from the problem of choosing the best smoothing kernel, most bandwidth formulae use the sample variance or bootstrap-sample variance.


  4. Test-inversion intervals

    In principle, these can be parametric, nonparametric, or semiparametric - depending upon how you estimate the distribution of values to be bootstrapped and the distribution of statistics.

    Test inversion limits exploit the fundamental relationship between tests and confidence limits, and can be used to construct P−value plots, or for estimating the power of tests.

      Remember that, unlike an ordinary null hypothesis test, these tests do not assume the nil hypothesis is true. Instead a range of null hypotheses is tested - set by the possible values of Θ.

    Test inversion intervals are sometimes described as semiparametric because it is not the original observations that are resampled, but a modified (shifted) sample. Confidence limits by permutation employ the same principle, but sample without replacement.

    Unlike conventional 2-sided confidence limits, test inversion does not assume the statistic is distributed symmetrically or smoothly. Unfortunately, owing to the additional computation involved, and the need for interpolation or an efficient search algorithm, this approach is not often used to construct intervals.

    Theoretical work shows that test inversion limits have similar coverage errors to the other procedures described above. However where sample size renders the statistic non-pivotal, conventional analytical models based upon the Edgeworth expansion are unreliable, and the only way to compare techniques is by simulation - but, because test inversion limits are not very popular, their properties are relatively unexplored.

    The advantages of test inversion confidence limits are that they are easy to use and accurate, do not require the calculation of analytic correction terms, can be calculated for studentized estimates, do not behave irregularly for small values of alpha, and do not require a variance-stabilising transformation.