 
In principle there are three different ways of obtaining and evaluating bootstrap estimates: nonparametric, parametric, and semiparametric. In practice, because nonparametric intervals make parametric assumptions, this division is rather arbitrary. Whilst these terms may provide some insight, they do not make a very useful classification. Nevertheless, since 'nonparametric' intervals are so popular, let us consider them first.
 Nonparametric bootstrap confidence intervals
 Equal tail confidence limits
Efron's Simple Percentile Confidence Intervals, although arithmetically straightforward and relatively simple to interpret, are surprisingly controversial. To understand the strengths, limitations, and extensions of the common-or-garden nonparametric bootstrap we should first summarise the basic reasoning.
 For simplicity, assume you have a set of n observations from which you have calculated some statistic, θ̂, for which you have no formula to estimate a standard error, but to which you wish to attach (ordinary 2-tailed 95%) confidence limits.
 You cannot reasonably assume your sample (or your sample statistic) represents a known frequency distribution, but can assume it adequately reflects the wider population from which it was drawn.
Using your sample as a model of its population, you take B samples of n observations (with replacement), from which you calculate B (plug-in) bootstrap estimates of your sample statistic. This is sometimes referred to as a basic bootstrap.
 Assuming these bootstrap statistics vary in a similar fashion to your sample statistic, when similarly obtained, the most typical 95% of those bootstrap statistics should have 95% confidence limits which enclose the population parameter, Θ, of which your sample statistic is the best available estimate. Conversely, 95% of your bootstrap estimates should fall within the 95% confidence limits about your sample statistic.
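The steps above can be sketched in a few lines of Python. The data, function names, and the choice of B = 2000 here are illustrative assumptions, not part of this unit:

```python
import random
import statistics

def percentile_interval(sample, stat, B=2000, alpha=0.05, seed=1):
    """Efron's simple percentile interval: take the central 1 - alpha
    of B plug-in bootstrap estimates."""
    rng = random.Random(seed)
    n = len(sample)
    # B resamples of size n, drawn with replacement from the sample
    boots = sorted(stat([rng.choice(sample) for _ in range(n)])
                   for _ in range(B))
    lower = boots[int(alpha / 2 * B)]            # 2.5th percentile
    upper = boots[int((1 - alpha / 2) * B) - 1]  # 97.5th percentile
    return lower, upper

# Illustrative data; any statistic lacking a standard-error formula will do.
data = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]
lo, hi = percentile_interval(data, statistics.mean)
```

Any plug-in statistic can be passed in place of the mean; the mean is used only because its behaviour is easy to check.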
As a rough-and-ready first approximation this interval estimate might seem acceptable enough; unfortunately it conceals a number of crucial assumptions, not all of which are reasonable.
 Ordinary 2-sided confidence limits assume your statistic is distributed smoothly and symmetrically.
 They also assume the statistic is unbiased and homoscedastic.
 Because normality is an asymptotic quality, intervals calculated from finite samples are too narrow, and consistently undercover.
Since these assumptions raise a number of issues, let us consider them in more detail.
 The unknown normalizing function
Because they were designed for a parametric normal universe, ordinary 2-sided 95% confidence limits assume your sample statistics are distributed normally, or that a normalising transformation has been applied. The normalising transformation can be applied either to the observations or to the statistic itself, in which case f[θ̂*] is normal, and f is the normalising function. Critically, it is assumed that f (whatever it might be) does not alter the rank of your estimates. Confidence intervals which are estimated under those assumptions are described as being transformation respecting.
For instance, the graph set below shows the cumulative distributions of B = 100 bootstrap means, with and without an appropriate normalising transformation. The observed estimate, θ̂, is tinted violet, but the highest and lowest 5% of these estimates are orange. The grey rectangle encloses the central 90% of bootstrap estimates, shown in green: the estimated 90% nonparametric confidence interval. Curves tinted blue are cumulative normal distributions, fitted to our transformed and untransformed bootstrap estimates.
{Fig. 1}

Controversially, you do not have to actually find the appropriate normalising transformation to obtain these confidence intervals, provided it can be argued that a suitable rescaling function might plausibly exist. All too often this assumption is simply not considered, but in a number of situations, such as where the statistic's distribution is highly skewed or discrete, it is clearly violated. This causes problems for users of various maximum likelihood estimators (including L-estimators), means of small samples of heavily tied populations, proportions of small samples, and quantiles, including the median and maximum. In practice therefore, if you can find a normalising transformation, it is best to use it.
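Why percentile intervals are transformation respecting can be seen directly: quantiles commute with any monotone map, so taking percentiles on a transformed scale and back-transforming gives the same limits. A small Python check (the lognormal data and index choices are illustrative assumptions):

```python
import math
import random

# Bootstrap means of a skewed, positive-valued sample (illustrative data).
rng = random.Random(42)
sample = [math.exp(rng.gauss(0, 1)) for _ in range(30)]
n, B = len(sample), 1000
boots = sorted(sum(rng.choice(sample) for _ in range(n)) / n
               for _ in range(B))

# Percentile limits taken directly...
lo, hi = boots[25], boots[974]
# ...and taken on the log scale, then back-transformed. Because log is
# strictly monotone it preserves rank, so the two routes must agree.
logs = sorted(math.log(b) for b in boots)
lo2, hi2 = math.exp(logs[25]), math.exp(logs[974])
```

The same argument holds for any monotone f, which is why the percentile method only needs such an f to exist, not to be known.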
A less obvious consequence of this line of reasoning is revealed when we evaluate confidence limits by test inversion, in which situation it turns out that, if the other assumptions for simple percentile intervals are met (particularly homoscedasticity), any skew in the statistic produces the opposite effect upon the confidence limits. In consequence Hall described simple percentile limits as 'backwards', and suggested they ought to be reversed. Subsequently he combined this point with our earlier one regarding coverage, producing this memorable quote:
"Using the percentile method critical point amounts to looking up the wrong [statistical] tables backwards."
P. Hall (1988) Theoretical comparison of bootstrap confidence intervals, Annals of Statistics 16, 927–953.
In consequence, simple percentile bootstrap confidence limits are also described as backwards limits, nonparametric bootstrap limits, basic bootstrap limits, or, all too often, just as bootstrap confidence limits. To distinguish them from studentized bootstrap limits, Hall described his reversed limits as hybrid; other authors describe them as Hall's percentile limits, as basic bootstrap limits, or simply as bootstrap confidence limits. None of this terminology is very useful in understanding which method has, or ought to have, been used. Notice however that, where errors are proportional to the estimator's value, you can end up with simple percentile limits the right way round. Perhaps more importantly, whilst simple percentile and Hall's limits can have similar coverage, the latter can also enclose impossible values, such as proportions below zero.
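Hall's reversal simply reflects each percentile about the observed estimate. The sketch below computes both sets of limits for a small-sample proportion (coded 0/1 data); the data and names are illustrative, chosen so the reversed lower limit falls below zero:

```python
import random
import statistics

def percentile_and_basic(sample, B=2000, alpha=0.05, seed=3):
    """Simple percentile limits alongside Hall's reversed ('backwards',
    or basic) limits for the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    theta = statistics.mean(sample)
    boots = sorted(statistics.mean([rng.choice(sample) for _ in range(n)])
                   for _ in range(B))
    q_lo = boots[int(alpha / 2 * B)]
    q_hi = boots[int((1 - alpha / 2) * B) - 1]
    # Hall's reversal: each percentile's deviation from the observed
    # estimate is applied on the opposite side of it.
    return (q_lo, q_hi), (2 * theta - q_hi, 2 * theta - q_lo)

# One success in ten trials: the observed proportion is 0.1.
coded = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
percentile, basic = percentile_and_basic(coded)
```

With these data the percentile limits stay within [0, 1], but the reversed (basic) lower limit is negative, an impossible proportion.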
 The simple percentile bootstrap gives correct intervals when the statistic is symmetrical and unbiased; it is transformation respecting, and does not suggest impossible parameter values.
 Where there is no normalising transformation, the simple and backwards intervals can seriously undercover when the statistic is skewed.
 Studentized bootstrap intervals tend to be conservative; in other words their 2-sided intervals are too wide.

 Bias correction
Simple percentile limits assume that there is no consistent error in calculating estimates, and that any error is unrelated to the estimate's value; in other words, that your statistic is unbiased and homoscedastic.
For a uniformly biased standard normal estimator with a bias of b, you can correct the upper and lower bootstrap limits by simply subtracting 2b. This simple correction cannot be applied if your bootstrap estimates do not have a standard normal distribution, because you are merely assuming the requisite normalising function is possible. However, provided that a normalising function is plausible, it is possible to work out what proportion of the bootstrap estimates correspond to b standard normal deviates, yielding bias corrected (BC) percentile confidence limits.
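In practice the BC correction is carried out on the percentile scale: the median-bias constant z0 is the normal deviate of the proportion of bootstrap estimates falling below the observed estimate, and the percentiles actually read off are adjusted by 2z0. A minimal Python sketch, with illustrative data and names:

```python
import random
import statistics

def bc_interval(sample, stat, B=2000, alpha=0.05, seed=5):
    """Bias-corrected (BC) percentile limits (no acceleration term)."""
    rng = random.Random(seed)
    n = len(sample)
    theta = stat(sample)
    boots = sorted(stat([rng.choice(sample) for _ in range(n)])
                   for _ in range(B))
    nd = statistics.NormalDist()
    # z0: normal deviate of the proportion of estimates below theta,
    # clipped away from 0 and 1 so inv_cdf stays finite.
    p_below = sum(b < theta for b in boots) / B
    z0 = nd.inv_cdf(min(max(p_below, 1 / B), 1 - 1 / B))
    z = nd.inv_cdf(1 - alpha / 2)
    a1 = nd.cdf(2 * z0 - z)  # adjusted lower percentile
    a2 = nd.cdf(2 * z0 + z)  # adjusted upper percentile
    return boots[int(a1 * B)], boots[min(int(a2 * B), B - 1)]

skewed = [0.2, 0.4, 0.5, 0.7, 0.9, 1.3, 1.8, 2.6, 4.1, 7.5]
lo, hi = bc_interval(skewed, statistics.mean)
```

When z0 = 0 (no median bias) the adjusted percentiles reduce to the ordinary α/2 and 1 − α/2, recovering the simple percentile interval.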
Simple percentile and bias corrected bootstraps assume the sampling distribution is homoscedastic. Very often, for example with lognormal errors, the variance is proportional to the mean; in accelerated bias corrected (ABC) percentile confidence limits the 'acceleration' constant, a, is a parametric attempt to compensate for this fact. Ignoring their rather shaky theoretical foundations, the biggest problem with ABC limits is that they require second stage sampling and some unwieldy calculations, whereas a number of alternative methods only require the latter.
 Simple percentile, BC and ABC intervals all work by applying parametric assumptions to a nonparametric model, and have been criticised as such.
 The need for ABC-type corrections can be avoided by applying a variance-stabilising transformation, without worrying about whether it normalises your estimator.
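One standard way of estimating the acceleration constant a is from jackknife (leave-one-out) values of the statistic; the skewness of those values gauges how fast the standard error changes with the parameter. The formula below is the usual jackknife estimate; the data and function name are illustrative:

```python
import statistics

def acceleration(sample, stat):
    """Jackknife estimate of the acceleration constant a, from the
    skewness of the leave-one-out values of the statistic."""
    n = len(sample)
    jack = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean_jack = sum(jack) / n
    num = sum((mean_jack - j) ** 3 for j in jack)
    den = 6 * sum((mean_jack - j) ** 2 for j in jack) ** 1.5
    return num / den

a_sym = acceleration([1, 2, 3, 4, 5], statistics.mean)    # symmetric sample
a_skew = acceleration([1, 1, 1, 1, 10], statistics.mean)  # skewed sample
```

For a perfectly symmetric sample the cubed deviations cancel and a is zero; a right-skewed sample gives a positive a.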

 The problem of pivotalness
Where the statistic of interest behaves as some kind of mean, and the sample size is very large, simple percentile 95% confidence limits do not differ very greatly from their theoretical value. For smaller samples, even where the bootstraps are normally distributed, simple percentile limits have an O[n^{−½}] coverage error, equivalent to using a normal distribution to evaluate a t-distributed statistic. Provided your statistic is approximately normal (and unbiased), this coverage error can be reduced to O[n^{−1}] by studentizing each bootstrap estimate.
In a nonparametric setting, the standard error of the ith bootstrap statistic is estimated from the standard deviation of (perhaps 50 or 100) second stage bootstrap estimates, obtained by resampling the ith bootstrap sample. Happily, when the locations of the confidence limits are found nonparametrically, you do not need to estimate the number of degrees of freedom of the bootstrap t statistic's presumed t-distribution, because that choice does not affect the statistics' relative rank.
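This two-stage scheme can be sketched as follows; the data, names, and the choices of B and B2 are illustrative assumptions, and the statistic is the mean only to keep the example short:

```python
import random
import statistics

def bootstrap_t_interval(sample, B=400, B2=50, alpha=0.05, seed=11):
    """Studentized (bootstrap-t) limits for the mean; each first-stage
    estimate is studentized by an SE from B2 second-stage resamples."""
    rng = random.Random(seed)
    n = len(sample)
    theta = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    ts = []
    for _ in range(B):
        boot = [rng.choice(sample) for _ in range(n)]
        est = statistics.mean(boot)
        # second-stage resampling of THIS bootstrap sample gives its SE
        inner = [statistics.mean([rng.choice(boot) for _ in range(n)])
                 for _ in range(B2)]
        se_b = statistics.stdev(inner)
        if se_b > 0:  # guard against degenerate resamples
            ts.append((est - theta) / se_b)
    ts.sort()
    t_lo = ts[int(alpha / 2 * len(ts))]
    t_hi = ts[int((1 - alpha / 2) * len(ts)) - 1]
    # note the reversal: the upper t quantile fixes the lower limit
    return theta - t_hi * se, theta - t_lo * se

data = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]
lo, hi = bootstrap_t_interval(data)
```

The cost is apparent: B × B2 second-stage resamples, which is why B2 is usually kept small.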
However, although the studentized t bootstrap may be theoretically better, in practice it has several important problems.
 Confidence limits may enclose impossible parameter values.
 Unlucky selections of sample values can produce infinitely long, or zero-length, intervals.
 When calculated from moderate samples of real data, on average, bootstrap t intervals are too wide, and conservative.
 For estimators that approach pivotalness very slowly, simple percentile limits can have a smaller coverage error.
Notice also that, in many small-sample situations, stabilising the variance does not reduce the undercoverage of studentized intervals.
 Alternative 2-sided intervals
Despite their theoretical justification, studentized bootstrap estimators have a number of practical and theoretical problems, particularly when used for small samples where the estimates are skewed. One problem is that, where interval lengths have a skewed distribution, the mean interval length is unrelated to coverage error. However, in many situations studentized bootstrap estimates have a conservative coverage and an unreasonably large mean interval. One way of addressing one or both of these problems is to modify the criteria by which critical points for confidence intervals are defined.
Hall has suggested two alternatives:
 Shortest intervals
Instead of having α/2 in each tail, you select critical points to obtain the shortest interval for a combined tail-end probability of α. In other words, whilst P[L > Θ] + P[U < Θ] = α, you arrange the limits such that the interval length (U − L) is as small as possible.
In some situations this produces a similar 2-sided coverage error, but shorter intervals, than the equal-tailed method. For a unimodal statistic the shortest interval is equivalent to a likelihood-based confidence interval. But, where the distribution is highly skewed, the shortest interval is a one-sided interval.
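Finding the shortest interval from sorted bootstrap estimates is a simple sliding-window search; the exponential 'bootstrap estimates' below are simulated purely for illustration:

```python
import math
import random

def shortest_interval(boots, alpha=0.05):
    """Slide a window holding a proportion 1 - alpha of the sorted
    bootstrap estimates and keep the narrowest one."""
    boots = sorted(boots)
    m = math.ceil((1 - alpha) * len(boots))
    start = min(range(len(boots) - m + 1),
                key=lambda i: boots[i + m - 1] - boots[i])
    return boots[start], boots[start + m - 1]

# On a right-skewed set of bootstrap estimates the shortest window is
# never wider than the equal-tailed one.
rng = random.Random(2)
boots = sorted(rng.expovariate(1.0) for _ in range(1000))
short = shortest_interval(boots)
equal = (boots[25], boots[974])
```

Because the equal-tailed window is itself one of the candidate windows, the shortest interval can only tie it or beat it in length.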
 Symmetrical intervals
The critical values for these also have a combined tail-end probability of α, but their upper and lower confidence limits lie the same distance from the estimate, so θ̂ − L = U − θ̂ = c. In other words, P[|θ̂ − Θ| > c] = α.
When applied to highly asymmetric distributions, symmetrical intervals may have both shorter length and smaller coverage error than equal-tailed intervals. Although these are much less computationally intensive than studentized intervals, they are not very popular as yet. Moreover, if θ̂ is pivotal and smooth, symmetric percentile-t bootstrap intervals may have a coverage error of as little as O[n^{−2}]. Again, if the statistic is too skewed this interval either reduces to a 1-sided interval, or encloses impossible parameter values.
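A symmetrical interval can be read off from the absolute deviations of the bootstrap estimates; c is simply their (1 − α) quantile, so the two arms are equal by construction. The simulated estimates and names below are illustrative:

```python
import math
import random

def symmetric_interval(boots, theta, alpha=0.05):
    """Choose c so that a proportion 1 - alpha of the bootstrap
    estimates lie within theta +/- c."""
    devs = sorted(abs(b - theta) for b in boots)
    c = devs[min(int((1 - alpha) * len(devs)), len(devs) - 1)]
    return theta - c, theta + c

# Simulated bootstrap estimates, deliberately biased and noisy.
rng = random.Random(4)
theta = 10.0
boots = [theta + rng.gauss(0.5, 2.0) for _ in range(1000)]
lo, hi = symmetric_interval(boots, theta)
```

Whatever the shape of the bootstrap distribution, the two arms come out identical; only the combined tail probability is controlled.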
If however the estimators are distributed symmetrically, the methods listed above yield the same result. The graph set below compares the results of applying these three criteria to a moderately skewed distribution, where θ̂ is unbiased.
{Fig. 2}

One danger in using these alternatives is that your readers may assume they are estimators of the same interval, I. They can also give misleading results where the estimates are more skewed, or are strongly stepped.
 Parametric bootstrapping
 Whereas nonparametric bootstraps make no assumptions about how your observations are distributed, and resample your original sample, parametric bootstraps resample a known distribution function, whose parameters are estimated from your sample.
 These bootstrap estimates are either used to attach confidence limits nonparametrically, or a second parametric model is fitted using parameters estimated from the distribution of the bootstrap estimates, from which confidence limits are obtained analytically.
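The first of those two routes can be sketched as follows: fit a model to the sample, resample the fitted model rather than the observations, and attach percentile limits nonparametrically. The choice of a normal model here is an arbitrary assumption for illustration, exactly the kind of choice discussed below:

```python
import random
import statistics

def parametric_percentile(sample, B=2000, alpha=0.05, seed=13):
    """Percentile limits for the mean, resampling a fitted normal model
    rather than the observations themselves."""
    mu = statistics.mean(sample)   # fitted model parameters...
    sd = statistics.stdev(sample)  # ...estimated from the sample
    rng = random.Random(seed)
    n = len(sample)
    # resample the fitted distribution, not the original observations
    boots = sorted(statistics.mean([rng.gauss(mu, sd) for _ in range(n)])
                   for _ in range(B))
    return boots[int(alpha / 2 * B)], boots[int((1 - alpha / 2) * B) - 1]

data = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]
lo, hi = parametric_percentile(data)
```

Swapping `rng.gauss(mu, sd)` for draws from any other fitted distribution changes the model without changing the machinery.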
The advantages and disadvantages of this approach, compared to nonparametric bootstrapping, can be summarised as follows.
 In the nonparametric bootstrap, samples are drawn from a discrete set of n observations. This can be a serious disadvantage with small sample sizes, because spurious fine structure which is present in the original sample, but absent from the population sampled, may be faithfully reproduced in the simulated data.
Another concern is that, because small samples have only a few values, covering a restricted range, nonparametric bootstrap samples underestimate the amount of variation in the population you originally sampled. As a result, statisticians generally see samples of 10 or fewer as too small for reliable nonparametric bootstrapping.
Small samples convey little reliable information about the higher moments of their population distribution function, in which case a relatively simple function may be adequate.
 Although parametric bootstrapping provides more power than the nonparametric bootstrap, it does so on the basis of an inherently arbitrary choice of model. Whilst the cumulative distribution of even quite a small sample deviates little from that of its population, it can be far from easy to select the most appropriate mathematical function a priori.
Maximum likelihood estimators are commonly used for parametric bootstrapping despite the fact that this criterion is nearly always based upon their large sample behaviour.
Choosing an appropriate parametric error structure for a statistic based upon small samples can be awkward to justify. Bootstrap t statistics present an additional problem, partly because of problems in estimating standard errors analytically, partly because of difficulties in working out a suitable number of degrees of freedom for your pivot's (presumed, but often large-sample-based) distribution.
So although parametric bootstrapping can be relatively straightforward to perform, and may be used to construct confidence intervals for the sample median of small samples, the bootstrap and estimator distribution functions are often very different. In addition, confidence limits may enclose invalid parameter values, and the coverage error is no better than nonparametric intervals.
Confusingly, whilst the parametric bootstrap is sometimes described as a basic bootstrap, resampling residuals is sometimes referred to as being 'semi-parametric', a term which is also used to describe test-inversion and smoothed sample bootstraps. Resampling residuals is most popularly used to obtain bootstrap confidence intervals for regression coefficients, for example in nonparametric regression.
 Smoothed bootstrap intervals
Nonparametric bootstrapping does not work well for discrete estimators such as the median. One way of addressing the problems of spurious fine structure and underestimated variation, that avoids fitting an arbitrarily-chosen distribution, is to smooth the sampling distribution resampled by bootstrapping. (For bootstrap t statistics the second-stage samples are smoothed in the same way.) We explored Gaussian smoothing for displaying frequency distributions in Unit 3, but let us summarise the key issues:
Gaussian smoothing using mean probability densities is equivalent to 'jittering' a set of observations to avoid ties; so, in a bootstrap setting, a small random error is added to each observation upon resampling.
A normal distribution is among the most popular smoothing error functions, and is referred to as a Gaussian kernel. The difference between Gaussian smoothing and fitting a Gaussian distribution is that, like a moving average, the error function is centred upon each individual value of Y (so the mean error is zero). Again like a moving average, the critical parameter determining the degree of smoothing is the window size, or bandwidth. In the case of a Gaussian smoothing function, the bandwidth is the standard deviation of the smoothing error function.
Although other kernels are available, such as the Epanechnikov, for most applications the choice of bandwidth is more important. If a Gaussian kernel is used, a 'plug-in' estimate of the smoothing parameter, h, can be estimated from each bootstrap sample, where h = 1.06 σ n^{−1/5}. The graph below shows the results of applying Gaussian smoothing to n = 10 observations, using 3 different values of h.
{Fig. 3}

Gaussian smoothing obviously works best for unimodal normal distributions. For heavily skewed, or extremely kurtotic, distributions this plug-in estimate oversmoothes, yielding an overly symmetrical, unimodal distribution. Conversely, if h is very small the distribution is undersmoothed, and each observation is replaced by its own private normal mode.
Smoothed bootstrapping has 3 important effects:
 If an appropriate smoothing function is chosen, coverage error can be reduced by up to one order.
 Unlike a moving average, a Gaussian smoothing function extends, rather than truncates, the tails.
 Smoothing increases the variance of bootstrap statistics, but this can be compensated for by using shortest confidence intervals.

One disadvantage of this type of smoothing is that it can produce impossible values, such as those falling to the left of the Y-axis above. Also, aside from the problem of choosing the best smoothing kernel, most bandwidth formulae use the sample variance or bootstrap-sample variance.
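Putting the pieces together, a smoothed resample is an ordinary resample with Gaussian jitter of bandwidth h added to each drawn value, using the plug-in h = 1.06 σ n^{−1/5} described above. The data and function names in this Python sketch are illustrative:

```python
import random
import statistics

def smoothed_sample(sample, rng, h=None):
    """One smoothed bootstrap resample: resampling with replacement,
    plus Gaussian jitter of bandwidth h on each drawn observation."""
    n = len(sample)
    if h is None:
        # plug-in bandwidth for a Gaussian kernel
        h = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    return [rng.choice(sample) + rng.gauss(0, h) for _ in range(n)]

rng = random.Random(9)
data = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]
# Smoothed bootstrap medians no longer sit on a handful of discrete
# values, which is the point of smoothing a discrete estimator.
medians = sorted(statistics.median(smoothed_sample(data, rng))
                 for _ in range(1000))
```

Without the jitter, every bootstrap median would be one of a few midpoints of the original observations; with it, the medians form a continuum.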
 Testinversion intervals
In principle, these can be parametric, nonparametric, or semiparametric, depending upon how you estimate the distribution of values to be bootstrapped and the distribution of statistics.
Test inversion limits exploit the fundamental relationship between tests and confidence limits, and can be used to construct P-value plots, or for estimating the power of tests.
Remember that, unlike an ordinary null hypothesis test, these tests do not assume the nil hypothesis is true; instead a range of null hypotheses is tested, set by the possible values of Θ.
Test inversion intervals are sometimes described as semiparametric because it is not the original observations that are resampled, but a modified (shifted) sample. Confidence limits by permutation employ the same principle, but sample without replacement.
Unlike conventional 2sided confidence limits, test inversion does not assume the statistic is distributed symmetrically or smoothly. Unfortunately, owing to the additional computation involved, and the need for interpolation or an efficient search algorithm, this approach is not often used to construct intervals.
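As a concrete illustration of the search involved, the sketch below finds a lower limit for a mean by inversion: the sample is shifted so its mean equals each candidate null value, a bootstrap test is run on the shifted sample, and bisection homes in on the null value whose one-sided p-value equals α/2. Everything here (data, names, the bisection bracket) is an illustrative assumption, not a prescribed algorithm:

```python
import random
import statistics

def inversion_lower_limit(sample, alpha=0.05, B=1000, seed=17):
    """Lower limit for the mean by test inversion: bisect for the null
    value theta0 whose shifted-sample bootstrap test has a one-sided
    p-value of alpha/2."""
    theta = statistics.mean(sample)
    sd = statistics.stdev(sample)
    n = len(sample)

    def p_value(theta0):
        # shift the sample so its mean equals the null value theta0
        shifted = [x - theta + theta0 for x in sample]
        rng = random.Random(seed)  # common seed keeps p monotone in theta0
        boots = [statistics.mean([rng.choice(shifted) for _ in range(n)])
                 for _ in range(B)]
        # one-sided p: chance of a bootstrap mean at least as large
        # as the observed estimate under this null
        return sum(b >= theta for b in boots) / B

    lo, hi = theta - 6 * sd, theta
    for _ in range(30):  # bisection on the monotone p-value curve
        mid = (lo + hi) / 2
        if p_value(mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]
lower = inversion_lower_limit(data)
```

The repeated bootstrap inside the search is exactly the computational burden, and the need for an efficient search, mentioned above.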
Theoretical work shows that test inversion limits have similar coverage errors to the other procedures described above. However, where sample size renders the statistic nonpivotal, conventional analytical models based upon the Edgeworth expansion are unreliable, and the only way to compare techniques is by simulation; but, because test inversion limits are not very popular, their properties are relatively unexplored.
The advantages of test inversion confidence limits are that they are easy to use and accurate, do not require the calculation of analytic correction terms, can be calculated for studentized estimates, do not behave irregularly for small values of alpha, and do not require a variance-stabilising transformation.
