Principles and properties of bootstrap estimators

On this page: Principles - why do it? · How many bootstrap statistics are needed? · Bootstrap estimates of bias & standard error · Bootstrap standard error · Bootstrap t-statistics
Bootstrapping (sampling with replacement) is an increasingly popular way of obtaining confidence intervals for otherwise intractable statistics - and is sometimes used to estimate parameters and for significance tests.
Using analytical parametric methods it is difficult to estimate confidence intervals for growth rates, hazard functions, genetic distances, heritability indices, niche overlap indices, the Gini coefficient (of inequality of plant sizes), species richness indices, indices of similarity of species composition (such as the Jaccard index), population estimates using mark-release-recapture, dose-response estimates, population density from line transects, catch per unit effort (e.g. in fisheries), and time-interval estimators (such as time to tumour onset following exposure or treatment). Unfortunately, jackknifing can run into serious difficulties for non-normal estimators such as variance ratios, correlation coefficients, and chi-square error probabilities.
Bootstrapping is particularly useful where the statistic's distribution depends upon the population being sampled, or where the statistic's bias varies with sample size - but there are a few statistics for which bootstrapping cannot estimate confidence intervals.
Although there has been a great deal of theoretical and practical research on bootstrapping, among biologists the most popular interval estimator (although the method used is seldom specified) is Efron's simple nonparametric percentile method - sometimes described as the 'backwards' percentile interval.
It may surprise you to learn that, although bootstrapping was devised by Efron in 1979, it is still somewhat controversial. One reason for this is that, because they require a comparatively large amount of computation, bootstrap estimators are much more recent than the usual 'textbook' statistics (and jackknifing, for that matter). Fortunately, although some statisticians complained that bootstrapping was just a way of getting a 'free lunch', it is now generally accepted that bootstrapping is perfectly valid on theoretical grounds - provided its assumptions are reasonable. Unfortunately, a great many biologists either believe that (being nonparametric) bootstrap estimators do not make any assumptions, or simply do not bother to verify that those assumptions are met.
So, before we go much further, we would be wise to understand the limitations inherent to bootstrapping in general, and simple nonparametric confidence limits in particular. To expose the assumptions, and some of their implications, let us consider four simple questions.
For bootstrapping to estimate how your sample statistic varies, the first and most important assumption is that your sample of observations was selected at random from the population about which you wish to draw inferences.
Provided you can assume that your sample provides a reasonable representation of the population it was drawn from, it is similarly reasonable to use one as a model of the other. Each bootstrap sample is then obtained by selecting n observations at random, with replacement, from your original sample of n observations.
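The basic resampling scheme is easily sketched in code. The sample values below are hypothetical, and the statistic (here the mean) is just one illustrative choice:

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_statistics(sample, statistic, n_boot=1000, seed=1):
    """Draw n_boot bootstrap samples - each of n observations selected
    at random, with replacement, from the original sample - and return
    the statistic calculated from each."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic([rng.choice(sample) for _ in range(n)])
            for _ in range(n_boot)]

# hypothetical sample of 10 measurements
sample = [4.1, 5.3, 2.8, 6.0, 3.9, 5.5, 4.7, 3.2, 5.1, 4.4]
boot_means = bootstrap_statistics(sample, mean, n_boot=2000)
```

The resulting `boot_means` form an empirical distribution from which dispersion, bias, and quantiles can be read off.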
A permutation test, in contrast, makes no assumptions regarding how your observations were obtained - and, because your inference is confined to the result of assigning those values, they are selected without replacement.
More formally, given that the observed cumulative distribution of your sample is an estimate of the cumulative distribution of its population, the cumulative distribution of bootstrap estimators calculated from that sample can be expected to reflect the (true, population, or parametric) cumulative distribution of your estimator.
Random sampling and complex models aside, the most obvious limitation of this approach is the size of your original sample.
Although the cumulative distribution of your sample may be roughly similar to that of its population - on average at least - this is almost never true of the frequency distribution. Even if your population has a continuous normal distribution function, your sample will always have a discrete distribution - resampling a sample therefore produces a number of irregular, randomly positioned steps in the bootstrap statistic's cumulative distribution. This spurious 'fine structure' not only limits the detail in your model, it is also a source of annoying artefacts and 'crazy' results - all of which limits the power (and reliability) of your inference. This loss of detail, and tendency to artefacts, bedevils a number of the more sophisticated bootstrapping techniques, especially where they are applied to 'residuals' - in other words, to the deviations of observations from some model's predictions.
Although bootstrapping is most useful for moderately large samples, even in theory there are definite limits upon what can be expected - if for no other reason than that the most extreme values in any population tend to be the rarest, and hence the least likely to be represented in a sample. One consequence of this is that the amount of variation tends to be underestimated, and extreme quantiles are both variable and biased. Correcting the standard error is relatively simple if your statistic is a mean (or behaves as if it were), but for less tractable estimators the approximation error may be reduced by calculating a studentized-bootstrap estimator (also known as a bootstrap-t statistic), for which the standard error of your statistic is estimated by a second stage of resampling - in other words, by repeatedly resampling each bootstrap sample.
Working out the result of every possible sample of n observations entails rather a lot of computation - because there are n^n ways of drawing n observations with replacement, when the order of selection is taken into account.
In either case, if you ignore the order in which they are selected, there are rather fewer possible samples - the number of distinct combinations (multisets) being C(2n-1, n).
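These two counts are simple to compute, and grow very quickly with n:

```python
from math import comb

def ordered_bootstrap_samples(n):
    # Each of the n draws can return any of the n observations.
    return n ** n

def distinct_bootstrap_samples(n):
    # Ignoring selection order: multisets of size n drawn from n values,
    # i.e. C(2n - 1, n).
    return comb(2 * n - 1, n)
```

For n = 10, for example, there are 10^10 ordered samples but only 92,378 distinct combinations - still far too many to enumerate routinely for larger n.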
In practice there is little to be gained from calculating the entire bootstrap distribution. By obtaining a few thousand random bootstrap samples you can approximate that distribution closely enough for most purposes.
The results of such mathematics are instructive. As you might expect, if you resample a set of observations, the most likely single combination of observations is one identical to the sample being resampled. However, if your sample is of an untied continuous variable, the probability of obtaining that particular combination is only n!/n^n - which is vanishingly small for anything other than a very small sample.
For all practical purposes, therefore, duplication of effort is not considered a problem. A further consequence is that the distribution of linear additive statistics (such as the mean) is often regarded as continuous, or smooth. At the same time, because you are resampling a discrete distribution of observations, this assumption is unreasonable for small samples - nor is the distribution of your bootstrap estimators unbounded, and this limits the width of the estimator's tail-end distribution. Indeed, these bounds are all too easy to calculate: apply your estimator to n copies of the smallest value in your sample, then to n copies of the largest. In addition, if your sample is heavily tied (such as binary data), or your estimating function truncates or pools values (as when it calculates a median or maximum), the distribution of bootstrap statistics will be a discrete, unsmooth function of your sample - which upsets this simple model.
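Those bounds can be sketched directly - assuming, as here, a location statistic such as the mean, for which the extreme bootstrap samples are n copies of the smallest or largest observation (the sample values are hypothetical):

```python
def mean(values):
    return sum(values) / len(values)

def bootstrap_bounds(sample, statistic):
    """For a location statistic such as the mean, the most extreme
    bootstrap samples consist of n copies of the smallest value, or
    n copies of the largest."""
    n = len(sample)
    return statistic([min(sample)] * n), statistic([max(sample)] * n)

lo, hi = bootstrap_bounds([2.8, 4.1, 5.3, 6.0], mean)
```

No bootstrap mean of this sample can fall outside [2.8, 6.0], however many resamples are drawn.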
These points aside, the number of bootstraps you need to perform depends upon whether you are trying to estimate moments or tails. In principle at least, you would expect means and standard deviations to require fewer bootstrap samples than skews or tail-end proportions. For a simple two-tailed 95% percentile confidence interval, a minimum of one to two thousand bootstrap samples is commonly recommended.
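Efron's simple percentile interval, mentioned above, takes the alpha/2 and (1 - alpha/2) quantiles of the sorted bootstrap statistics as the confidence limits. A minimal sketch, using a hypothetical sample and B = 2000 resamples:

```python
import random

def mean(values):
    return sum(values) / len(values)

def percentile_interval(sample, statistic, n_boot=2000, alpha=0.05, seed=1):
    """Efron's simple percentile interval: the alpha/2 and (1 - alpha/2)
    quantiles of the sorted bootstrap statistics."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(statistic([rng.choice(sample) for _ in range(n)])
                   for _ in range(n_boot))
    return (stats[int(n_boot * alpha / 2)],
            stats[int(n_boot * (1 - alpha / 2)) - 1])

sample = [4.1, 5.3, 2.8, 6.0, 3.9, 5.5, 4.7, 3.2, 5.1, 4.4]
lo, hi = percentile_interval(sample, mean)
```

Note that the limits are read straight off the bootstrap distribution, with no correction for bias or skew.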
To the extent that your sample represents its population, you would expect bootstrap statistics to reflect the bias and standard error of estimates calculated from random samples of that entire population.
If your sample statistic is unbiased, and has no inherent a priori value, the best estimate of its parameter is usually its observed value, θ̂. If its bootstrap estimators are distributed symmetrically, their mean is the best estimate - but, for skewed distributions, their median may be a better measure. For simplicity, let us say the bias is estimated as the mean of your bootstrap statistics minus the observed statistic, θ̂.
Notice however, that if you are trying to correct confidence limits, this estimate of bias can only be applied directly when your statistic is distributed symmetrically. In addition, whilst the standard deviation of bootstrap estimates generally provides a rather better estimate of the standard error than one calculated by jackknifing, bootstrap estimates of bias are not always so reliable. One reason for these problems is that the random resampling process introduces its own quota of variation, because the (combined set of) observations in your B bootstrap samples are unlikely to have the same distribution as the sample you are resampling. A simple way of improving the reliability of these estimates is to perform what is known as 'balanced' bootstrapping.
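One common way of implementing a balanced bootstrap - a sketch, not the only possible scheme - is to concatenate B copies of the sample, shuffle them, and cut the result into B samples of size n, so that every observation is used exactly B times overall:

```python
import random

def balanced_bootstrap(sample, n_boot, seed=1):
    """'Balanced' bootstrap: concatenate n_boot copies of the sample,
    shuffle them, and cut the result into n_boot samples of size n -
    so every observation appears exactly n_boot times in total."""
    rng = random.Random(seed)
    pool = list(sample) * n_boot
    rng.shuffle(pool)
    n = len(sample)
    return [pool[i * n:(i + 1) * n] for i in range(n_boot)]

samples = balanced_bootstrap([1, 2, 3, 4, 5], n_boot=200)
```

Because each observation appears equally often across the full set of resamples, the extra variation introduced by unbalanced random selection is removed - which mainly benefits bias estimation.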
As a first approximation, the standard deviation of your bootstrap statistics, θ*, provides an estimate of the standard error of your sample statistic, θ̂.
A disadvantage of this method is that, unless your sample comprises the entire population of observations, the standard deviation of θ* is liable to underestimate the dispersion of θ̂.
For example, we took 10000 samples, each of (n=) 10 observations, from a standard normal population, and calculated the mean of each sample.
To obtain a bootstrap estimate of this standard deviation we resampled each of these 10000 samples (with replacement). In other words we took 100 bootstrap samples, of 10 observations, from each sample of 10 observations - and calculated their means. Then, using the population variance formula as a plug-in estimator, we calculated the standard deviation of each set of 100 bootstrap means - giving us 10000 standard deviations in all.
As expected, both the average (0.29045) and the median (0.28620) of these bootstrap standard deviations were rather lower than the standard deviation of our sample means (0.31673).
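A scaled-down version of that simulation (2000 repetitions rather than 10000, to keep the run short) shows the same underestimation; the seed and repetition counts are arbitrary choices:

```python
import random
import math

def mean(values):
    return sum(values) / len(values)

def sd_plugin(values):
    # 'plug-in' (population) standard deviation, dividing by n
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

rng = random.Random(42)
n, n_boot, n_reps = 10, 100, 2000   # fewer repetitions than the text's 10000
boot_sds = []
for _ in range(n_reps):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    boot_means = [mean([rng.choice(sample) for _ in range(n)])
                  for _ in range(n_boot)]
    boot_sds.append(sd_plugin(boot_means))

average_boot_sd = mean(boot_sds)   # tends to fall below 1/sqrt(10), about 0.316
```

The average bootstrap standard deviation comes out around 0.29 - noticeably below the true standard error of a mean of 10 standard normal observations, 1/√10 ≈ 0.3162.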
For a few statistics, such as the sample mean (ΣY/n), it is possible to estimate the standard error of θ̂ directly from your sample - very often it is not. However, whilst the simple bootstrap standard error provides a first-order estimate of your estimator's dispersion, bootstrapping tends to produce confidence limits that are slightly narrower than they ought to be. This constraint is no different for nonparametric confidence intervals - even when obtained by test-inversion.
To reduce approximation error, second-stage bootstrap resampling may be used to calculate the distribution, or standard error, of t−statistics and other 'studentized' estimators.
Bootstrapping makes no assumptions as to how your observations are distributed and, allowing for the limitations noted above, bootstrap estimates are assumed to be distributed in the same way as the estimates themselves. Where your statistic has an asymptotic normal distribution, the bootstrap statistic is assumed to do the same thing - at least as a first approximation. Given which, 1.96 × the standard error (estimated as above) enables you to attach 95% confidence limits to a number of (otherwise rather awkward) statistics.
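A sketch of this normal-approximation interval, again using a hypothetical sample and the mean as the statistic:

```python
import random
import math

def mean(values):
    return sum(values) / len(values)

def normal_interval(sample, statistic, n_boot=1000, seed=1):
    """Normal-approximation interval: observed statistic +/- 1.96 times
    the standard deviation of the bootstrap statistics."""
    rng = random.Random(seed)
    n = len(sample)
    stats = [statistic([rng.choice(sample) for _ in range(n)])
             for _ in range(n_boot)]
    m = mean(stats)
    se = math.sqrt(sum((s - m) ** 2 for s in stats) / (n_boot - 1))
    obs = statistic(sample)
    return obs - 1.96 * se, obs + 1.96 * se

sample = [4.1, 5.3, 2.8, 6.0, 3.9, 5.5, 4.7, 3.2, 5.1, 4.4]
lo, hi = normal_interval(sample, mean)
```

Unlike the percentile interval, this one is forced to be symmetric about the observed statistic - which is exactly why it fails for skewed estimators.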
The problem with this attractively simple approach is that, because sampling loses information, you must expect bootstrapping to underestimate the standard deviation of your sample statistic.
For bootstrapping to exceed the accuracy of the normal approximation you have to use pivotal estimates (pivots) - in other words, statistics which have the same distribution regardless of their population parameters. For example the studentized sample mean, t = (Ȳ − μ)/(s/√n), has the same distribution whatever the values of μ and σ - provided its parent population is normal.
Assuming your statistic is unbiased, there are three problems with this approach.
Provided you can reasonably assume your resampling process mimics the original sampling process, you can simulate the variation in standard error by resampling each of your B bootstrap resamples - and use the standard deviation of each set of second-stage bootstrap statistics as an estimate of the standard error of the corresponding first-stage bootstrap statistic.
For example - using our earlier samples of (n=) 10 observations - the standard deviation of each set of second-stage bootstrap means provides a standard error for its first-stage bootstrap mean.
How you go about obtaining these second-stage estimates (or the degrees of freedom for t) depends upon whether you are using a nonparametric or parametric bootstrap model.
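For the nonparametric case, the whole bootstrap-t procedure can be sketched as below - for the mean only, with arbitrary first- and second-stage resample counts (500 and 50), and a hypothetical sample:

```python
import random
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def bootstrap_t_interval(sample, n_boot=500, n_inner=50, alpha=0.05, seed=1):
    """Nonparametric bootstrap-t interval for the mean: each first-stage
    bootstrap sample is itself resampled (second stage) to estimate the
    standard error of its mean."""
    rng = random.Random(seed)
    n = len(sample)
    obs = mean(sample)
    se_obs = sd(sample) / math.sqrt(n)
    t_stats = []
    for _ in range(n_boot):
        boot = [rng.choice(sample) for _ in range(n)]
        inner = [mean([rng.choice(boot) for _ in range(n)])
                 for _ in range(n_inner)]
        se_b = sd(inner)                  # second-stage standard error
        if se_b > 0:                      # skip degenerate resamples
            t_stats.append((mean(boot) - obs) / se_b)
    t_stats.sort()
    k = len(t_stats)
    t_lo = t_stats[int(k * alpha / 2)]
    t_hi = t_stats[int(k * (1 - alpha / 2)) - 1]
    # note the reversal: the upper t-quantile sets the lower limit
    return obs - t_hi * se_obs, obs - t_lo * se_obs

sample = [4.1, 5.3, 2.8, 6.0, 3.9, 5.5, 4.7, 3.2, 5.1, 4.4]
lo, hi = bootstrap_t_interval(sample)
```

The quantiles of the studentized bootstrap statistics replace the tabulated t-distribution - at the cost of n_boot × n_inner resamples rather than n_boot.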
In principle, for smoothly-distributed estimators that obey the central limit theorem, bootstrap t statistics enable coverage error to be reduced from order 1/√n to order 1/n.
In practice, bootstrap pivotal statistics have three other disadvantages.
The net result is that, although there are quite a number of more-or-less sophisticated bootstrap methods available, Efron's simple percentile confidence limits remain surprisingly popular - because they are easy to compute and interpret and, when used intelligently, relatively robust.