"It has long been an axiom of mine that the little things are infinitely the most important"
How reliable are your estimates?
Given the importance of this question, it is not surprising there are a number of ways of answering it - not least because statisticians have devised so many different types of estimator, and almost as many ways of classifying them.
An estimator is simply a method by which you obtain an estimate, bearing in mind that there may be any number of methods available.
Since confidence intervals are a popular way to indicate the reliability of a point estimate, and interval estimates have problems peculiar to themselves, let us consider them first.
Interval estimates
Interval estimates, whilst sharing many of the problems of point estimates, tend to be assessed rather differently. To understand the reasoning and shortcomings of these methods, we must consider how these intervals, and their estimates, are defined.
In essence, a confidence interval, I, is a range of values, calculated from a sample, that is constructed so as to enclose the parameter, Θ, with a specified probability, P.

This arrangement has two important properties: the interval, not the parameter, is the random variable - and the probability, P, describes the long-run behaviour of many such intervals, rather than any single one of them.

From which it follows that, when intervals are calculated from repeated samples, a proportion P of them are expected to enclose Θ - a proportion known as their coverage.

Assuming your confidence intervals are good estimates of I, when these estimated intervals are calculated from repeated samples, their observed coverage should be close to that expected coverage, P.
The most popular measure of the quality of an interval estimator, known as the coverage error, is simply the difference between the observed and expected coverage. Confusingly, for reasons of mathematical convenience, the formulae for this generally assume that the estimate is distributed symmetrically, and that you are calculating the (equivalent 2-tailed) interval between two equal 1-tailed confidence limits. In other words, coverage error assumes a different definition of confidence limits from the one given above.
The problem with this measure is that it wholly ignores the length of confidence intervals, and what happens where the estimate is not distributed symmetrically about Θ. Interest in alternative measures of interval estimates, and in alternative ways of constructing confidence limits, is comparatively recent.
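As a sketch of how observed coverage can be checked in practice, the simulation below (a minimal illustration of my own, not taken from the text) draws repeated normal samples, calculates a nominal 95% interval for the mean with known σ, and counts how often the interval encloses the true mean; the coverage error is then just the observed minus the nominal coverage.

```python
import random, math

random.seed(1)
mu, sigma, n = 50.0, 10.0, 30
z, nominal = 1.96, 0.95          # two equal 1-tailed limits, nominal 95% coverage
R = 20000                        # number of repeated samples

covered = 0
for _ in range(R):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)     # known-sigma interval
    covered += (mean - half_width <= mu <= mean + half_width)

observed = covered / R
coverage_error = observed - nominal   # the measure described above
print(round(observed, 3), round(coverage_error, 4))
```

Here every assumption behind the interval is met, so the coverage error should be close to zero; repeating the exercise with skewed samples would expose the problems described above.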
Point estimates
Until quite recently, the two most common criteria for judging the reliability of a point estimator were:
a) Measures of bias
Statisticians use two measures of bias: mean bias and median bias - of which the first is the more popular. Mean bias is simply the average deviation you would obtain if you used the same estimator upon a large number (R) of identical random samples from the same population.
For R samples the average estimate, E, is simply the sum of those estimates divided by R - and the mean bias is E − Θ. Median bias is the equivalent difference between the median of those estimates and Θ.
For estimates that are distributed symmetrically about their mean, E, there is no difference between mean and median bias. So, for such estimates, a mean-unbiased estimator is also median-unbiased.
'Unbiased additive linear' estimators, such as the mean, have no mean bias but, where their estimates have a skewed distribution, these estimators are median biased. Other estimators, such as the plug-in 'population variance', Σ(Y−Ȳ)²/n, are mean biased - on average it underestimates the population variance by a factor of (n−1)/n.
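The mean bias of that plug-in estimator is easy to demonstrate by simulation. The sketch below (an illustration of my own, assuming a normal population with σ² = 4) averages Σ(Y−Ȳ)²/n over many small samples:

```python
import random

random.seed(2)
true_var, n, R = 4.0, 5, 40000

def plug_in_variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)   # divides by n, not n - 1

estimates = [plug_in_variance([random.gauss(0.0, 2.0) for _ in range(n)])
             for _ in range(R)]
E = sum(estimates) / R                 # average estimate over R samples
mean_bias = E - true_var               # negative: the estimator underestimates
print(round(E, 2), round(mean_bias, 2))
```

With n = 5 the average estimate settles near 4 × (n−1)/n = 3.2, a mean bias of about −0.8.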
For some years most statisticians assumed the best estimators had to be unbiased (when applied to finite samples). Fairly recently, however, it has been generally accepted that bias is only a serious problem when its extent is unknown. In contrast, the more variable an estimate is - in effect - the less useful information it contains.
b) Measures of concentration
Whilst an ordinary sample mean is an unbiased estimate of its population mean, this does not imply this plug-in estimator is the best estimator of that parameter. For purposes of inference, the least variable, most efficient estimator, provides the most power to discredit a null hypothesis.
To parametric statisticians at least, the most obvious measure of your estimates' variation is their variance. In other words, if E is the average of R estimates of a parameter, Θ, then you might calculate their variance as the average squared deviation of those estimates from E.
Because they allowed the most power, the least variable estimates became known as the most efficient, and estimators were compared upon that basis.
For unbiased estimators, such as the sample mean, efficiency is a perfectly adequate measure of reliability because the expected value is the same as the parametric value - and so the mean squared error is simply the variance.
Maximum likelihood estimates often have the smallest variance and are sometimes biased. For example, where the population is normal, the arithmetic mean is the same as the maximum likelihood estimate, and is both unbiased and has the smallest variance - whereas the plug-in estimator of the population variance is biased, but has the minimum variance and is equivalent to the maximum likelihood estimate. Where the population distribution is undefined the bias of the most efficient estimator is unknown, and a maximum likelihood estimate cannot be obtained.
The fact that the relative efficiency of two statistics could be defined as the ratio of their mean squared errors led to the idea of minimum variance estimators and best equivalent estimators. For samples of some types of population, particularly normal ones, it has long been known that some estimators are the most efficient possible - and these were described as sufficient estimators. If you compare a mean with a trimmed mean, such as a median (which is the most heavily trimmed mean), the underlying reason is clear. A sufficient estimator summarizes all the useful information contained within your sample - the more information there is available, the less variable are its estimates. Minimax estimators, on the other hand, minimise the greatest error in estimating the parameter - but, because this is at the expense of their power, minimax estimators are too pessimistic (underpowered) for many applications.
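Relative efficiency can likewise be estimated by simulation. The sketch below (my own illustration, for normal samples) compares the variance of the median with that of the mean over many repeated samples - their ratio approaches π/2 ≈ 1.57 asymptotically:

```python
import random, statistics

random.seed(3)
n, R = 25, 20000

means, medians = [], []
for _ in range(R):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(statistics.median(sample))

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# ratio of variances: how much more variable the median is than the mean
relative_efficiency = variance(medians) / variance(means)
print(round(relative_efficiency, 2))
```

For long-tailed populations the ratio can reverse, which is part of the motivation for the robust estimators discussed below.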
Where its assumptions are fully met the arithmetic mean, Ȳ = ΣY/n, is the best possible estimator of its parameter by pretty much any criterion you wish to apply.
Recall however that accuracy does not imply precision - if your sample is from a long-tailed distribution the precision of means and regression slopes can be reduced to the point of uselessness. Since a single highly-aberrant value can radically affect the value of Ȳ, the mean is said to have a breakdown point of zero, whereas a median (the 50% trimmed mean) has the highest possible breakdown point (0.5). Alternative approaches to this problem are to find which location gives the minimum sum of the trimmed squared-errors, or to use the median absolute error.
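The difference in breakdown point is easy to see. In the sketch below (illustrative figures of my own) a single aberrant value wrecks the mean but leaves the median untouched:

```python
import statistics

clean = [9.8, 9.9, 9.9, 10.0, 10.0, 10.1, 10.2]
dirty = clean[:-1] + [1000.0]          # one wild value replaces a reading

print(statistics.mean(clean), statistics.median(clean))   # both near 10
print(statistics.mean(dirty), statistics.median(dirty))   # mean is ruined
```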
Similar concerns apply to popular measures of dispersion - such as the population variance which, being based upon squared deviations, is even more sensitive to such aberrant values.
c) Robustness

Since these sorts of problems are exactly what you would expect from measurement errors and 'outliers', there has been a growing interest in developing more robust estimators. A robust estimator is one which performs well both under ideal circumstances and where its underlying assumptions are not fully met. As per usual, there are a number of conflicting measures of robustness, and quite a few ways of classifying robust estimators.
d) Other measures
Parametric statistical models make use of the fact that, as sample size increases, quite a few estimators become more regular in their habits - and approach, but not reach, known distributions. Asymptotic regularity is common where samples approach infinite size, but for some estimators 'large sample approximations' can require extremely large samples indeed. Given the quantity of study which has been invested, a variety of ways have been found to quantify how well behaved these estimators are - at least in principle.
For example, a consistent estimator is an estimator whose bias and variance both approach zero as the sample size approaches infinity. However some estimators (such as the mean) converge more rapidly than others (such as the median). Other estimators converge to their large sample behaviour only very slowly.
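That convergence can be sketched directly. The illustration below (my own, for a standard normal population) shows the spread of sample means shrinking as 1/√n, so quadrupling n roughly halves it:

```python
import random, math

random.seed(4)

def sd_of_mean(n, R=8000):
    """Standard deviation of R sample means, each from a sample of size n."""
    estimates = [sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
                 for _ in range(R)]
    m = sum(estimates) / R
    return math.sqrt(sum((e - m) ** 2 for e in estimates) / R)

sd25, sd100 = sd_of_mean(25), sd_of_mean(100)
print(round(sd25, 3), round(sd100, 3))   # roughly 0.2 and 0.1
```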
Estimators whose relative efficiency is unrelated to the parameter being estimated are described as being regular - and are obviously desirable. Under this definition a sample proportion is not a regular estimator. A few non-regular estimators favour particular values for their estimates, which can be irritating. In contrast superefficient estimators are unbeatable in one situation, but unreliable otherwise.
Distribution estimates
In principle at least, the 'distribution function' is a statistic that conveys most information about a population of observations - or about a population of summary statistics.
For example, likelihood statistics are often assumed to be asymptotically normal, so the distribution of their (log-likelihood) ratios, such as G-statistics, is assumed to approach a chi-square distribution.
Only knowing their limit distribution can also make it rather difficult to select an optimal maximum likelihood estimator!
Approximation error
Using some arbitrary but convenient theoretical frequency distribution as an approximation to the actual distribution of your estimates introduces what is known as an approximation error. Of course, by quantifying its effect upon the location of confidence limits, and upon tests of point estimates, this error can be treated as a bias.
We noted above that, although the small-sample distributions of many estimators are complex, they often converge asymptotically to 'known' distributions - particularly the normal one. Rescaling-transformations and studentizing can reduce, but do not eliminate, approximation error.
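As a sketch of that residual error (an illustration of my own, using skewed Exponential(1) samples rather than anything from the text), the simulation below measures how far the standardized sample mean's distribution sits from its normal approximation at x = 0, where Φ(0) = 0.5; the discrepancy shrinks, roughly as 1/√n, but does not vanish:

```python
import random, math

random.seed(5)

def approx_error(n, R=20000):
    """|P(T <= 0) - Phi(0)| for T = sqrt(n)(mean - 1), Exponential(1) samples."""
    hits = 0
    for _ in range(R):
        mean = sum(random.expovariate(1.0) for _ in range(n)) / n
        t = math.sqrt(n) * (mean - 1.0)   # population mean and sd are both 1
        hits += (t <= 0)
    return abs(hits / R - 0.5)

err5, err50 = approx_error(5), approx_error(50)
print(round(err5, 3), round(err50, 3))   # the error falls as n rises
```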
Of the various ways approximation error can be expressed, the absolute difference between the true and approximating distributions is perhaps the most straightforward. One way of obtaining such approximations is Taylor's expansion, which re-expresses a function as a series of progressively smaller terms.
In this application, because the left-most terms in the series are generally the largest and the most easily estimated, all the smaller 'trailing' terms are usually combined into a 'remainder' term, Ri - where i is the number of (left-hand) terms it excludes, usually between 2 and 4. This remainder term is useful because it indicates how well the leading terms, by themselves, approximate the expression as a whole - assuming the series is convergent (in other words, that higher terms are progressively smaller).
Taylor's expansion is quite useful for estimating moments, or the probability of observing a value within a given range.
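A small illustration of that remainder behaviour (using exp(x), a standard textbook function of my choosing, not an example from the text): each extra leading term of the Taylor series shrinks the remainder Ri.

```python
import math

x = 0.5
# Taylor series of exp(x) about zero: x**i / i!
terms = [x ** i / math.factorial(i) for i in range(10)]

remainders = {}
for i in (2, 3, 4):
    approx = sum(terms[:i])                 # keep the i left-most terms
    remainders[i] = math.exp(x) - approx    # R_i: everything those terms exclude
    print(i, round(remainders[i], 6))
```

As the series is convergent here, R4 is far smaller than R2 - the leading terms alone already approximate the whole expression well.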
Another way to quantify approximation error is Edgeworth's expansion, which expresses the difference between the distribution of a (standardized) estimator and its normal approximation as a series of terms that diminish with increasing sample size.
Nevertheless, if we ignore the mathematical detail, most of the implications and assumptions are understandable enough.
For example, ignoring how each constant is calculated, the formula below assumes that the studentized estimator, T, behaves as a sum (or mean) and has a continuous distribution - and that the moments of that distribution are finite.

P[T ≤ x] ≅ Φ(x) + p1(x)φ(x)/√n + p2(x)φ(x)/n

Here Φ and φ are the standard normal distribution and density functions, and p1 and p2 are polynomials whose constants depend upon the moments of T's distribution.
Since its reasoning derives from the behaviour of large (or extremely large) samples, a popular way to express the magnitude of this remainder (the estimated approximation error) is to use an asymptotic order term, such as the 'big O' notation.
Although this approach assumes you know how your statistical function is distributed, it does help theoreticians to improve large-sample approximations for moderate samples - and some transformations now make use of it although, to avoid 'over correction', they seldom use more than the first three terms. More immediately, order terms are increasingly being used to indicate both the degree and type of approximation error.
Notice however that, because of the way in which the moments get 'smeared out', the first term of the expansion contains most of the effects of skew, whereas the second term allows for most of the effects of kurtosis and the secondary effects of skewness.