Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




Plug-in estimators

In the most general sense, an 'estimator' is any procedure or formula which produces an estimate. But, to be of use, an estimate should be based upon information rather than uninformed speculation. Moreover, a good estimate should allow for how the data were obtained in the first place, whereas in practice that sampling process is simply assumed, often quite unjustifiably.

  • An estimator is any statistic which uses a 'sample' of observations to give an estimate of a population parameter.
    • A statistic is simply a number we use to summarise, or represent, a set of observations - or the function / procedure which produces that number.
    • If that 'set' is an entire population of observations, the statistic may be described as a parameter.
    • A parameter can be any function of an entire population of observations, or a combination of those functions.
  • A plug-in estimator uses the same formula as the population statistic it is estimating.

For instance, if you like to think in statistical formulae:

  • The sample maximum is a plug-in estimate of its population maximum.
  • An arithmetic mean (ȳ = Σy/n) provides a plug-in estimate of its population mean (μ = Σy/N), where n is the number of values in your sample and N is the number of values in the population.
      Or, if N is infinite, Σy/n is a plug-in estimate of the expected value of y, μ = E(y).
  • For binary data, the proportion (p) of positive results in a sample is a plug-in estimator of the proportion (P) of positive results in its population.
  • Again, the 'population variance' formula s² = Σ(y − ȳ)²/n is a plug-in estimate of its population variance, σ² = Σ(y − μ)²/N or E({y − μ}²).

  • The 'sample variance' formula s² = Σ(y − ȳ)²/(n − 1) is not a plug-in estimate of σ² = E({y − μ}²) because those two variances use different formulae.
  • Confusingly perhaps, the sample median, or any other trimmed mean, cannot be a plug-in estimate of the population mean μ = E(y), because it is calculated using a different formula.
      This last fact remains true even when the population mean and population median have exactly the same value - as is the case when that population is symmetrical about μ, for example when it is 'normal'.
  • A 10% trimmed mean of n observations (calculated excluding the 10% largest & 10% smallest values) is the plug-in estimate of the population 10% trimmed mean (calculated excluding the 10% largest & 10% smallest of the N values). It is not a plug-in estimate of the average of the 10% trimmed means calculated from samples of n observations.
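The examples above can be sketched in Python. The population here is invented: 20 values, say wing lengths in mm. The plug-in principle amounts to applying one and the same function to the population and to the sample. `statistics.pvariance` divides by the number of values, which is the plug-in formula. `statistics.variance` divides by n − 1. `trimmed_mean` is a hypothetical helper that drops int(0.1 × count) values from each end, which is only one of several trimming conventions.

```python
import random
import statistics

# Invented population of N = 20 values (e.g. wing lengths in mm)
population = [28, 29, 30, 30, 31, 31, 31, 32, 32, 32,
              32, 33, 33, 33, 34, 34, 35, 36, 38, 44]

def mean(values):
    # Σy divided by the number of values supplied (N for a population, n for a sample)
    return sum(values) / len(values)

def trimmed_mean(values, fraction=0.10):
    # Hypothetical helper: drop int(fraction * count) values from each end
    # (rounded down; other conventions exist), then average what remains
    ordered = sorted(values)
    k = int(fraction * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Plug-in estimators: exactly the same function is applied to population and sample
plug_ins = {
    "maximum": max,
    "mean": mean,
    "variance (divide by count)": statistics.pvariance,
    "proportion >= 33": lambda v: sum(1 for y in v if y >= 33) / len(v),
    "10% trimmed mean": trimmed_mean,
}

rng = random.Random(1)
sample = rng.sample(population, 10)      # n = 10, drawn without replacement
for name, F in plug_ins.items():
    print(f"{name}: population {F(population)}, sample {F(sample)}")

# statistics.variance divides by (n - 1), so it is NOT a plug-in estimator of σ²;
# it equals the plug-in variance rescaled by n/(n - 1)
n = len(sample)
print(statistics.variance(sample), statistics.pvariance(sample) * n / (n - 1))
```

If the whole population is drawn as the 'sample' (n = N, without replacement), every plug-in function above returns exactly its population value.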

You may find a biological example is a more useful way to see how these ideas inter-relate:

Imagine, if you will, that in order to protect an endangered species of butterfly (Ermentrudes flurtillary) a conservationist needs to find its current population size, N, taken to be the number of surviving adults at the time of his survey. If N is extremely small, and the population is isolated, it might be logistically possible to catch and count every one - but this is likely to be an expensive process, not least for the butterflies.

Nearly always, therefore, he has to make do with some kind of estimate (N̂) of this parameter. If he makes this estimate on the basis of quadrat samples, mark-release-recapture, a visual survey, or (for a pest species) removal-trapping, our conservationist is liable to use some more-or-less complicated formula to estimate the butterfly population size. Or, if all else fails, he might calculate a (weighted) mean of any estimates published to date, or he could simply ask a local enthusiast.
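The formula depends on the method. One illustration, in Python, is the classical Lincoln-Petersen estimator for mark-release-recapture. This is a standard technique swapped in here, not a method described on this page. M butterflies are marked in a first session and C are caught in a second, of which R carry marks; the estimate is N̂ = MC/R. Chapman's less biased variant is (M + 1)(C + 1)/(R + 1) − 1. Both assume, among other things, a closed population and equal catchability, and the survey figures below are invented.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Lincoln-Petersen estimate N̂ = M*C/R. Assumes a closed population,
    equal catchability and no loss of marks."""
    if recaptured == 0:
        raise ValueError("no marked butterflies recaptured: estimate undefined")
    return marked * caught / recaptured

def chapman(marked, caught, recaptured):
    """Chapman's modification, (M + 1)(C + 1)/(R + 1) - 1, defined even when R = 0."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1

# Invented survey: 30 marked and released, 25 caught later, 12 of them marked
M, C, R = 30, 25, 12
print(lincoln_petersen(M, C, R))   # 62.5
print(chapman(M, C, R))            # 61.0
```

Neither formula resembles the parameter it estimates, N, which is simply a count of every adult.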

Although it is far from being an exhaustive list, these methods clearly have differing costs, accuracy, problems, and reliability. Nevertheless, good or bad, they are all estimating the same parameter - the number of adult butterflies, N.

However, if he is to defend his estimate before a sceptical audience, our conservationist also needs to estimate the accuracy (bias) and variability (precision) of his chosen estimate of the Ermentrudes flurtillary population size, N̂. The first is usually presented as a point estimate, the latter as a range estimate such as a confidence interval or a likelihood interval.

That said, if his estimate simply reflects one or more opinions, he may conceivably come up with a serviceable estimate of its bias. But predicting how that estimate might vary would require some rather dubious assumptions about the 'population' of opinions his observation(s) represent, and about how his selection process might cause that outcome to vary...

    For the other techniques he might use to estimate this parameter, N, there are either no reliable pre-cooked formulae predicting the bias and variation of N̂, or those formulae assume he is dealing with (very) large samples, and that his audience are prepared to ignore the fact that he is not. As you may by now suspect, this problem is not unusual.

You may have noticed that nearly all of these estimators of butterfly population size use quite different formulae from that of the parameter they are estimating. The only exception is the first method, a census, in which he simply goes and counts all the Ermentrudes flurtillary that can be found. Indeed, for very small populations, this is sometimes exactly what is done.

Provided our conservationist can avoid counting the same butterflies more than once, either by marking them or by recognising them individually, this technique has the advantage of giving a firm lower bound. If he finds 53 butterflies, but unknowingly misses the remaining 11, for better or worse he can state there were at least 53 adults surviving.

The disadvantage of this estimator is that it is only worth attempting for very tiny populations and, provided you avoid counting the same individual twice, it generally underestimates its parameter, as anyone who has counted goldfish in a small pond will tell you.
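A small Python simulation shows why a census of individually marked butterflies is a lower bound that usually falls short. Purely for illustration, it assumes each search finds a randomly chosen adult with a fixed probability (`p_find`), and that every adult is equally likely to be found. `simulated_census` and its parameters are hypothetical.

```python
import random

def simulated_census(n_adults, n_searches, p_find, seed=None):
    """Hypothetical census model: each search finds one randomly chosen adult
    with probability p_find (otherwise nothing). Adults are individually marked,
    so each is counted at most once; returns the number of distinct adults found."""
    rng = random.Random(seed)
    found = set()
    for _ in range(n_searches):
        if rng.random() < p_find:
            found.add(rng.randrange(n_adults))   # identity of the adult found
    return len(found)

N = 64   # e.g. 53 found + 11 missed, as in the example above
counts = [simulated_census(N, n_searches=150, p_find=0.5, seed=s) for s in range(1000)]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Each adult is counted at most once, so no run can exceed N. With this limited effort, the average count falls well short of N.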

While plug-in estimators are simple enough in principle, and are very heavily used, they do not necessarily have the most desirable properties, even though bootstrap estimators assume the statistic of interest is a plug-in estimator.

    For example, many plug-in estimators are biased.
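For a small enough population, the bias of a plug-in estimator can be found exactly by averaging it over every possible sample. The Python sketch below assumes sampling with replacement from an invented population {1, 2, 3, 4} and uses exact fractions. Enumerating every with-replacement resample of an observed sample in this way is also what an exhaustive, or 'ideal', bootstrap does.

```python
from fractions import Fraction
from itertools import product

def plug_in_variance(values):
    # Σ(y − ȳ)²/n, computed exactly with fractions
    m = Fraction(sum(values), len(values))
    return sum((y - m) ** 2 for y in values) / len(values)

def sample_variance(values):
    # Σ(y − ȳ)²/(n − 1): the plug-in variance rescaled by n/(n − 1)
    n = len(values)
    return plug_in_variance(values) * n / (n - 1)

def expected_over_all_samples(statistic, population, n):
    """Exact expectation of a statistic over every ordered sample of size n
    drawn with replacement, all samples being equally likely."""
    samples = list(product(population, repeat=n))
    return sum(Fraction(statistic(s)) for s in samples) / len(samples)

population = [1, 2, 3, 4]   # invented population, so σ² = 5/4
n = 2
print(plug_in_variance(population))                                    # 5/4
print(expected_over_all_samples(plug_in_variance, population, n))      # 5/8
print(expected_over_all_samples(sample_variance, population, n))       # 5/4
print(max(population), expected_over_all_samples(max, population, n))  # 4 25/8
```

The plug-in variance averages (n − 1)/n of σ², and the sample maximum averages less than the population maximum, so both are biased. Rescaling by n/(n − 1) removes the variance bias.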

Logistical questions aside, one problem with a census as an estimator is that its accuracy and precision are hard to estimate. Nevertheless, since this is about the simplest estimator you are likely to encounter, let us use it to sort out a few important terms and concepts.

Employing our usual notation, if there are N adult Ermentrudes flurtillary alive when our conservationist performs his survey, his estimate of the number of butterflies could be written as N̂.

A more informative way of describing this would be to say that each attempt our conservationist makes to find a butterfly constitutes an observation (yᵢ). In that case, each time he finds a butterfly, observation yᵢ is a 'success' and yᵢ = 1; each time he fails, yᵢ = 0. Referred to collectively, however, variable Y indicates the entire sample (or category) of randomly-selected observations. Indicating a set of observations in this way is known as vector notation.

In principle, of course, in a few hours our conservationist could make a considerable number of such observations. However, even if he obtained a million observations, provided he counted each adult just once, his estimate (N̂) is simply the sum of his successes, ΣY. It follows that, if he made every possible observation, the parameter he is estimating (N) is also ΣY.

To avoid the inevitable confusion, and distinguish the sample and population statistics, let us use Y1 to indicate the set of observations in his sample, and Y0 for the population of observations he is sampling.

    In which case N̂ = ΣY1 and N = ΣY0


In vector notation, Y1 is merely a subset of Y0.
Accordingly, Y2 would indicate a sub-sample of Y1.
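This notation can be mimicked with Python lists of 0s and 1s. For illustration only, the sketch assumes there are 1000 possible search attempts, of which exactly N = 64 succeed (one per adult). Y1 and Y2 are drawn by index, so each really is a subset of the one above it. All sizes are hypothetical.

```python
import random

rng = random.Random(42)

N = 64                     # hypothetical number of surviving adults
n_possible = 1000          # hypothetical number of possible search attempts

# Y0: the outcome (1 = butterfly found, 0 = nothing) of every possible attempt.
# Exactly N attempts succeed, one per adult, so no adult is counted twice.
Y0 = [1] * N + [0] * (n_possible - N)
rng.shuffle(Y0)

# Y1: the attempts actually made, a random subset of Y0 (sampled by index,
# without replacement, so Y1 really is a subset of Y0)
idx1 = rng.sample(range(n_possible), 300)
Y1 = [Y0[i] for i in idx1]

# Y2: a random sub-sample of Y1
idx2 = rng.sample(idx1, 100)
Y2 = [Y0[i] for i in idx2]

N_hat = sum(Y1)            # N̂ = ΣY1
print(sum(Y0), N_hat, sum(Y2))   # ΣY2 <= ΣY1 <= ΣY0 = N
```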

Of course this particular statistic ( ΣY ) is an unusually simple one, but if we ignore how our statistic is actually calculated, any parameter ( Θ ) can be described as a function of its population, so Θ = F(Y0).
Similarly, an estimator (Θ̂) is some function of a sample of that population, so Θ̂ = F̂(Y1).

For a plug-in estimator, although the functions F(.) and F̂(.) are identical, because Y1 is randomly selected from Y0, the results of these two functions are liable to differ. The exceptions are when you have sampled (without replacement) the entire population, or when (provided every butterfly could be observed using that sampling method) your sample is infinitely large.
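Here is a Python sketch of that point, using an invented 0/1 population of observations. The helper `plug_in` (hypothetical) applies one function F to Y0 and to a random sample Y1 drawn without replacement. The two results generally differ, but they coincide once the 'sample' is the entire population.

```python
import random

def plug_in(F, Y0, n, rng):
    """Apply the same function F to the population Y0 and to a random
    sample of n of its observations drawn without replacement."""
    Y1 = rng.sample(Y0, n)
    return F(Y0), F(Y1)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(7)
Y0 = [1] * 64 + [0] * 936     # invented population of 1000 observations

for n in (100, 500, len(Y0)):
    # (F(Y0), F(Y1)) for the sum and for the mean
    print(n, plug_in(sum, Y0, n, rng), plug_in(mean, Y0, n, rng))
```

For 0/1 observations, ΣY1 can never exceed ΣY0, which is the census shortfall again. The mean, by contrast, varies around its population value.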

    Notice that, whilst this last assumption is almost certainly untrue, the fact is seldom acknowledged. Indeed most large-sample estimators also assume each individual was equally likely to be observed - as do rarefaction estimators.
For instance:
  • N̂ is liable to be a biased estimate of N if males are much easier to observe than females, especially when females are more common - as happens when they survive longer.
  • If males are commonly found in groups (such as mating swarms), this can upset those estimates of how N̂ will vary which assume each insect is equally likely to be observed.
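Both effects can be put into numbers. Purely for illustration, this Python sketch assumes each butterfly is detected independently with a probability that depends on its sex. It also assumes a swarm is either found whole or missed entirely. All detection probabilities and counts are invented.

```python
def expected_count(groups):
    """Expected census count. `groups` holds (number_of_adults, p_detect) pairs;
    each adult is assumed to be detected independently with probability p_detect."""
    return sum(n * p for n, p in groups)

def count_variance_independent(groups):
    # Sum of binomial variances n * p * (1 - p), one term per group
    return sum(n * p * (1 - p) for n, p in groups)

def count_variance_swarms(n_swarms, swarm_size, p_detect):
    # Assumed all-or-nothing detection: a swarm is found (all members counted)
    # or missed, so each swarm contributes swarm_size**2 * p * (1 - p)
    return n_swarms * swarm_size ** 2 * p_detect * (1 - p_detect)

# N = 64 in both scenarios: (males, p_male), (females, p_female)
females_common = [(24, 0.95), (40, 0.6)]
males_common = [(40, 0.95), (24, 0.6)]
print(expected_count(females_common), expected_count(males_common))

# 24 males, p = 0.8: detected one by one, or as 4 swarms of 6
print(count_variance_independent([(24, 0.8)]), count_variance_swarms(4, 6, 0.8))
```

The sex-specific detection probabilities are the same in both scenarios. Even so, the expected shortfall from N = 64 is larger when the harder-to-find females are the commoner sex (46.8 versus 52.4 expected). Swarming leaves the expected count of males unchanged (19.2 either way), but here it multiplies the variance of that count six-fold.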