|"It has long been an axiom of mine that the little things are infinitely the most important" |
Confidence limits by permutation
In order to assess the divergence of your observed statistic (for generality, let us call that statistic θ), a permutation test compares it with the distribution of that statistic under random rearrangements of your observations. Given which, it may not be immediately obvious how you might use the permutation test model to attach confidence limits to θ.
Whilst outwardly attractive, the range so calculated is not a confidence interval for your observed value of θ. It only provides an estimate of θ's 95% theoretical range on condition that the null hypothesis is true. Nor would that range tell you anything useful about θ under the alternative hypothesis - because it simply ignores that possibility. Worse still, large treatment effects would be treated as random error, with predictably misleading consequences...
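To see why this naive range is uninformative, consider the following sketch (the two samples are purely illustrative, not data from this example): we pool two groups, enumerate every relabelling, and take the 2.5% and 97.5% quantiles of the resulting null distribution of the difference between means.

```python
import itertools

# Purely illustrative samples - NOT data from this example
a = [12, 8, 15, 10, 9]
b = [22, 30, 18, 25, 27]

def null_range(a, b):
    """2.5% and 97.5% quantiles of the difference between group means
    under the null hypothesis, by exhaustive relabelling of the pooled data."""
    pooled = a + b
    n = len(pooled)
    diffs = []
    for idx in itertools.combinations(range(n), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(n) if i not in chosen]
        diffs.append(sum(g1) / len(g1) - sum(g2) / len(g2))
    diffs.sort()
    return diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs))]

lo, hi = null_range(a, b)
# This range straddles zero however large the observed difference may be -
# it describes the null distribution, not the precision of our estimate.
```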
Under certain circumstances it is, however, possible to obtain a P-value function via test inversion - applying permutation tests to 'shifted' copies of your data.
To clarify how this may work let us consider a simple example and two everyday summary statistics.
A larger experiment
You may recall the farmer's pour-on/tags comparison we subjected to a permutation test previously.
Trying to analyse the result of too few observations is a frustrating affair. As we shall see, however, useful numbers of observations bring their own problems. For example, let us assume that, encouraged by her initial 'suggestive' result, our farmer decides to apply her treatments to a slightly larger experimental group.
For the sake of argument, let us assume she selected 15 calves for her second experiment. Following our advice, she randomly divides them into three groups of 5. Then each animal is randomly assigned to a calf pen, above which a flypaper is hung. After 2 weeks the flypapers are removed, and the catch is identified and counted.
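Her allocation might be sketched as follows (a minimal illustration; the calf identifiers and the seed are of course hypothetical):

```python
import random

random.seed(42)                      # hypothetical seed, for reproducibility
calves = list(range(1, 16))          # 15 hypothetical calf identifiers
random.shuffle(calves)               # put the calves in random order
groups = {
    'control': calves[0:5],          # three groups of five
    'ear-tag': calves[5:10],
    'pour-on': calves[10:15],
}
# each calf is then assigned, again at random, to one of 15 pens
pens = dict(zip(calves, random.sample(range(1, 16), 15)))
```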
Having, somewhat laboriously, identified and counted her catches, our farmer obtains the following results.
Once again, it looks very much as if the pour-on was most effective at reducing the number of Stomoxys, although there remains the question of how often we would expect such a result to arise if this conclusion is incorrect.
If we accept that both insecticide treatments are effective, the question is which is the more effective. Assuming the statistic of interest is the ratio of catches, or log ratio, a permutation test of the difference between log means, d, found that 11.41% of randomizations gave a difference as great - whereas testing the ratio of catches, r, yielded a (very similar) one-sided mid-P value of 12.02%.
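Since the farmer's catch table is not reproduced here, the sketch below uses invented counts purely to show the mechanics of an exact one-sided mid-P permutation test of d, the difference between log mean catches:

```python
import math
import itertools

# Invented Stomoxys counts - illustrative only, not the farmer's results
pour_on = [6, 9, 5, 8, 7]
tags    = [10, 7, 12, 9, 11]

def log_mean(xs):
    return sum(math.log(x) for x in xs) / len(xs)

d_obs = log_mean(tags) - log_mean(pour_on)   # observed difference, d

# Enumerate all C(10,5) = 252 relabellings of the pooled catches
pooled = pour_on + tags
greater = equal = total = 0
for idx in itertools.combinations(range(len(pooled)), len(pour_on)):
    chosen = set(idx)
    g1 = [pooled[i] for i in idx]                                    # 'pour-on'
    g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]  # 'tags'
    d = log_mean(g2) - log_mean(g1)
    total += 1
    if d > d_obs + 1e-12:
        greater += 1
    elif abs(d - d_obs) <= 1e-12:
        equal += 1

# mid-P gives results exactly equal to d_obs half weight
mid_p = (greater + 0.5 * equal) / total
```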
Let us estimate confidence intervals for r and d.
Although we test the two statistics separately, since r & d are (approximately) equivalent, we are assuming that our farmer's pour-on killed or repelled proportionally more Stomoxys than her ear tags. In other words, if we ignore other sources of variation, we assume the only effect of her pour-on is to reduce the catch around tagged animals by r times - or to reduce the log(tag) catch by d. In which case, we can easily remove this difference, either by dividing all the tag catches by r, or by subtracting d from the individual log(tag) catches.
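On the same invented counts (again, the farmer's real table is not shown here), removing the observed difference looks like this:

```python
import math

# Invented counts - illustrative only
pour_on = [6, 9, 5, 8, 7]
tags    = [10, 7, 12, 9, 11]

# r: ratio of mean catches; d: difference between log mean catches
r = (sum(tags) / len(tags)) / (sum(pour_on) / len(pour_on))
d = (sum(math.log(x) for x in tags) / len(tags)
     - sum(math.log(x) for x in pour_on) / len(pour_on))

# Remove the difference by dividing every tag catch by r ...
tags_over_r = [x / r for x in tags]
# ... or by subtracting d from every log(tag) catch
log_tags_minus_d = [math.log(x) - d for x in tags]

# The modified tag catches now have the same mean (or log mean) as the
# pour-on catches - these modified results become the model's parameters.
```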
Notice that, although these modified results become our model's parameters, they are not estimates of the population parameters - we are merely playing a game of 'what if?' If these model parameters are unrealistic, the fault is entirely our own. Whatever the true μr & μd actually are, some of the results of testing modified tag catches are easy enough to predict.
The graph set below shows the result of testing 21 possible values of this parameter (D, or equivalently R).
Because we are modifying the samples as well as their combined population, we are actually estimating confidence limits about our observed r & d, rather than about the population parameters μr & μd.
Notice also that, unlike a parametric test, although all of these tests assume the null hypothesis holds, the population of statistics we are comparing our observed result with is not assumed to have a mean of either zero or one. Instead, by modifying our data, we are setting our own parametric value (in this case D or R). Nor does any of this imply we are sampling an infinite population. Our experimental population remained just 10 observations - even though we were estimating how closely our observed result was likely to resemble its true value, simply by varying the parameter of interest in our samples, and observing how often such a (divergent) result would occur by chance - all else being equal.
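Putting the pieces together, the inversion itself can be sketched as below (once more with invented counts, not the farmer's data): we scan 21 candidate values of D, retest the shifted data at each, and keep those values the test does not reject.

```python
import math
import itertools

# Invented counts - illustrative only
pour_on = [6, 9, 5, 8, 7]
tags    = [10, 7, 12, 9, 11]

def mean(xs):
    return sum(xs) / len(xs)

log_pour = [math.log(x) for x in pour_on]
log_tags = [math.log(x) for x in tags]
d_obs = mean(log_tags) - mean(log_pour)

def mid_p(g1, g2):
    """One-sided mid-P that mean(g2) - mean(g1) is as large as observed,
    by exhaustive relabelling of the pooled values."""
    pooled = g1 + g2
    obs = mean(g2) - mean(g1)
    greater = equal = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(g1)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = mean(b) - mean(a)
        total += 1
        if stat > obs + 1e-12:
            greater += 1
        elif abs(stat - obs) <= 1e-12:
            equal += 1
    return (greater + 0.5 * equal) / total

# Scan 21 candidate values of D, centred on the observed d
grid = [d_obs + (i - 10) * 0.2 for i in range(21)]
accepted = []
for D in grid:
    shifted = [x - D for x in log_tags]   # remove the hypothesized effect D
    p = mid_p(log_pour, shifted)          # retest the shifted data
    if 0.025 <= p <= 0.975:               # not rejected at the 5% level
        accepted.append(D)

# Approximate 95% confidence limits for d (resolution limited by the grid)
lower, upper = min(accepted), max(accepted)
```

In practice one would use a finer grid (or interpolate the P-value function) to locate the limits more precisely; the scan above simply mirrors the 21-value graph set described in the text.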
Remember, the only population of observations these permutation tests refer to is the very finite collection we have actually observed. Any extrapolation to a wider population is non-statistical, and requires you to make due allowance for the various biases in selecting your experimental subjects.