ANOVA by randomization
On this page: Why bother with randomization?; ANOVA by simple permutation; ANOVA by simple bootstrap; ANOVA simulation models using residuals
Why bother with randomization?
Under both the null and alternative hypotheses, parametric ANOVA assumes errors within groups are randomly and independently selected from identically distributed normal, and therefore infinite, (parent) populations - it also assumes group sizes do not vary. The Kruskal-Wallis (KW) test, whilst 'more-or-less' equivalent to ANOVA upon rank-transformed data, only removes the normality assumption - and at a price.
Firstly its inferences are conditional upon the observed (null) error distribution (of ranks). Secondly, whilst the KW test does not assume errors are normal, it does assume data are continuous (untied), errors are independently distributed - and the observed differences in their distribution within groups have arisen by simple chance.
Furthermore, because testing a mean rank is equivalent to testing a median, the KW test employs a different measure of location to parametric ANOVA - but it is only a test of the difference between medians where error variances are homogeneous. Like the Wilcoxon-Mann-Whitney (WMW) test from which it was derived, the KW test is really a test of
An alternative and increasingly popular way to relax some parametric ANOVA assumptions is to use simulation models to estimate the distribution of F under H0, rather than using a mathematically tractable (but arbitrary) distribution function. An additional advantage of simulation models is that they enable you to use statistics such as trimmed means. Whatever your statistic of choice, a simulation model is used to generate repeated sets of values under the null hypothesis, each of which is subjected to ANOVA - and the resulting distribution of F-statistics (or their trimmed-mean analogues) is used instead of the parametric F-distribution. Of these simulation models, the most popular is the permutation test.
ANOVA by simple permutation
How to do it
First of all you calculate the ANOVA table and F-statistic in the usual way. To estimate the distribution of F (under Hnil), observations are pooled and randomly assigned (without replacement) to K groups of predefined sample size (n1, n2, ... nK). Then an ANOVA is performed, and the F-value is recorded. To estimate how F varies due to random assignment, this process is repeated a sufficient number of times (perhaps 5000). For a conventional 1-tailed test the P-value is the proportion of those F-values that equal or exceed the observed F-value.
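The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the worked examples: the function names `f_statistic` and `permutation_anova`, the seed argument, and the example data are all hypothetical. The mid P-value is computed here using its usual definition (the proportion of simulated F-values exceeding the observed F, plus half the proportion equal to it), which is assumed rather than taken from this page.

```python
import math
import random


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # Degenerate case: no variation within groups at all.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def permutation_anova(groups, n_reps=5000, seed=None):
    """Return (observed F, conventional P, mid-P) by simple permutation."""
    rng = random.Random(seed)
    f_obs = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]  # pool all observations
    n_greater = n_equal = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # random assignment without replacement
        start, shuffled = 0, []
        for n in sizes:      # keep the predefined group sizes n1 ... nK
            shuffled.append(pooled[start:start + n])
            start += n
        f_sim = f_statistic(shuffled)
        # Ties are matched with a tolerance, because the same arrangement
        # summed in a different order can differ by rounding error.
        if math.isclose(f_sim, f_obs, rel_tol=1e-9):
            n_equal += 1
        elif f_sim > f_obs:
            n_greater += 1
    p_conventional = (n_greater + n_equal) / n_reps
    p_mid = (n_greater + 0.5 * n_equal) / n_reps
    return f_obs, p_conventional, p_mid


# Hypothetical data, purely for illustration.
example = [[1, 2, 3], [2, 3, 4], [10, 11, 12]]
f_obs, p_conv, p_mid = permutation_anova(example, n_reps=2000, seed=1)
print(f_obs, p_conv, p_mid)
```

Because the observed arrangement always counts as a tie with itself when it is drawn, the mid P-value can never exceed the conventional one, and the gap between them widens as the data become more heavily tied.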
Worked example 1
This gave us a mid P-value of 0.0021, and a conventional P-value of 0.0022, compared with P=0.00895 from a KW test. You will of course get a slightly different P-value each time you run the test - although with 5000 replicates the variation will not be very great unless P is small. Recall the KW test uses a large-sample chi-squared approximation, and assumes the data lack ties (these data are tied). A log-transformation stabilizes the variances reasonably well. Applying a permutation test to the ln-transformed data gave a mid P-value of 0.0023, and a conventional P-value of 0.0024.
Worked example 2
This gave us a mid P-value of 0.0015, and a conventional P-value of 0.0016, whereas a proportion P=0.001329 of the parametric F-distribution exceeded the observed value of F. Applying the same 3 tests to untransformed data gave P=0.0009, 0.001 and 0.001283 respectively.
Assumptions and properties
Although commonly described as distribution-free, permutation tests are only slightly more distribution-free than the KW test, in that permutation tests do not assume data are continuous (un-tied). ANOVA by permutation still assumes errors are identically distributed and independently assigned. However, unlike the KW test, transforming data can influence the results of ANOVA F-tests using permutation. In addition, by pooling observations assuming Hnil is true, no allowance is made for treatment effects. Therefore, when group means differ (under HA), the effect of treatment upon group location will be incorporated into that nil model - thus increasing the predicted variation of F, and reducing the resulting P-value. Conversely, when groups have similar means but their error distributions differ in other respects (such as variance or skew) this assumption can bias your inference.
Permutation models do not assume data represent an infinite normal population but are conditional upon the entirely finite set of values you have observed. In other words, ANOVA by permutation ignores what happens if you were to repeat your study and observe different values. Again, whilst ANOVA by permutation does not assume data are continuous, small sets of heavily-tied data will cause F to be noticeably discrete - making conventional inference conservative compared to mid-P.
ANOVA by simple bootstrap
How to do it
Worked example 3
Again this gave us a mid P-value of 0.0015, and a conventional P-value of 0.0016.
Assumptions and properties
A practical problem with bootstrapping is that, because the distribution of the pooled observations is unavoidably discrete, it may not provide a very good model of the population from which the observations were drawn. This model population is determined by your observed values, so analyses of small studies are vulnerable to aberrant values. Also the range of values will seldom be as great as that of the parent population - which, as with a permutation test, restricts the range of possible P-values when testing small tied groups. Whilst
An alternative 'semi-parametric' method is to assume observations represent a known (infinite) frequency distribution, estimate its parameters from your data, then sample that distribution. Whilst semi-parametric bootstrapping can be useful for estimators whose properties are poorly described, the estimates tend to be biased, and your choice of distribution may be criticized as arbitrary. Jittered bootstrapping avoids such estimates and assumptions - and selecting a suitable jittering distribution is generally easier and less controversial.
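As a rough illustration of the semi-parametric approach just described, the sketch below assumes the observations come from a normal distribution, estimates its mean and standard deviation from the pooled data, and samples groups of the original sizes from that fitted distribution. The choice of a normal distribution, the names `f_statistic` and `semiparametric_anova`, and the example data are all assumptions made for illustration - the page does not specify a distribution.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def semiparametric_anova(groups, n_reps=5000, seed=None):
    """Return (observed F, conventional P) sampling a fitted normal model."""
    rng = random.Random(seed)
    f_obs = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    # ASSUMPTION: a normal parent population, with parameters estimated
    # from the pooled observations (i.e. as if the null hypothesis held).
    mu = statistics.mean(pooled)
    sigma = statistics.stdev(pooled)
    n_ge = 0
    for _ in range(n_reps):
        simulated = [[rng.gauss(mu, sigma) for _ in range(n)] for n in sizes]
        if f_statistic(simulated) >= f_obs:
            n_ge += 1
    return f_obs, n_ge / n_reps


# Hypothetical data, purely for illustration.
example = [[1, 2, 3], [2, 3, 4], [10, 11, 12]]
print(semiparametric_anova(example, n_reps=2000, seed=1))
```

With a normal model the F-statistic does not depend on the fitted location or scale, so this particular sketch merely approximates the parametric F-test by simulation. The approach only behaves differently once some other distribution is chosen, which is where the charge of arbitrariness arises.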
ANOVA simulation models using residuals
How to do it
Calculate the ANOVA table and F-statistic as usual, and find the difference between each observation and its group mean. Assuming errors are identically distributed, these differences are pooled, and this pool is randomly sampled without replacement (permuted) - or it is sampled with replacement (bootstrapped). The observed value of F is then compared with its estimated distribution under H0. Below we again analyze the data from Johnston et al.
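A minimal sketch of this procedure follows, assuming errors are identically distributed so that the deviations from group means can be pooled. The function names `f_statistic` and `residual_anova`, the `replace` switch, and the example data are hypothetical, and this is not the code used for the results reported here.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def residual_anova(groups, n_reps=5000, replace=False, seed=None):
    """Return (observed F, conventional P) by resampling pooled residuals.

    replace=False permutes the residuals; replace=True bootstraps them.
    """
    rng = random.Random(seed)
    f_obs = f_statistic(groups)
    sizes = [len(g) for g in groups]
    # Difference between each observation and its own group mean.
    residuals = [x - sum(g) / len(g) for g in groups for x in g]
    n_ge = 0
    for _ in range(n_reps):
        if replace:
            draw = rng.choices(residuals, k=len(residuals))  # with replacement
        else:
            draw = rng.sample(residuals, k=len(residuals))   # a permutation
        start, simulated = 0, []
        for n in sizes:
            simulated.append(draw[start:start + n])
            start += n
        if f_statistic(simulated) >= f_obs:
            n_ge += 1
    return f_obs, n_ge / n_reps


# Hypothetical data, purely for illustration.
example = [[1, 2, 3], [2, 3, 4], [10, 11, 12]]
print(residual_anova(example, n_reps=2000, replace=False, seed=1))
print(residual_anova(example, n_reps=2000, replace=True, seed=1))
```

Subtracting each group's mean removes the observed treatment effects before pooling, so the simulated F-values describe a population in which the null hypothesis holds by construction.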
Permuting the errors gave us a mid P-value of 0.0007, and a conventional P-value of 0.0008. By comparison, applying a permutation test to the ln-transformed observations gave a mid P-value of 0.0023, and a conventional P-value of 0.0024.
Assumptions and properties
One attraction of permuting or bootstrapping deviations from group means, rather than the observations themselves, is that this enables you to ensure the null hypothesis is true, rather than merely assuming it is so - and hence reduces P-values where group means are observed to differ. An important disadvantage is that, by shifting distributions in this way, highly implausible error structures are sometimes created. Another problem is that, whilst lack of 'power' is worst when testing small samples, this is also where bootstrapping is most vulnerable to data artefacts.
On the other hand, when errors cannot be assumed to be distributed identically, instead of merging them, simulation models can be devised that keep each group of errors separate - and sample them with
In conclusion, whilst simulation models offer considerable potential, this is seldom realized outside of statistical journals. Among biologists their application is dictated more by precedent and ease of application than by their appropriateness to study design and data. One reason for this is that non-statisticians are understandably reluctant to employ analyses whose reasoning and properties are novel and little understood.