One-way fixed effects ANOVA (Model I)
Worked example 1
Our first worked example uses data from Johnston et al.
Data are presented below:
Draw boxplots and assess normality
Box plots are examined to assess whether a (parametric) ANOVA is appropriate for this set of data.
The values of group A (affected German shepherd dogs) appear to have a skewed distribution and to be more variable than those in the other groups. A log transformation makes the distribution for group A more symmetrical (normal?) - but unfortunately appears to then make group B (unaffected German shepherd dogs) less symmetrical!
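A minimal sketch of how such box plots might be drawn in R, before and after a log transformation. The data frame `dogs`, its columns `group` and `vite`, and every value in it are hypothetical stand-ins for illustration, not the data from Johnston et al.

```r
# Hypothetical vitamin E values for three groups of six dogs.
# These are illustrative numbers, NOT the data from Johnston et al.
dogs <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  vite  = c(18, 25, 31, 44, 62, 95,   # group A
            14, 19, 22, 27, 33, 41,   # group B
            10, 13, 15, 18, 21, 26)   # group C
)

op <- par(mfrow = c(1, 2))            # two panels side by side
bp_raw <- boxplot(vite ~ group, data = dogs,
                  main = "Raw", ylab = "vitamin E")
bp_log <- boxplot(log(vite) ~ group, data = dogs,
                  main = "Log-transformed", ylab = "log(vitamin E)")
par(op)                               # restore the previous graphics settings
```

Comparing the two panels shows whether the transformation makes the boxes more symmetrical and more similar in spread across groups.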
Normality is best assessed using normal QQ plots for the three groups.
This confirms our earlier conclusion - a log transformation does indeed normalize the distribution for affected dogs, but does little for the other two groups. It looks, therefore, as if a log transformation will be the best option - but does it homogenize variances?
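Normal QQ plots for each group, on both scales, could be produced along the following lines. This is only a sketch: the data frame `dogs` and its values are hypothetical, and `qqnorm()` and `qqline()` are the standard base R functions.

```r
# Hypothetical data, NOT the values from Johnston et al.
dogs <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  vite  = c(18, 25, 31, 44, 62, 95,
            14, 19, 22, 27, 33, 41,
            10, 13, 15, 18, 21, 26)
)

op <- par(mfrow = c(2, 3))            # top row: raw data; bottom row: logs
for (scale_name in c("raw", "log")) {
  for (g in levels(dogs$group)) {
    y <- dogs$vite[dogs$group == g]
    if (scale_name == "log") y <- log(y)
    qqnorm(y, main = paste(scale_name, "- group", g))
    qqline(y)                         # reference line through the quartiles
  }
}
par(op)
```

Points lying close to the reference line suggest an approximately normal distribution; systematic curvature suggests skew.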
Check homogeneity of variances
The simplest (and often the most appropriate) test of homogeneity of variances is Hartley's Fmax test.
Variances for each group are 514.2716 (A), 152.97 (B) and
An alternative test of homogeneity of variances is Bartlett's test. R supports this test and we find that for the raw data P = 0.002. After a log transformation, P = 0.641, so we can at least accept that the variances of the log-transformed data are homogeneous - although the distributions are certainly not identical.
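Both checks can be sketched in a few lines of R. Hartley's Fmax is simply the largest group variance divided by the smallest; the helper `fmax()` below is a hypothetical convenience function, and `bartlett.test()` is the base R implementation of Bartlett's test. The data are hypothetical, so the output will not match the P-values quoted above.

```r
# Hypothetical data, NOT the values from Johnston et al.
dogs <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  vite  = c(18, 25, 31, 44, 62, 95,
            14, 19, 22, 27, 33, 41,
            10, 13, 15, 18, 21, 26)
)

# Hartley's Fmax: ratio of the largest to the smallest group variance.
# (Hypothetical helper; the test assumes equal group sizes.)
fmax <- function(y, g) {
  v <- tapply(y, g, var)
  max(v) / min(v)
}
fmax_raw <- fmax(dogs$vite, dogs$group)
fmax_log <- fmax(log(dogs$vite), dogs$group)

# Bartlett's test on the raw and on the log-transformed values.
bart_raw <- bartlett.test(vite ~ group, data = dogs)
bart_log <- bartlett.test(log(vite) ~ group, data = dogs)

print(c(Fmax_raw = fmax_raw, Fmax_log = fmax_log))
print(c(P_raw = bart_raw$p.value, P_log = bart_log$p.value))
```

The Fmax ratio is judged against tabulated critical values for the number of groups and the degrees of freedom per group; R does not compute that P-value here.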
Statisticians would differ on where to go next! Some would consider ANOVA sufficiently robust to cope with the untransformed data. Others would prefer a randomization test which does not require distributions to be normal. We will do the analysis on log-transformed data on the basis that at least variances will be homogeneous and ANOVA should be sufficiently robust to cope with the non-normality.
Carry out analysis of variance on log-transformed data
It is of course much quicker and easier to do the analysis of variance in R:
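A minimal sketch of such an analysis, assuming a data frame `dogs` with a factor `group` and a numeric response `vite` (both names, and all the values, are hypothetical):

```r
# Hypothetical data, NOT the values from Johnston et al.
dogs <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  vite  = c(18, 25, 31, 44, 62, 95,
            14, 19, 22, 27, 33, 41,
            10, 13, 15, 18, 21, 26)
)

# One-way fixed effects ANOVA on the log-transformed response.
fit <- aov(log(vite) ~ group, data = dogs)
anova_table <- summary(fit)[[1]]   # data frame: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(summary(fit))
```

The first row of the table gives the between-groups term and the second the residual (within-groups) term.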
Perform diagnostics to ensure adequacy of model
Neither of these outcomes is 'ideal' - but they are probably as good as you will get with most data sets!
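A common pair of diagnostics for an ANOVA model is a plot of residuals against fitted values (to check for equal spread) and a normal QQ plot of the residuals (to check normality). The sketch below assumes these are the checks of interest and again uses a hypothetical `dogs` data frame.

```r
# Hypothetical data, NOT the values from Johnston et al.
dogs <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  vite  = c(18, 25, 31, 44, 62, 95,
            14, 19, 22, 27, 33, 41,
            10, 13, 15, 18, 21, 26)
)

fit <- aov(log(vite) ~ group, data = dogs)

op <- par(mfrow = c(1, 2))
plot(fit, which = 1)   # residuals vs fitted values: look for similar spread
plot(fit, which = 2)   # normal QQ plot of standardized residuals
par(op)
```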
Assess effect sizes
At its simplest this involves comparing the 'treatment' means to determine which ones are significantly different from each other. Sometimes more complex operations are involved, for example comparing the average of two means with a third. This is the topic of the More Information page on Multiple comparison of means. But an ANOVA is not complete without examination of effect sizes. So for now, we will assume no comparisons have been preplanned, and just carry out all pairwise comparisons using Tukey's honestly significant difference test.
Here we find that group A (affected German shepherd dogs) had significantly higher levels of vitamin E than group C (unaffected other breeds of dogs) (P = 0.0009). No other differences were significant at the P = 0.05 level. The authors performed an ANOVA on the untransformed data and reached similar conclusions.
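All pairwise comparisons of this kind can be obtained with the base R function `TukeyHSD()`. The sketch below uses a hypothetical `dogs` data frame, so it will not reproduce the P-value reported above.

```r
# Hypothetical data, NOT the values from Johnston et al.
dogs <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  vite  = c(18, 25, 31, 44, 62, 95,
            14, 19, 22, 27, 33, 41,
            10, 13, 15, 18, 21, 26)
)

fit <- aov(log(vite) ~ group, data = dogs)

# Tukey's honestly significant difference for all pairs of group means.
tuk <- TukeyHSD(fit, "group", conf.level = 0.95)
print(tuk)

# Differences are on the log scale; exponentiating gives ratios of
# geometric means with their confidence limits.
ratios <- exp(tuk$group[, c("diff", "lwr", "upr")])
print(ratios)
```

Because the analysis is on log-transformed values, a back-transformed interval that excludes 1 corresponds to a significant difference between that pair of groups.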
Conclusions
We have intentionally chosen a 'badly behaved' set of data to analyze - because such data are all too common. By shutting one's eyes to the two major problems with the data - different distributions between groups and convenience sampling - we can perform an ANOVA that meets the textbook requirements for analysis.
The problem of different distributions between groups could be addressed using Welch's unequal variance ANOVA, and in a related topic
There is no satisfactory solution to the problem of convenience sampling - inference must be restricted to the group of animals 'at hand'. Even hunch-based inference would only be meaningful if one carried out a detailed assessment of the possible biases in sample selection - and how these may have affected the result.