Draw boxplots and assess normality
Box plots are examined to assess how appropriate a (parametric) ANOVA is for this set of data.
{Fig. 1a}
The values in group A (affected German shepherd dogs) appear to have a skewed distribution and to be more variable than those in the other groups. A log transformation makes the distribution for group A more symmetrical (normal?) - but unfortunately then appears to make that of group B (unaffected German shepherd dogs) less symmetrical!
Normality is best assessed using normal QQ plots for the three groups.
{Fig. 1b}
This confirms our earlier conclusion - a log transformation does indeed normalize the distribution for affected dogs, but does little for the other two groups. It looks, therefore, as if a log transformation will be the best option - but does it homogenize variances?
Check homogeneity of variances
The simplest (and often the most appropriate) test of homogeneity of variances is Hartley's Fmax test.
Variances for each group are 514.2716 (A), 152.97 (B) and 149.76 (C). This gives an Fmax of 514.2716/149.76 = 3.43. Sample sizes are similar (25 versus 26), which gives a critical value (α = 0.05) for this test of between 2.4 and 2.95. Hence we conclude that variances are significantly heterogeneous. After a log transformation the variances for each group are 0.159 (A), 0.124 (B) and 0.188 (C). This gives an Fmax of 0.188/0.124 = 1.52, which is not significant (P > 0.05).
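As a sketch, Hartley's Fmax can be computed in R like this. The data below are simulated stand-ins, not the study's measurements, and group sizes of 26, 25 and 20 are assumed (they are consistent with the degrees of freedom in the ANOVA table, but are not stated explicitly in the text):

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
A <- rlnorm(26, meanlog = 3.84, sdlog = 0.40)  # affected German shepherds
B <- rlnorm(25, meanlog = 3.59, sdlog = 0.35)  # unaffected German shepherds
C <- rlnorm(20, meanlog = 3.41, sdlog = 0.43)  # unaffected other breeds

vars <- c(A = var(A), B = var(B), C = var(C))
Fmax <- max(vars) / min(vars)  # Hartley's Fmax: largest variance / smallest
Fmax
```

The statistic itself is trivial to compute; the awkward part of Hartley's test is looking up the critical value, which depends on the number of groups and the (roughly equal) group sizes.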

An alternative test of homogeneity of variances is Bartlett's test. R supports this test and we find that for the raw data P = 0.002. After a log transformation, P = 0.641, so we can at least accept variances of log transformed data are homogeneous - albeit distributions are certainly not identical.
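In R, Bartlett's test is available as `bartlett.test()`. A minimal sketch on simulated stand-in data (`dogs`, `vitE` and `group` are hypothetical names, not the study's variable names):

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
dogs <- data.frame(
  vitE  = c(rlnorm(26, 3.84, 0.40), rlnorm(25, 3.59, 0.35), rlnorm(20, 3.41, 0.43)),
  group = factor(rep(c("A", "B", "C"), times = c(26, 25, 20)))
)
b_raw <- bartlett.test(vitE ~ group, data = dogs)       # raw data
b_log <- bartlett.test(log(vitE) ~ group, data = dogs)  # log-transformed data
c(raw = b_raw$p.value, log = b_log$p.value)
```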
Statisticians would differ on where to go next! Some would consider ANOVA sufficiently robust to cope with the untransformed data. Others would prefer a randomization test which does not require distributions to be normal. We will do the analysis on log-transformed data on the basis that at least variances will be homogeneous and ANOVA should be sufficiently robust to cope with the non-normality.
Carry out analysis of variance on log-transformed data
Sums of squares can be calculated manually, but it is of course much quicker and easier to do the analysis of variance in R:
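With the data in a data frame, the one-way ANOVA on log-transformed values is a one-liner with `aov()`. This sketch again uses the hypothetical `dogs`/`vitE`/`group` names and simulated values:

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
dogs <- data.frame(
  vitE  = c(rlnorm(26, 3.84, 0.40), rlnorm(25, 3.59, 0.35), rlnorm(20, 3.41, 0.43)),
  group = factor(rep(c("A", "B", "C"), times = c(26, 25, 20)))
)
model <- aov(log(vitE) ~ group, data = dogs)
summary(model)  # gives df, SS, MS, F-ratio and P
```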

ANOVA table

| Source of variation | df | SS | MS | F-ratio | P |
|---|---|---|---|---|---|
| Between groups ('Treatments') | 2 | 2.3427 | 1.1714 | 7.3121 | < 0.001 |
| Within groups ('Error' or 'Residual') | 68 | 10.8929 | 0.1602 | | |
| Total | 70 | 13.2356 | | | |
This tells us that the 'treatment' effect is highly significant. In other words there are significant differences in mean Vitamin E concentration between some or all of the groups.
Perform diagnostics to ensure adequacy of model
- Levene's test
Now that we have run the model on the log-transformed data, we will (for the sake of demonstrating the test) check the assumption of homogeneity of variances again using Levene's test (you could of course have used this as the initial homogeneity-of-variance test). This is readily carried out in R, as the residuals are available once you have run the ANOVA.

In this case, F = 0.07175 (df = 2,68) and P = 0.9308. In other words, there is no evidence of heterogeneity of variances for the log-transformed data.
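Levene's test is essentially a one-way ANOVA on the absolute deviations of each observation from its group mean, so it can be run in base R without extra packages (or, if the car package is installed, with `car::leveneTest()`). A sketch on simulated stand-in data:

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
dogs <- data.frame(
  vitE  = c(rlnorm(26, 3.84, 0.40), rlnorm(25, 3.59, 0.35), rlnorm(20, 3.41, 0.43)),
  group = factor(rep(c("A", "B", "C"), times = c(26, 25, 20)))
)
# Absolute deviations of each log value from its group mean
absdev <- with(dogs, abs(log(vitE) - ave(log(vitE), group)))
levene <- anova(lm(absdev ~ group, data = dogs))
levene  # F and P for Levene's test on the log-transformed data
```

Note that `car::leveneTest()` defaults to deviations from group medians (the Brown-Forsythe variant), so its result may differ slightly from the mean-based version sketched here.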
We thought it would be interesting to check the result of Levene's on the untransformed data since the tests we used initially (Hartley's Fmax and Bartlett's) are not very robust to skew. On the untransformed data, Levene's gives F = 3.682 (df = 2,68) and P = 0.03031. Hence there was significant heterogeneity prior to transformation. Some authorities suggest that one should only worry about heterogeneity if P < 0.01 in a Levene's test - but given that all tests suggest heterogeneity, we think a transformation was justified.
- Distribution of residuals
Plotting residuals (the difference between observed and fitted values) against fitted values provides us with another check on heteroscedasticity. Similarly a normal quantile plot of residuals allows us to check normality of errors.
{Fig. 1c}
Neither of these plots is 'ideal' - but they are probably as good as you will get with most data sets!
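Both diagnostic plots can be produced directly from the fitted model (a sketch, using the same simulated stand-in data as above):

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
dogs <- data.frame(
  vitE  = c(rlnorm(26, 3.84, 0.40), rlnorm(25, 3.59, 0.35), rlnorm(20, 3.41, 0.43)),
  group = factor(rep(c("A", "B", "C"), times = c(26, 25, 20)))
)
model <- aov(log(vitE) ~ group, data = dogs)

par(mfrow = c(1, 2))
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals")  # check heteroscedasticity
abline(h = 0, lty = 2)
qqnorm(resid(model)); qqline(resid(model))        # check normality of errors
```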
Assess effect sizes
At its simplest this involves comparing the 'treatment' means to determine which ones are significantly different from each other. Sometimes more complex operations are involved, for example comparing the average of two means with a third. This is the topic of the More Information page on Multiple comparison of means.
But an ANOVA is not complete without examination of effect sizes. So for now, we will assume no comparisons have been preplanned, and just carry out all pairwise comparisons using Tukey's honestly significant difference test.
Table of means

| Group | Mean (transformed) | SE (transformed) | Mean (detransformed) | 95% CI (detransformed) |
|---|---|---|---|---|
| A | 3.8375 | 0.1569 | 46.41 | 39.67 to 54.29 |
| B | 3.5913 | 0.1754 | 36.28 | 30.44 to 43.24 |
| C | 3.4098 | 0.1538 | 30.26 | 25.95 to 35.29 |

Here we find that group A (affected German shepherd dogs) had significantly higher levels of vitamin E than group C (unaffected other breeds of dogs) (P = 0.0009). No other differences were significant at the P = 0.05 level. The authors performed an ANOVA on the untransformed data and reached similar conclusions.
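Tukey's honestly significant difference test is built into base R as `TukeyHSD()` on an `aov` fit, and detransformed means with confidence intervals can be obtained by exponentiating estimates made on the log scale. A sketch with the simulated stand-in data:

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
dogs <- data.frame(
  vitE  = c(rlnorm(26, 3.84, 0.40), rlnorm(25, 3.59, 0.35), rlnorm(20, 3.41, 0.43)),
  group = factor(rep(c("A", "B", "C"), times = c(26, 25, 20)))
)
model <- aov(log(vitE) ~ group, data = dogs)
tukey <- TukeyHSD(model)  # all pairwise differences on the log scale
tukey

# Detransformed group means with 95% CIs: fit means directly (no intercept),
# then exponentiate the log-scale confidence limits
exp(confint(lm(log(vitE) ~ group - 1, data = dogs)))
```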
Conclusions
We have intentionally chosen a 'badly behaved' set of data to analyze - because such data are all too common. By shutting one's eyes to the two major problems with the data - different distributions between groups and convenience sampling - we can perform an ANOVA that meets the textbook requirements for analysis.
The problem of different distributions between groups could be addressed using Welch's unequal variance ANOVA, and in a related topic above we do precisely this. Since group sizes are moderate and fairly similar, the outcome (P = 0.005) differs little from the result of the parametric ANOVA.
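Welch's unequal variance ANOVA is also built into base R as `oneway.test()` with `var.equal = FALSE`. A sketch on the simulated stand-in data (applied here to the raw values, where variances were heterogeneous; the related topic referred to above may of course have set the analysis up differently):

```r
# Simulated stand-in data - NOT the study's actual measurements
set.seed(1)
dogs <- data.frame(
  vitE  = c(rlnorm(26, 3.84, 0.40), rlnorm(25, 3.59, 0.35), rlnorm(20, 3.41, 0.43)),
  group = factor(rep(c("A", "B", "C"), times = c(26, 25, 20)))
)
welch <- oneway.test(vitE ~ group, data = dogs, var.equal = FALSE)
welch  # Welch's F, its adjusted denominator df, and P
```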
There is no satisfactory solution to the problem of convenience sampling - inference must be restricted to the group of animals 'at hand'. Even hunch-based inference would only be meaningful if one carried out a detailed assessment of the possible biases in sample selection - and how these may have affected the result.