who carried out an observational study to compare vitamin E concentrations (in the form of serum α-tocopherol) in dogs with and without a degenerative nerve disorder (CDRM). We consider additional aspects of this work here. It appears that the dogs were convenience selected, so we are immediately violating the first of our assumptions (random sampling). One could argue that this renders the results meaningless, so there is nothing to be gained from further analysis. Unfortunately, this would rule out a great deal of research! We will instead continue on the basis that we must restrict our inference to the group of animals 'at hand'.
Draw boxplots and assess normality
Boxplots are examined to assess how appropriate (parametric) ANOVA is for this set of data.
{Fig. 1a}
The values of group A (affected German shepherd dogs) appear to have a skewed distribution and to be more variable than those in the other groups. A log transformation makes the distribution for group A more symmetrical (normal?), but unfortunately it appears to make group B (unaffected German shepherd dogs) less symmetrical!
Normality is best assessed using normal QQ plots for the three groups.
{Fig. 1b}
This confirms our earlier conclusion: a log transformation does indeed normalize the distribution for affected dogs, but does little for the other two groups. It looks, therefore, as if a log transformation will be the best option. But does it homogenize variances?
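A normal QQ plot pairs each sorted observation with the value a standard normal distribution would predict at that rank. Points lying close to a straight line suggest approximate normality. The Python sketch below computes those pairs for raw and log-transformed values; the figures were produced in R. The data are hypothetical placeholders, not the study measurements. The plotting position (i - 0.5)/n is one common convention and may differ from the one used to draw Fig. 1b.

```python
import math
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical standard normal quantile, sorted observed value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5)/n for the i-th smallest value, i = 1..n
    return [(std_normal.inv_cdf((i - 0.5) / n), y) for i, y in enumerate(ordered, start=1)]

# Hypothetical right-skewed concentrations (NOT the study data)
group = [22.0, 25.5, 28.0, 31.0, 35.5, 41.0, 48.0, 60.0, 95.0]

raw_pairs = qq_points(group)
log_pairs = qq_points([math.log(v) for v in group])

for (q, y), (_, ly) in zip(raw_pairs, log_pairs):
    print(f"{q:6.2f}  raw={y:6.1f}  log={ly:5.3f}")
```

Plotting the log pairs next to the raw pairs shows whether the transformation straightens the plot for a skewed group.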
Check homogeneity of variances
The simplest (and often the most appropriate) test of homogeneity of variances is Hartley's F_{max} test.
Variances for the three groups are 514.2716 (A), 152.97 (B) and 149.76 (C). This gives an F_{max} of 514.2716/149.76 = 3.43. The two groups concerned (A and C) have similar sample sizes (25 versus 26), which gives a critical value (α = 0.05) for this test of between 2.4 and 2.95. Since 3.43 exceeds even the larger of these values, we conclude that the variances are significantly heterogeneous. After a log transformation, the variances are 0.159 (A), 0.124 (B) and 0.188 (C). This gives an F_{max} of 0.188/0.124 = 1.52, which is not significant (P > 0.05).
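The F_{max} arithmetic above is easy to script. This minimal Python sketch uses the group variances quoted in the text. The critical value still has to be read from a table of Hartley's F_{max} for three groups, as is done above.

```python
# Hartley's F_max: ratio of the largest to the smallest group variance
raw_variances = {"A": 514.2716, "B": 152.97, "C": 149.76}
log_variances = {"A": 0.159, "B": 0.124, "C": 0.188}

def f_max(variances):
    """Largest sample variance divided by the smallest."""
    return max(variances.values()) / min(variances.values())

print(round(f_max(raw_variances), 2))  # 3.43
print(round(f_max(log_variances), 2))  # 1.52
```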
An alternative test of homogeneity of variances is Bartlett's test. R supports this test, and for the raw data we find P = 0.002. After a log transformation, P = 0.641, so we can at least accept that the variances of the log-transformed data are homogeneous, albeit the distributions are certainly not identical.
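Bartlett's test needs only each group's variance and sample size, so the P values above can be checked from the summary figures. The sketch below is a stdlib-only Python version written for illustration, not the R function used in the analysis. It assumes the group sizes 25, 20 and 26 (groups A, B and C) that appear in the sums of squares. With three groups the statistic is referred to a chi-square distribution with 2 df, whose upper tail is exactly exp(-T/2), so the sketch deliberately handles only three groups. Because the variances are rounded, expect agreement only to about two decimal places.

```python
import math

def bartlett_from_summary(variances, sizes):
    """Bartlett's statistic T and its P value from group variances and sizes.

    With k = 3 groups, T is compared with chi-square on 2 df, whose upper
    tail is exactly exp(-T/2); other k would need a chi-square library.
    """
    k = len(variances)
    if k != 3:
        raise ValueError("this sketch handles exactly three groups")
    df = [n - 1 for n in sizes]
    n_minus_k = sum(df)
    pooled = sum(d * v for d, v in zip(df, variances)) / n_minus_k
    numerator = n_minus_k * math.log(pooled) - sum(d * math.log(v) for d, v in zip(df, variances))
    correction = 1 + (sum(1 / d for d in df) - 1 / n_minus_k) / (3 * (k - 1))
    t_stat = numerator / correction
    return t_stat, math.exp(-t_stat / 2)

sizes = [25, 20, 26]  # groups A, B, C
raw_t, raw_p = bartlett_from_summary([514.2716, 152.97, 149.76], sizes)
log_t, log_p = bartlett_from_summary([0.159, 0.124, 0.188], sizes)
print(f"raw: T = {raw_t:.2f}, P = {raw_p:.4f}")  # close to the quoted P = 0.002
print(f"log: T = {log_t:.2f}, P = {log_p:.3f}")  # close to the quoted P = 0.641
```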
Statisticians would differ on where to go next! Some would consider ANOVA sufficiently robust to cope with the untransformed data. Others would prefer a randomization test, which does not require distributions to be normal. We will do the analysis on log-transformed data on the basis that at least the variances will be homogeneous, and ANOVA should be sufficiently robust to cope with the non-normality.
Carry out analysis of variance on log-transformed data
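Sums of squares can be calculated manually from the following quantities:

- the group totals of the log-transformed values: 95.93706, 71.82635 and 88.65428 for groups A, B and C, with sample sizes of 25, 20 and 26;
- the grand total: 256.4177, from 71 dogs;
- the sum of the squared log-transformed values: 939.2925.

The calculations are as follows: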
$$SS_{groups} = \frac{95.93706^2}{25} + \frac{71.82635^2}{20} + \frac{88.65428^2}{26} - \frac{(256.4177)^2}{71} = 2.3427$$

$$SS_{total} = 939.2925 - \frac{(256.4177)^2}{71} = 13.2356$$

$$SS_{within} = 13.2356 - 2.3427 = 10.8929$$
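The same hand calculation can be scripted. This Python sketch re-does the arithmetic above from the quoted totals. It uses the grand total 256.4177 as given: the three rounded group totals sum to 256.41769, which would shift the fourth decimal place of SS_{groups}.

```python
# Group totals and sizes of the log-transformed values, as quoted above
totals = {"A": 95.93706, "B": 71.82635, "C": 88.65428}
sizes = {"A": 25, "B": 20, "C": 26}
grand_total = 256.4177        # as quoted (the rounded group totals sum to 256.41769)
sum_of_squares_y = 939.2925   # sum of all squared log-transformed values

n_total = sum(sizes.values())             # 71 dogs
correction_term = grand_total ** 2 / n_total

ss_groups = sum(totals[g] ** 2 / sizes[g] for g in totals) - correction_term
ss_total = sum_of_squares_y - correction_term
ss_within = ss_total - ss_groups

print(f"SS groups = {ss_groups:.4f}")
print(f"SS total  = {ss_total:.4f}")
print(f"SS within = {ss_within:.4f}")
```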

It is of course much quicker and easier to do the analysis of variance in R:
ANOVA table

Source of variation                       df   SS        MS       F ratio   P
Between groups ('Treatments')              2    2.3427   1.1714   7.3121    0.0013
Within groups ('Error' or 'Residual')     68   10.8929   0.1602
Total                                     70   13.2356

This tells us that the 'treatment' effect is highly significant. In other words, there are significant differences in mean (log-transformed) vitamin E concentration between some or all of the groups.
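The remaining entries of the ANOVA table follow from the sums of squares and degrees of freedom. With 2 numerator degrees of freedom, the upper tail of the F distribution has the closed form (1 + 2F/df_2)^(-df_2/2), so this Python sketch needs no statistics library. It gives F ≈ 7.31 and P ≈ 0.0013. The small difference from F = 7.3121 reflects rounding of the sums of squares.

```python
ss_groups, ss_within = 2.3427, 10.8929
df_groups, df_within = 2, 68  # k - 1 and N - k, with k = 3 groups and N = 71 dogs

ms_groups = ss_groups / df_groups
ms_within = ss_within / df_within
f_ratio = ms_groups / ms_within

# Exact F upper tail when the numerator df equals 2
p_value = (1 + 2 * f_ratio / df_within) ** (-df_within / 2)

print(f"MS groups = {ms_groups:.4f}, MS within = {ms_within:.4f}")
print(f"F = {f_ratio:.4f}, P = {p_value:.4f}")
```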
Perform diagnostics to ensure adequacy of model
 Levene's test
Now that we have run the model on log-transformed data, we will (for the sake of demonstrating the test) check the assumption of homogeneity of variances again, this time using Levene's test. You could, of course, have used this test as the initial test of homogeneity of variances. It is readily carried out in R, as the residuals are available once you have run the ANOVA.
In this case, F = 0.07175 (df = 2,68) and P = 0.9308. In other words, there is no evidence of heterogeneity of variances for the log-transformed data.
We thought it would be interesting to check the result of Levene's test on the untransformed data, since the tests we used initially (Hartley's F_{max} and Bartlett's) are not very robust to skew. On the untransformed data, Levene's test gives F = 3.682 (df = 2,68) and P = 0.03031. Hence there was significant heterogeneity prior to transformation. Some authorities suggest that one should only worry about heterogeneity if P < 0.01 in Levene's test. However, all three tests suggest heterogeneity, so we think a transformation was justified.
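Levene's test is itself a one-way ANOVA, run on the absolute deviations of each observation from its group centre. When the centre is the group mean, these are the absolute residuals of the fitted ANOVA, which is why they are readily available. The stdlib-only Python sketch below uses tiny hypothetical groups, not the dog data. Some software centres on the group median instead (the Brown-Forsythe variant). The `leveneTest` function in the R package car does so by default, so F values can differ between programs.

```python
from statistics import mean, median

def levene(groups, center=mean):
    """One-way ANOVA F on absolute deviations from each group's centre.

    center=mean gives Levene's original test; center=median gives the
    Brown-Forsythe variant. Returns (F, df_between, df_within).
    """
    z = [[abs(y - center(g)) for y in g] for g in groups]  # absolute deviations
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in z)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Tiny hypothetical groups with increasing spread (NOT the study data)
groups = [[1, 2, 3], [2, 4, 6], [0, 5, 10]]
f, df_b, df_w = levene(groups)
print(f"F = {f:.4f} (df = {df_b},{df_w})")  # F = 1.7333 (df = 2,6)

# Same result here, because each group's median equals its mean
f_bf = levene(groups, center=median)[0]
```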
 Distribution of residuals
Plotting residuals (the differences between observed and fitted values) against fitted values provides another check on heteroscedasticity. Similarly, a normal quantile plot of the residuals allows us to check the normality of errors.
{Fig. 1c}
Neither of these outcomes is 'ideal', but they are probably as good as you will get with most data sets!
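In a one-way ANOVA the fitted value for each observation is its group mean, so the residuals are deviations from the group means. The minimal Python sketch below computes the coordinates for both diagnostic plots. The values are hypothetical log-scale numbers, not the study data, and (i - 0.5)/n is only one common choice of plotting position.

```python
from statistics import NormalDist, mean

def residuals_and_fitted(groups):
    """Fitted value = group mean; residual = observed - fitted."""
    fitted, resid = [], []
    for g in groups:
        m = mean(g)
        for y in g:
            fitted.append(m)
            resid.append(y - m)
    return fitted, resid

# Hypothetical log-scale values for three groups (NOT the study data)
groups = [[3.5, 3.9, 4.1, 3.8], [3.2, 3.6, 3.9], [3.1, 3.4, 3.5, 3.7]]
fitted, resid = residuals_and_fitted(groups)

# Coordinates for a residuals-versus-fitted plot
for fv, r in zip(fitted, resid):
    print(f"fitted={fv:.3f}  residual={r:+.3f}")

# Coordinates for a normal quantile plot of the residuals
n = len(resid)
qq = [(NormalDist().inv_cdf((i - 0.5) / n), r) for i, r in enumerate(sorted(resid), start=1)]
```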
Assess effect sizes
At its simplest, this involves comparing the 'treatment' means to determine which ones are significantly different from each other. Sometimes more complex operations are involved, for example comparing the average of two means with a third. This is the topic of the More Information page on Multiple comparison of means. But an ANOVA is not complete without an examination of effect sizes. So for now we will assume no comparisons have been pre-planned, and simply carry out all pairwise comparisons using Tukey's honestly significant difference test.
Table of means

         Transformed (ln scale)     Detransformed
Group    Mean      1.96 × SE        Mean (geometric)   95% CI
A        3.8375    0.1569           46.41              39.67 to 54.29
B        3.5913    0.1754           36.28              30.44 to 43.24
C        3.4098    0.1538           30.26              25.95 to 35.29
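The detransformed figures can be reproduced from the log-scale means and the residual mean square (0.1602) in the ANOVA table. The tabulated values are consistent with the following procedure:

- on the log scale, take an interval of mean ± 1.96 × √(MS_within / n), using the pooled variance;
- exponentiate the mean and both limits.

The back-transformed mean is therefore a geometric mean. This Python sketch rests on that assumption and matches the table to within rounding (±0.01).

```python
import math

ms_within = 0.1602  # residual mean square from the ANOVA table
log_means = {"A": 3.8375, "B": 3.5913, "C": 3.4098}
sizes = {"A": 25, "B": 20, "C": 26}

results = {}
for g, m in log_means.items():
    half_width = 1.96 * math.sqrt(ms_within / sizes[g])  # 1.96 x pooled SE
    results[g] = (math.exp(m), math.exp(m - half_width), math.exp(m + half_width))
    gm, lower, upper = results[g]
    print(f"{g}: {gm:.2f} ({lower:.2f} to {upper:.2f})")
```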

Here we find that group A (affected German shepherd dogs) had significantly higher levels of vitamin E than group C (unaffected dogs of other breeds; P = 0.0009). No other differences were significant at the P = 0.05 level. The authors performed an ANOVA on the untransformed data and reached similar conclusions.
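Tukey's test refers each pairwise difference in means to the studentized range distribution. The Python sketch below computes the Tukey-Kramer q statistic (the form for unequal sample sizes) for each pair. It uses the log-scale means from the table of means and the residual mean square 0.1602 on 68 df. Only the A versus C statistic exceeds the 5% critical value for three means and about 68 df, which tables put at roughly 3.4. This is consistent with the result above. Exact P values need the studentized range distribution itself (for example `scipy.stats.studentized_range` in SciPy 1.7 or later), which this stdlib-only sketch does not attempt.

```python
import math
from itertools import combinations

ms_within, df_within = 0.1602, 68
log_means = {"A": 3.8375, "B": 3.5913, "C": 3.4098}
sizes = {"A": 25, "B": 20, "C": 26}

def tukey_q(g1, g2):
    """Tukey-Kramer studentized range statistic for one pair of group means."""
    se = math.sqrt(ms_within / 2 * (1 / sizes[g1] + 1 / sizes[g2]))
    return abs(log_means[g1] - log_means[g2]) / se

q_stats = {(a, b): tukey_q(a, b) for a, b in combinations(log_means, 2)}
for pair, q in q_stats.items():
    print(pair, round(q, 2))
```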
Conclusions
We have intentionally chosen a 'badly behaved' set of data to analyze, because such data are all too common. We can perform an ANOVA that meets the textbook requirements for analysis, but only by shutting our eyes to the two major problems with the data: different distributions between groups, and convenience sampling.
The problem of different distributions between groups could be partly addressed using Welch's unequal-variance ANOVA, which allows the group variances to differ; in a related topic above we do precisely this. Since group sizes are moderate and fairly similar, the outcome (P = 0.005) differs little from that of the parametric ANOVA.
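Welch's F can be computed from each group's mean, variance and size. This section does not give the raw-scale group means. The stdlib-only Python sketch below is therefore run on the log-scale summaries instead: means from the table of means and variances from the homogeneity check. It illustrates the method only and is not expected to reproduce the P = 0.005 quoted. As with the ordinary F test, the closed-form upper tail for 2 numerator df is used, so the sketch handles exactly three groups.

```python
def welch_anova(means, variances, sizes):
    """Welch's unequal-variance one-way ANOVA from group summary statistics.

    Returns (F, df1, df2, P). The P value uses the exact F upper tail for
    df1 = 2, so this sketch handles exactly three groups.
    """
    k = len(means)
    if k != 3:
        raise ValueError("this sketch handles exactly three groups")
    w = [n / v for n, v in zip(sizes, variances)]  # precision weights n / s^2
    w_sum = sum(w)
    weighted_mean = sum(wi * m for wi, m in zip(w, means)) / w_sum
    between = sum(wi * (m - weighted_mean) ** 2 for wi, m in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, n in zip(w, sizes))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
    return f_stat, df1, df2, p_value

# Log-scale summaries for groups A, B and C quoted in this section
f_stat, df1, df2, p_value = welch_anova([3.8375, 3.5913, 3.4098], [0.159, 0.124, 0.188], [25, 20, 26])
print(f"F = {f_stat:.2f} on ({df1}, {df2:.1f}) df, P = {p_value:.4f}")
```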
There is no satisfactory solution to the problem of convenience sampling: inference must be restricted to the group of animals 'at hand'. Even hunch-based inference would only be meaningful if one carried out a detailed assessment of the possible biases in sample selection, and of how these may have affected the result.