Our first worked example looks at an experiment to assess whether feral pigs had become bait-shy to the poison sodium monofluoroacetate (1080). It uses data from Hone & Kleba (1984); the same data are analyzed using SAS at University of Canberra (Biometrics).
A number of feral pigs were captured in an area where the poison 1080 had previously been used to control feral pigs and rabbits. 10 male and 10 female pigs were selected to be of approximately the same age and weight. Two male pigs were randomly allocated to each of five pens and two female pigs were randomly allocated to each of the remaining five pens.
On Day 1, all pigs were offered wheat only and their intake was recorded. On Day 2, they were offered one of the following, and intake was again recorded:
(a) Wheat only
(b) Wheat and water
(c) Wheat, water and dye
(d) Wheat, water and 1080
(e) Wheat, water, dye and 1080
The response variable was the change in intake in kg. If the pigs in the chosen area had become bait-shy, it was hypothesized that intake of the mixtures containing 1080 would be reduced relative to the controls.
Draw boxplots and assess normality
Plot the data to get a visual assessment of the treatment and block effects, and to assess how appropriate a (parametric) ANOVA is for this set of data.
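A minimal sketch of how such boxplots might be drawn in R. The original measurements are not reproduced here, so the data frame `pigs` below is a hypothetical name filled with simulated placeholder values (not the Hone & Kleba data); only the layout (5 baits × 2 genders × 2 replicates) and the variable names `resp`, `A` and `B` follow the text.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

# One boxplot per factor, side by side
op <- par(mfrow = c(1, 2))
boxplot(resp ~ A, data = pigs, xlab = "Bait treatment", ylab = "Change in intake (kg)")
boxplot(resp ~ B, data = pigs, xlab = "Gender", ylab = "Change in intake (kg)")
par(op)
```

Boxes of similar height suggest similar variances, and roughly symmetric boxes are consistent with normality, although with only four observations per bait level any such reading is rough.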
These plots show no evidence of gross violations of the assumptions of homogeneity of variances and normality, although we will reconsider this when we look at model diagnostics. We also note that both factors may be affecting the response variable.
Draw interaction plot
If there were no interaction between bait treatment and gender, the two lines for the different genders should follow the same trends and be roughly parallel.
There is no clear evidence of any interaction between bait treatment and gender, but the lines do cross so we should not be too surprised if the analysis reveals some indication of interaction between the factors. Since we have independent replicates of each combination, we can test for this - although with only two replicates for each combination, we have very little power for this test!
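An interaction plot of this kind can be sketched with `interaction.plot()` from base R. As before, `pigs` is a hypothetical data frame holding simulated placeholder values rather than the real measurements, so the lines it draws will not match the original figure.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

# Mean response for each bait, with one line per gender
with(pigs, interaction.plot(x.factor = A, trace.factor = B, response = resp,
                            fun = mean, type = "b",
                            xlab = "Bait treatment", trace.label = "Gender",
                            ylab = "Mean change in intake (kg)"))
```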
Calculate factor means
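One way the factor and cell means might be obtained is with `tapply()`. The sketch below again uses the hypothetical data frame `pigs` with simulated placeholder values, so the means it prints are illustrative only.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

bait_means   <- tapply(pigs$resp, pigs$A, mean)                # one mean per bait
gender_means <- tapply(pigs$resp, pigs$B, mean)                # one mean per gender
cell_means   <- tapply(pigs$resp, list(pigs$A, pigs$B), mean)  # 5 x 2 table of cell means

bait_means
gender_means
cell_means
```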
Carry out analysis of variance
Sums of squares can be calculated manually as follows:
SStotal = 7.24235 − (−8.62)² / 20 = 3.52713
SSA × B = 3.13867 − 2.76045 − 0.16056 = 0.21766
ANOVA table
| Source of variation | df | SS | MS | F-ratio | P |
|---|---|---|---|---|---|
| Bait (A) | 4 | 2.7605 | 0.6901 | 17.77 | 0.0001 |
| Gender (B) | 1 | 0.1606 | 0.1606 | 4.13 | 0.0694 |
| A × B | 4 | 0.2177 | 0.0544 | 1.401 | 0.3023 |
| Error | 10 | 0.3885 | 0.0388 | | |
| Total | 19 | 3.5271 | | | |
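The arithmetic behind the table can be checked from the sums of squares quoted in the manual calculation. The sketch below assumes that 3.13867 is the between-cells (bait × gender subgroup) sum of squares, which is how the interaction line uses it; the variable names are ours.

```r
# Sums of squares taken from the manual calculation in the text
ss_total <- 3.52713
ss_cells <- 3.13867   # ASSUMPTION: between-cells sum of squares
ss_A     <- 2.76045   # bait
ss_B     <- 0.16056   # gender

ss_AB    <- ss_cells - ss_A - ss_B   # interaction
ss_error <- ss_total - ss_cells      # within-cells (error)

df       <- c(A = 4, B = 1, AB = 4)
df_error <- 10

ms       <- c(A = ss_A, B = ss_B, AB = ss_AB) / df   # mean squares
ms_error <- ss_error / df_error
F_ratio  <- ms / ms_error
p_value  <- pf(F_ratio, df, df_error, lower.tail = FALSE)

round(cbind(df, SS = c(ss_A, ss_B, ss_AB), MS = ms, F = F_ratio, P = p_value), 4)
```

Each mean square is its sum of squares divided by its degrees of freedom, and each F-ratio is the mean square of that term divided by the error mean square.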
Considering the interaction first, the interaction between bait and gender is not significant (P = 0.30), even if we adopt a significance level of 0.25. We note that we have very little power to detect this interaction, but we also have little or no evidence that it exists. As for the main effects, bait is highly significant (P = 0.0001) whilst gender is borderline significant (P = 0.07).
In R there are two different ways to do this analysis of variance:
- Use lm() with the model defined as model=lm(resp~B*A); then use anova(model).
- Use aov() with the model defined as model=aov(resp~B*A); then use summary(model)
Below we have used the first of these methods:
The result is of course identical to the manual calculations.
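Both routes can be sketched as follows. The hypothetical data frame `pigs` holds simulated placeholder values (not the real measurements), so the numbers printed will differ from the table above; the point is only that `lm()` with `anova()` and `aov()` with `summary()` give the same ANOVA table for the same data.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

# Method 1: linear model, then sequential ANOVA table
model <- lm(resp ~ B * A, data = pigs)
anova(model)

# Method 2: aov() fits the same model and summarises it directly
model_aov <- aov(resp ~ B * A, data = pigs)
summary(model_aov)
```

Because the design is balanced, the order in which the two factors enter the formula does not change their sums of squares.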
Check diagnostics
We have chosen the easy option in R which is just to do plot(model) - this provides a range of diagnostic plots.
The residuals versus fitted plot (top left) shows residuals reasonably evenly spread about the line, aside from three points of concern identified by number (14, 18, 19). The normal quantile plot (top right) shows that the errors are approximately normal, albeit not perfectly so. The scale-location plot shows some evidence of a mean-variance relationship, in other words heteroskedasticity. Note that Logan advises us, for ANOVA, to ignore Cook's D values.
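A minimal sketch of producing the four default diagnostic plots on one page. It refits the model to the hypothetical data frame `pigs` of simulated placeholder values, so the flagged observation numbers will not match those discussed in the text.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

model <- lm(resp ~ B * A, data = pigs)

# Arrange the default diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(model)
par(op)
```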
Simplify model
This analysis indicates that we can safely drop the interaction term to provide a more parsimonious model. We then recheck the reduced model's diagnostics.
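Dropping the interaction might be done with `update()`, as in the sketch below. The data are again simulated placeholders in a hypothetical data frame `pigs`, and the names `full` and `reduced` are ours.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

full    <- lm(resp ~ B * A, data = pigs)
reduced <- update(full, . ~ . - B:A)   # main effects only

anova(reduced, full)   # F test for the dropped interaction term
anova(reduced)         # ANOVA table for the simplified model
```

The comparison `anova(reduced, full)` gives the same F test for the interaction as the full ANOVA table, and the four interaction degrees of freedom are pooled into the error term of the reduced model.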
Recheck diagnostics
These four plots indicate observation 14, and possibly 16 & 20, merit further attention. This does not imply they ought to be eliminated, but one should at least check the raw data - and think about why they may be telling you something different from their fellows.
One could then take the process further by first eliminating the gender effect (which is not significant) and then (possibly) combining the first 3 treatment levels (without poison) and the remaining 2 levels (with poison) to obtain the minimal adequate model.
But since the gender effect is so close to significance, we decide against this.
Simplify model using AIC
An alternative approach to model simplification is to use the 'step' function in R. This uses the Akaike Information Criterion as the criterion for model selection.
Using this approach, all terms remain in the model, including the interaction term. This is because the information criterion is much more liberal about retaining explanatory variables than significance testing at P = 0.05.
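A sketch of backward selection with `step()`. It runs on the hypothetical data frame `pigs` of simulated placeholder values, so the terms it retains need not match the result reported above for the real data.

```r
# Placeholder data: simulated values, NOT the Hone & Kleba (1984) measurements.
set.seed(1)
pigs <- expand.grid(rep = 1:2,
                    A = factor(c("a", "b", "c", "d", "e")),  # bait treatment
                    B = factor(c("female", "male")))         # gender
pigs$resp <- rnorm(20, mean = ifelse(pigs$A %in% c("d", "e"), -0.9, -0.1), sd = 0.2)

full <- lm(resp ~ B * A, data = pigs)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout
selected <- step(full, direction = "backward", trace = 0)
formula(selected)
```

By default `step()` penalises each estimated parameter by k = 2, and a term is dropped only if removing it lowers the AIC.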