InfluentialPoints.com
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Fully replicated factorial ANOVA

Worked example 1

Our first worked example looks at an experiment to assess whether feral pigs had become bait-shy to the poison sodium monofluoroacetate (1080). It uses data from Hone & Kleba (1984); the same data are analyzed using SAS at University of Canberra (Biometrics).

A number of feral pigs were captured in an area where the poison 1080 had previously been used to control feral pigs and rabbits. Ten male and ten female pigs were selected to be of approximately the same age and weight. Two male pigs were randomly allocated to each of five pens and two female pigs were randomly allocated to each of the remaining five pens.

On Day 1, all pigs were offered wheat only and their intake was recorded. On Day 2, they were offered one of the following: (a) Wheat only (b) Wheat and water (c) Wheat, water and dye (d) Wheat, water and 1080 (e) Wheat, water, dye and 1080, and intake was again recorded. The response variable was the change in intake in kg. If the pigs in the chosen area had become bait-shy, it was hypothesized that intake of the mixtures containing 1080 would be reduced relative to the controls.

Change in bait intake (kg) by feral pigs
Gender    Wheat     Wheat &    Wheat, water   Wheat, water   Wheat, water,
                    water      & dye          & 1080         dye & 1080
Male       0.188     0.050      0.058         -0.712         -0.610
          -0.058    -0.138     -0.082         -1.280         -0.830
Female    -0.280    -0.540     -0.260         -0.894         -0.837
          -0.062    -0.336     -0.123         -0.672         -1.202
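For readers who want to follow the analysis in R, the table can be entered as below. This is a minimal sketch: the variable names resp, bait and gend and the level labels A1 to A5 match those in the R output on this page, but the order in which the observations are entered (males first, cell by cell) is an assumption, so observation numbers in diagnostic plots may not match those quoted in the text.

```r
# Change in intake (kg), two replicates per bait x gender cell.
# Entry order (males first, then females, A1..A5 within each) is an assumption.
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,            # males
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)            # females

# A1 = wheat, A2 = wheat & water, A3 = wheat, water & dye,
# A4 = wheat, water & 1080, A5 = wheat, water, dye & 1080
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

pigs <- data.frame(resp, bait, gend)

# Check the layout: every bait x gender cell should hold 2 replicates
table(pigs$gend, pigs$bait)
```

A quick check that the data are entered correctly is that sum(resp) should equal the grand total of -8.62 used in the manual calculations.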

  1. Draw boxplots and assess normality

    Plot out the data to get a visual assessment of the effects of the two factors (bait treatment and gender), and to assess how appropriate (parametric) ANOVA is for this set of data.

    Figmc1.gif

    These plots show no evidence of any gross violation of the assumptions of homogeneity of variances and normality, although we will reconsider this when we look at model diagnostics. We also note that both factors may be affecting the response variable.
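Boxplots of this kind could be drawn along the following lines. This is a sketch only and is not guaranteed to reproduce the figure exactly; the axis labels and the side-by-side layout are our own choices, and the data entry order is an assumption.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

# One panel per factor
op <- par(mfrow = c(1, 2))
boxplot(resp ~ bait, xlab = "Bait treatment", ylab = "Change in intake (kg)")
boxplot(resp ~ gend, xlab = "Gender", ylab = "Change in intake (kg)")
par(op)

# Five-number summaries behind the bait boxplot, without drawing it
bait_stats <- boxplot(resp ~ bait, plot = FALSE)$stats
```

With only four observations per bait level, the boxes are a crude guide at best, which is one reason to return to the assumptions once the model has been fitted.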

  2. Draw interaction plot

    If there were no interaction between bait treatment and gender, the two lines for the different genders should follow the same trends and be roughly parallel.

    Figmc2.gif

    There is no clear evidence of any interaction between bait treatment and gender, but the lines do cross so we should not be too surprised if the analysis reveals some indication of interaction between the factors. Since we have independent replicates of each combination, we can test for this - although with only two replicates for each combination, we have very little power for this test!
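An interaction plot of the cell means can be produced with interaction.plot() from base R. The sketch below assumes the same variable names as the R output on this page; the plotting symbols and labels are our own choices.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

# One line per gender, bait treatment on the x-axis
interaction.plot(x.factor = bait, trace.factor = gend, response = resp,
                 fun = mean, type = "b", pch = c(1, 16),
                 xlab = "Bait treatment",
                 ylab = "Mean change in intake (kg)",
                 trace.label = "Gender")

# The cell means that the plot displays
cell_means <- tapply(resp, list(gend, bait), mean)
round(cell_means, 3)
```

In the table of cell means the male mean is above the female mean for every treatment except A4 (wheat, water & 1080), which is where the lines cross.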

  3. Calculate factor means

    Using R
    > tapply(resp,bait,mean)
          A1       A2       A3       A4       A5
    -0.05300 -0.24100 -0.10175 -0.88950 -0.86975
    > tapply(resp,gend,mean)
          F       M
    -0.5206 -0.3414

  4. Carry out analysis of variance

    Sums of squares can be calculated manually as follows:

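    $$SS_{total} = 7.24235 - \frac{(-8.62)^2}{20} = 3.52713$$

    $$SS_{bait} = \frac{(-0.212)^2}{4} + \frac{(-0.964)^2}{4} + \frac{(-0.407)^2}{4} + \frac{(-3.558)^2}{4} + \frac{(-3.479)^2}{4} - \frac{(-8.62)^2}{20} = 6.47567 - 3.71522 = 2.76045$$

    $$SS_{gender} = \frac{(-5.206)^2}{10} + \frac{(-3.414)^2}{10} - \frac{(-8.62)^2}{20} = 3.87578 - 3.71522 = 0.16056$$

    $$SS_{subgroups} = \frac{(-0.342)^2}{2} + \frac{(-0.876)^2}{2} + \dots + \frac{(-1.992)^2}{2} + \frac{(-1.440)^2}{2} - \frac{(-8.62)^2}{20} = 6.853895 - 3.71522 = 3.13867$$

    $$SS_{A \times B} = SS_{subgroups} - SS_{bait} - SS_{gender} = 3.13867 - 2.76045 - 0.16056 = 0.21766$$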
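The manual calculations can be checked in R using totals from tapply(). This is a sketch under the assumption that the data are entered as in the table (two replicates per cell); the object names CF, SS_total and so on are ours. It also computes the error sum of squares as the total minus the subgroups sum of squares, which is the value that appears in the Error row of the ANOVA table.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

N  <- length(resp)
CF <- sum(resp)^2 / N                  # correction term, (-8.62)^2 / 20

SS_total  <- sum(resp^2) - CF
SS_bait   <- sum(tapply(resp, bait, sum)^2 / 4)  - CF   # 4 obs per bait level
SS_gender <- sum(tapply(resp, gend, sum)^2 / 10) - CF   # 10 obs per gender
SS_subgrp <- sum(tapply(resp, list(gend, bait), sum)^2 / 2) - CF  # 2 per cell
SS_inter  <- SS_subgrp - SS_bait - SS_gender
SS_error  <- SS_total - SS_subgrp

# Mean squares, F-ratios and P-values for the three tested terms
df_term <- c(bait = 4, gender = 1, interaction = 4)
MS_term <- c(SS_bait, SS_gender, SS_inter) / df_term
MS_err  <- SS_error / 10
F_ratio <- MS_term / MS_err
P_value <- pf(F_ratio, df_term, 10, lower.tail = FALSE)

round(cbind(df = df_term, MS = MS_term, F = F_ratio, P = P_value), 4)
```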

    ANOVA table
    Source of variation   df   SS       MS       F-ratio   P
    Bait (A)               4   2.7605   0.6901   17.77     0.0002
    Gender (B)             1   0.1606   0.1606    4.13     0.0694
    A × B                  4   0.2177   0.0544    1.40     0.3023
    Error                 10   0.3885   0.0388
    Total                 19   3.5271

    Considering the interaction first, the interaction between bait and gender is not significant (P = 0.30), even at a liberal significance level of 0.25. We have very little power to detect this interaction, but we also have little or no evidence that it exists. As for the main effects, bait is highly significant (P < 0.001) whilst gender is borderline (P = 0.07).

     

    In R there are two different ways to do this analysis of variance:

    • Use lm() with the model defined as model=lm(resp~bait*gend); then use anova(model).
    • Use aov() with the model defined as model=aov(resp~bait*gend); then use summary(model).
    Below we have used the first of these methods:

    Using R
    Analysis of Variance Table
    Response: resp
              Df  Sum Sq Mean Sq F value    Pr(>F)
    bait       4 2.76045 0.69011 17.7658 0.0001538 ***
    gend       1 0.16056 0.16056  4.1334 0.0694460 .
    bait:gend  4 0.21766 0.05441  1.4008 0.3022677
    Residuals 10 0.38845 0.03885    

    The result is of course identical to the manual calculations.
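The two routes mentioned above can be run side by side as follows. This is a minimal sketch, assuming the data are entered as in the table; the object names model_lm and model_aov are ours. Because the design is balanced, the order in which the factors are written in the formula does not change the sums of squares.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

# Route 1: linear model, then the sequential ANOVA table
model_lm <- lm(resp ~ bait * gend)
tab_lm   <- anova(model_lm)
tab_lm

# Route 2: aov(), then summary()
model_aov <- aov(resp ~ bait * gend)
tab_aov   <- summary(model_aov)[[1]]
tab_aov
```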

  5. Check diagnostics

    We have chosen the easy option in R which is just to do plot(model) - this provides a range of diagnostic plots.

    Figmc3.gif

    The residuals versus fitted plot (top left) shows residuals reasonably evenly spread about the zero line, aside from three points of concern identified by numbers (14, 18, 19). The normal quantile plot (top right) shows the errors are approximately, albeit not quite, normal. The scale-location plot shows some evidence of a mean-variance relationship in the errors, in other words heteroskedasticity. Note that Logan advises us, for ANOVA, to ignore Cook's D values.
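To see all four default diagnostic plots on one page, the plotting region can be split before calling plot(model). The sketch below assumes the data entry order used for the table, so the observation numbers that R prints on the plots depend on that order and may differ from those quoted above.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

model <- lm(resp ~ bait * gend)

# 2 x 2 grid of the default diagnostic plots
op <- par(mfrow = c(2, 2))
plot(model)
par(op)

# With two replicates per cell, each fitted value is the cell mean, so the
# two residuals in a cell are equal and opposite, and every leverage is 0.5
res <- residuals(model)
lev <- hatvalues(model)
round(cbind(fitted = fitted(model), residual = res, leverage = lev), 3)
```

Because every observation has the same leverage in this balanced design, measures that combine residual size with leverage add little beyond the residuals themselves.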

  6. Simplify model

    Using R
    Analysis of Variance Table
    
    Model 1: resp ~ bait * gend
    Model 2: resp ~ bait + gend
      Res.Df      RSS Df Sum of Sq      F Pr(>F)
    1     10  0.38845
    2     14  0.60611 -4  -0.21766 1.4008 0.3023
    > anova(model2)
    Analysis of Variance Table
    
    Response: resp
              Df  Sum Sq Mean Sq F value    Pr(>F)
    bait       4 2.76045 0.69011 15.9403 4.132e-05 ***
    gend       1 0.16056 0.16056  3.7087   0.07468 .
    Residuals 14 0.60611 0.04329 

    This analysis indicates that we can drop the interaction term to provide a more parsimonious model. We then recheck the reduced model's diagnostics.
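The comparison of the full and reduced models can be reproduced along these lines. This is a sketch: the names model and model2 follow the R output on this page, and update() is one of several equivalent ways of removing the interaction term.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

model  <- lm(resp ~ bait * gend)
# Drop the interaction; equivalent to lm(resp ~ bait + gend)
model2 <- update(model, . ~ . - bait:gend)

# F test for the term that was removed
cmp <- anova(model, model2)
cmp

# ANOVA table for the reduced (additive) model
tab2 <- anova(model2)
tab2
```

The interaction sum of squares and its 4 degrees of freedom are pooled into the residual of the reduced model, which is why the F-ratios for bait and gender change slightly.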

  7. Recheck diagnostics

    Figmc4.gif

    These four plots indicate observation 14, and possibly 16 & 20, merit further attention. This does not imply they ought to be eliminated, but one should at least check the raw data - and think about why they may be telling you something different from their fellows.

    One could then take the process further by first eliminating the gender effect (which is not significant) and then (possibly) combining the first 3 treatment levels (without poison) and the remaining 2 levels (with poison) to obtain the minimal adequate model.

    But since the gender effect is so close to significance, we decide against this.

  8. Simplify model using AIC

    An alternative approach to model simplification is to use the 'step' function in R. This uses the Akaike Information Criterion as the criterion for model selection.

    Using R
    Start:  AIC=-58.83
    resp ~ bait * gend
    
                Df Sum of Sq     RSS     AIC
    <none>                     0.388 -58.826
    - bait:gend  4     0.218   0.606 -57.929
    > anova(model,model2)
    Analysis of Variance Table
    
    Model 1: resp ~ bait * gend
    Model 2: resp ~ bait * gend
      Res.Df     RSS Df Sum of Sq F Pr(>F)
    1     10 0.38845
    2     10 0.38845  0   0.00000  

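    Using this approach, all terms remain in the model including the interaction term: dropping bait:gend would raise the AIC from -58.83 to -57.93, so step() leaves the full model unchanged (which is why the anova() comparison above shows two identical models). This is because the information criterion is more liberal about retaining explanatory variables than the F test.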
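The AIC values reported by step() can be reproduced by hand. The sketch below assumes the data entry order used for the table; the name selected and the helper aic_by_hand() are ours. For a linear model, step() relies on extractAIC(), which computes n*log(RSS/n) + 2*edf, where edf is the number of estimated coefficients.

```r
# Data re-entered so that this block runs on its own (entry order assumed)
resp <- c( 0.188, -0.058,  0.050, -0.138,  0.058, -0.082,
          -0.712, -1.280, -0.610, -0.830,
          -0.280, -0.062, -0.540, -0.336, -0.260, -0.123,
          -0.894, -0.672, -0.837, -1.202)
bait <- gl(5, 2, length = 20, labels = paste0("A", 1:5))
gend <- factor(rep(c("M", "F"), each = 10))

model  <- lm(resp ~ bait * gend)   # full model
model2 <- lm(resp ~ bait + gend)   # additive model, for comparison

# Backward selection by AIC, starting from the full model
selected <- step(model)
formula(selected)

# The same AIC values computed directly (hypothetical helper)
n <- length(resp)
aic_by_hand <- function(fit) {
  edf <- n - df.residual(fit)            # number of fitted coefficients
  n * log(deviance(fit) / n) + 2 * edf
}
aic_full    <- aic_by_hand(model)
aic_reduced <- aic_by_hand(model2)
c(full = aic_full, additive = aic_reduced)
```

Note that these values are not on the same scale as those returned by AIC(), which includes additional constants; only differences between models fitted to the same data are meaningful.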