InfluentialPoints.com
Biology, images, analysis, design...
Use/Abuse Principles How To Related
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

ANOVA for blocked designs

Worked example 1

Our first worked example uses interpolated data from Richardson et al. (2008) on the effect of different oyster aquaculture techniques on microbial aerobic activity at Willapa Bay. The author described the design as a randomized complete block with three locations (= blocks) and six habitat types (fixed factor) within each block. However, since it is an observational study, treatments (habitat types) were clearly not randomly allocated, and it would be better described as an unreplicated two factor design. It can be analysed using a randomized block ANOVA but some aspects are problematical.

Aerobic sole-source carbon use (SCU)
HabitatSite (=block)
Nahcotta
(Nah)
Long Island
(Lng)
Nemah
(Nem)
Eelgrass (eel)
Bare (bar)
Longlines (lon)
Picked (pik)
Dredged (drd)
Hummocks (hum)
34.3
35.3
31.4
23.1
24.2
35.8
35.3
31.4
29.6
23.6
29.9
31.2
32.7
30.6
35.1
26.5
28.8
27.3

  1. Draw boxplots and assess normality

    Plot out data to get a visual assessment of the treatment and block effects, and assess how appropriate (parametric) ANOVA is for the set of data.

    Figma1.gif

    First impressions are that the block effect seems fairly small but there may well be a difference between habitats. There is nothing too horrendous in terms of distributions - at least nothing that a transformation would improve - so we proceed to an interaction plot.

  2. Check for location × habitat type interaction

    We start with a simple location × habitat type interaction plot. If there were no interaction between locations and habitat types, microbial activity (SCU) should follow parallel trends across habitat types for each of the different locations.

    Figma2.gif

    Microbial activity seems to follow similar trends across habitats in Nemah and Long Island, but not in Nahcotta. Hummocks have the highest SCU readings in Nahcotta, but one of the lowest in Nemah. We cannot make a general test on whether this interaction is significant because of the lack of replication. However, we can test for whether there is significant interaction of the multiplicative form using the Tukey non-additivity test.

    There appear to be two versions of this test available in R - the version in the package alr3 which is tukey.nonadd.test() and a version provided by Guido Wyseure. Unfortunately they give quite different results! We have used the latter version since it gives the same results as those given by Underwood (1997)

    Using R
     Tukey's one df F test for Additivity
    data:  hab and loc on scu
    F = 2.0946, num df = 1, denom df = 9, p-value = 0.1817
    sample estimates:
    D estimate
      1.223416

    In this case P = 0.182. Using the conventional P < 0.05 level, we would say there is no evidence for a multiplicative interaction. However, using more liberal significance level of P = 0.25 (as widely recommended), there are clearly grounds for suspecting interaction. For now we will continue with the analysis, but will return to the point later.

  3. Get table of means

    Using R
     tapply(scu,hab,mean)
         bar      drd      eel      hum      lon      pik
    32.43333 27.63333 34.10000 31.43333 32.03333 24.40000
    > tapply(scu,loc,mean)
         Lng      Nah      Nem
    30.16667 30.68333 30.16667 

  4. Carry out analysis of variance

    Sums of squares can be calculated manually as follows:

    SStotal  =  16856.69  −  (546.1)2  =  288.62
    18

    SSlocations  =  1812  +   1812  +   184.12  −  546.12
    66618
       =  16569.14 - 16568.07 = 1.07

    SShabitats  =  97.32  +  82.92  +  102.32  +  94.32  +  96.12  +  73.22  −  (546.1)2  
    33333318
       =  16763.64 - 16568.07   = 195.57

    SSresidual  =  288.62 - 195.57 - 1.07  =   91.98

    These values are inserted into the ANOVA table and the mean squares calculated.

    ANOVA table
    Source of
    variation
    df SS MS F-
    ratio
    P
    Locations 2 1.07 0.534 (0.058) (0.944)
    Habitats  5  195.57 39.12 4.25 0.025
    Residual 10 91.98 9.198    
    Total 17 288.62      

    The habitat factor came out significant at P = 0.025.
    Assuming blocks are fixed, SEhabitat mean = √(9.198/3) = 1.751
    Assuming blocks are random, SEhabitat mean = √[(9.198+0.534)/3)] = 1.801 (Note taking this into account makes little difference in this case because the block effect is very slight)
    SEdifference between habitat means = √ [(2 × 9.198)/3)] = 2.476
    Testing between blocks is inappropriate if blocks is a random factor.

     

    In R there are several different ways to do this analysis.

    • Use lm() with the model defined as model=lm(resp~B+A).
    • Use aov() with the model defined as model=aov(resp~B+A)
    • Use aov() with the model defined as model=aov(resp~A, error=B).

    Using R
    Analysis of Variance Table
    
    Response: scu
              Df  Sum Sq Mean Sq F value  Pr(>F)
    loc        2   1.068   0.534  0.0580 0.94392
    hab        5 195.576  39.115  4.2526 0.02471 *
    Residuals 10  91.979   9.198
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

  5. Check model diagnostics

    Since there is only one error term in this analysis, we can use the standard residual plots produced by R to examine diagnostics.

    Figma3.gif

    The residuals versus fitted plot shows no clear trend, and the residuals are reasonably normally distributed. If we had random allocation of treatments within blocks, we could proceed to comparing mean. But in this case we are dealing with an observational study with no random allocation of treatments. Hence it would make sense to check the homogeneity of covariances assumption by estimating sphericity and (just for good measure) the variance-covariance matrix.

    Using R
     [1] 0.1897988
    > covrcd
              [,1]      [,2]      [,3]
    [1,] 10.894667  6.444667  7.339333
    [2,]  6.444667 14.490667 16.133333
    [3,]  7.339333 16.133333 32.125667 

    Unfortunately epsilon comes out at only 0.19 indicating a major deviation from sphericity. This casts serious doubt on the validity of the analysis. We attempted to fit the linear mixed effect model (using lme()) but the model with unstructured covariance structure would not converge, and none of the other covariance structures were significantly better than the compound symmetry model which gave the same P- value for the treatment factor of 0.0247.

    Hence we proceed to test differences between habitats. In this case we will use Tukey's HSD test. One way to do it is to obtain a list of all differences between means and compare these differences with the honestly significant difference:

    Using R
                        habdrd    habeel    habhum    hablon   habpik
            0.000000  4.800000 -1.666667  1.000000  0.400000 8.033333
    habdrd -4.800000  0.000000 -6.466667 -3.800000 -4.400000 3.233333
    habeel  1.666667  6.466667  0.000000  2.666667  2.066667 9.700000
    habhum -1.000000  3.800000 -2.666667  0.000000 -0.600000 7.033333
    hablon -0.400000  4.400000 -2.066667  0.600000  0.000000 7.633333
    habpik -8.033333 -3.233333 -9.700000 -7.033333 -7.633333 0.000000
    > detach(ecorcd1)
    > qtukey(0.95,6,10)*2.476/sqrt(2)
    [1] 8.59994

    Only differences greater than 8.6 are significant at P = 0.05. Hence the only significant difference in microbial activity is between the eelgrass habitat and the picked habitat.

    We get the same result if we use the glht() function in the multcomp package:

    Using R
    Simultaneous Tests for General Linear Hypotheses
    
    Multiple Comparisons of Means: Tukey Contrasts
    
    Fit: aov(formula = scu ~ hab + loc)
    
    Linear Hypotheses:
                   Estimate Std. Error t value Pr(>|t|)
    drd - bar == 0   -4.800      2.476  -1.938   0.4341
    eel - bar == 0    1.667      2.476   0.673   0.9812
    hum - bar == 0   -1.000      2.476  -0.404   0.9982
    lon - bar == 0   -0.400      2.476  -0.162   1.0000
    pik - bar == 0   -8.033      2.476  -3.244   0.0708 .
    eel - drd == 0    6.467      2.476   2.611   0.1803
    hum - drd == 0    3.800      2.476   1.535   0.6529
    lon - drd == 0    4.400      2.476   1.777   0.5183
    pik - drd == 0   -3.233      2.476  -1.306   0.7764
    hum - eel == 0   -2.667      2.476  -1.077   0.8799
    lon - eel == 0   -2.067      2.476  -0.835   0.9538
    pik - eel == 0   -9.700      2.476  -3.917   0.0254 *
    lon - hum == 0    0.600      2.476   0.242   0.9998
    pik - hum == 0   -7.033      2.476  -2.840   0.1296
    pik - lon == 0   -7.633      2.476  -3.083   0.0904 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Adjusted p values reported -- single-step method)

     

    So what can we conclude??

    1. The evidence for any consistent difference in microbial activity between habitat types is weak.
    2. Eelgrass sites appear to have a higher microbial aerobic activity than picked sites (P = 0.0254). There is also a suggestion that longlines and bare sites have a higher microbial aerobic activity than picked sites (P < 0.10).
    3. But there are serious threats to validity of these conclusions, namely the possibility of a habitat × location interaction (meaning that habitat differences are not consistent across locations) and the unreliability of the P-values (given that homogeneity of covariance assumptions are not met).