 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)  ### Principles

We define a factorial design as one with fully replicated measures on two or more crossed factors. Note that there are several other definitions of a factorial design in the literature. Sokal & Rohlf (1995), for example, only use the term when there are more than two treatment factors. You can also find the (very odd) term 'one-way factorial ANOVA' used quite widely in the literature, presumably to mean one-factor ANOVA. We only use the term factorial design to describe a design with replicated measures on two or more crossed factors.

In a factorial design multiple independent effects are tested simultaneously. Each level of one factor is tested in combination with each level of the other(s), so the design is orthogonal. The analysis of variance investigates both the independent and the combined effect of each factor on the response variable. The combined effect is assessed by testing whether there is a significant interaction between the factors. If there is no interaction, the effect on the response variable of given levels of two factors acting together equals the sum of the responses to the same levels of the two factors in isolation. If interaction is present, the effect of the two factors together may be greater or less than the sum of the two in isolation; interaction plots are a useful way to visualize this.
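The additivity idea can be made concrete with a small sketch. The cell means below are invented purely for illustration; the code checks each cell against the value predicted from the grand mean plus the two marginal (main-effect) deviations, and any departure is the interaction effect for that cell.

```python
# Hypothetical cell means for a 2 x 2 factorial (values invented for illustration).
# Keys are (level of factor A, level of factor B).
cell_means = {("A1", "B1"): 10.0, ("A1", "B2"): 14.0,
              ("A2", "B1"): 13.0, ("A2", "B2"): 17.0}

# Marginal means, averaging over the levels of the other factor.
a_means = {a: (cell_means[(a, "B1")] + cell_means[(a, "B2")]) / 2 for a in ("A1", "A2")}
b_means = {b: (cell_means[("A1", b)] + cell_means[("A2", b)]) / 2 for b in ("B1", "B2")}
grand = sum(cell_means.values()) / 4

# Under pure additivity each cell mean equals:
#   grand mean + (A-level deviation) + (B-level deviation).
# The residual from that prediction is the interaction effect for the cell.
interaction = {ab: y - (a_means[ab[0]] + b_means[ab[1]] - grand)
               for ab, y in cell_means.items()}
print(interaction)  # all zeros: these particular cell means are exactly additive
```

Plotting the cell means against the levels of one factor, with one line per level of the other, gives the familiar interaction plot: additive cell means produce parallel lines.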

In the analysis one first calculates sums of squares for the main effects. These are the effects of each independent variable on the response variable, ignoring the effects of all other independent variables. Only then does one consider the interactions, which account for the variability left over after the main effects have been fitted. Since the sums of squares account for components in sequence, they are known as sequential or Type I SS. This is important because (as we shall see below) there are other ways to calculate sums of squares that we will encounter when we look at unbalanced designs.

When it comes to interpreting those terms the process works in the opposite direction. One first examines the interaction terms. If an interaction is significant, one compares levels of one factor at each level of the other factor; these are (rather confusingly) known as the simple main effects. Generally the main effect (the average effect of a factor irrespective of the other factor) is only of interest if the interaction is not significant. However, if there is only slight interaction one may still be interested in the average level.
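A minimal sketch, with invented cell means, of why one examines simple main effects rather than the main effect when interaction is present: here the effect of factor A reverses direction across the levels of factor B, so the average effect is zero even though A matters at every level of B.

```python
# Illustrative cell means showing a crossover interaction (values invented).
cell_means = {("A1", "B1"): 10.0, ("A1", "B2"): 18.0,
              ("A2", "B1"): 16.0, ("A2", "B2"): 12.0}

# Simple main effects of A: the A2 - A1 difference computed separately
# at each level of B.
simple_A_at_B = {b: cell_means[("A2", b)] - cell_means[("A1", b)] for b in ("B1", "B2")}

# Main effect of A: the same difference averaged over the levels of B.
main_A = sum(simple_A_at_B.values()) / 2

print(simple_A_at_B)  # {'B1': 6.0, 'B2': -6.0}
print(main_A)         # 0.0 -- the average hides two opposite simple effects
```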

One should always bear in mind that any factorial ANOVA has less power to detect an interaction than to detect a main effect. For a 2 × 2 factorial, for example, one would need four times as many experimental units to have the same power to demonstrate the interaction as a main effect. This is important if the main interest is in detecting an interaction (see our example on detecting the interaction between temperature and carbon dioxide on plant growth). However, if there really is very little evidence for an interaction (say P > 0.25), the interaction term can be dropped from the model in the process of model simplification.
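The factor of four can be seen by comparing the variances of the two contrasts involved; a sketch assuming a 2 × 2 design with n replicates per cell and error variance σ²:

```latex
% Main-effect contrast for A: each marginal mean averages 2n observations, so
\mathrm{Var}\!\left(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot}\right)
  = \frac{\sigma^2}{2n} + \frac{\sigma^2}{2n} = \frac{\sigma^2}{n}
% Interaction contrast: a difference of differences of single cell means
% (n observations each), so
\mathrm{Var}\!\left(\bar{Y}_{11} - \bar{Y}_{12} - \bar{Y}_{21} + \bar{Y}_{22}\right)
  = 4\,\frac{\sigma^2}{n}
% The interaction contrast has four times the variance of the main-effect
% contrast, so matching its power requires four times as many units.
```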

Factorial treatment structures can be used either in a completely randomized design or as part of a variety of other designs. We provide here the mathematical model and computational details for the designs we covered in the core text (the completely randomized and randomized complete block designs). We also consider the nested cross-factored design and (in a related topic) the thorny issue of non-orthogonal (unbalanced) factorial designs.

#### Fixed effects model

We first consider an experiment where we have two (or more) fixed factors. Treatment combinations are assigned at random to experimental units.

Factorial design - completely randomized (treatment combinations allocated at random to twelve units):
A1B1 A2B2 A1B2 A1B2 A2B1 A2B1 A2B2 A1B2 A1B1 A1B1 A2B1 A2B2

The model for analysis of variance of this design is given below.

Factors A & B both fixed
 Yijk  =  μ + αi + βj + (αβ)ij + εijk
where:
• Yijk is the kth observation at the ith level of factor A and the jth level of factor B,
• μ is the population (grand) mean,
• αi is the fixed effect for the ith level of factor A,
• βj is the fixed effect for the jth level of factor B,
• (αβ)ij is the interaction effect between factors A and B,
• εijk is the random error effect.

| Source of variation | df | Expected MS | Variance ratio |
|---|---|---|---|
| 1. Factor A | a-1 | σ² + nbΣα²/(a-1) | MS1/MS4 |
| 2. Factor B | b-1 | σ² + naΣβ²/(b-1) | MS2/MS4 |
| 3. A × B | (a-1)(b-1) | σ² + nΣ(αβ)²/((a-1)(b-1)) | MS3/MS4 |
| 4. Error | N-ab | σ² | |
| Total variation | N-1 | | |
where:
• a and b are the number of levels of factor A and B respectively,
• n is the number of replicates for each combination of factor A and factor B,
• N = Total number of observations,
• σ2 is the error variance,
• nbΣα2/(a-1) is the added treatment component for factor A,
• naΣβ2/(b-1) is the added treatment component for factor B,
• nΣ (αβ)2/((a-1)(b-1)) is the added interaction component.

The F-ratio for factor A is obtained by dividing MSA by MSError. The P-value for this F-ratio is obtained for a − 1 and ab(n − 1) degrees of freedom.

The F-ratio for factor B is obtained by dividing MSB by MSError. The P-value for this F-ratio is obtained for b − 1 and ab(n − 1) degrees of freedom.

The F-ratio for the A × B interaction is obtained by dividing MSA×B by MSError. The P-value for this F-ratio is obtained for (a − 1)(b − 1) and ab(n − 1) degrees of freedom.

It is not uncommon to find (usually after a certain amount of introspection) that one has only one independent observation of each combination rather than the anticipated 'n' replicated observations (see the nested cross-factored design below). One can analyze such designs if one assumes one or more of the interaction terms is zero, and then uses that interaction term as the error. This is equivalent to the randomized complete block design, which is a two-way factorial ANOVA with only one replicate per cell.

#### Computational formulae

We will take a balanced experiment with 'a' levels of Factor A, 'b' levels of factor B, with 'n' replicates of each factor combination.

Factor A totals are denoted as TA1 to TAa, Factor B totals as TB1 to TBb, subtotals (AB combinations) as T(AB)1 to T(AB)ab and the grand total as G.

The factor A, factor B, A × B interaction, error and total sums of squares are calculated as follows:

Algebraically speaking -
SSTotal = Σ(Yijk²) − G²/N
where:
• SSTotal is the total sums of squares (or Σ(Yijk − Ȳ)², where Ȳ is the overall mean),
• Yijk is the value of the kth observation at level i of Factor A and at level j of Factor B,
• G is the overall total (or ΣYijk) and N is the total number of observations (or abn).
SSA = Σ(TAi²)/nb − G²/N
where:
• SSA is the Factor A sums of squares (or nbΣ(Ȳi − Ȳ)²),
• TAi is the sum of the observations at level i of factor A,
• b is the number of levels of factor B.
SSB = Σ(TBj²)/na − G²/N
where:
• SSB is the Factor B sums of squares (or naΣ(Ȳj − Ȳ)²),
• TBj is the sum of the observations at level j of factor B.
SSSubgroups = Σ(T(AB)ij²)/n − G²/N
where:
• SSSubgroups is the treatment combination sums of squares,
• T(AB)ij is the sum of the observations at level i of factor A and level j of factor B.
SSA×B = SSSubgroups − SSA − SSB
SSError = SSTotal − SSA − SSB − SSA×B
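The computational formulae above can be sketched directly in code. The data below are invented purely to exercise the formulae, for a balanced experiment with a = 2, b = 2 and n = 3 replicates per combination; the script builds the totals, the sums of squares, and the F-ratios of the fixed-effects table.

```python
from collections import defaultdict

# Balanced two-way layout: data[(level of A, level of B)] = n replicate
# observations. The numbers are invented for illustration only.
data = {("A1", "B1"): [8.0, 10.0, 9.0],   ("A1", "B2"): [12.0, 14.0, 13.0],
        ("A2", "B1"): [11.0, 13.0, 12.0], ("A2", "B2"): [15.0, 17.0, 16.0]}

a = len({k[0] for k in data}); b = len({k[1] for k in data})
n = len(next(iter(data.values()))); N = a * b * n

G = sum(sum(v) for v in data.values())   # grand total
CF = G ** 2 / N                          # correction factor, G^2/N

ss_total = sum(y ** 2 for v in data.values() for y in v) - CF

# Factor totals TAi and TBj.
TA = defaultdict(float); TB = defaultdict(float)
for (ai, bj), v in data.items():
    TA[ai] += sum(v); TB[bj] += sum(v)

ss_A = sum(t ** 2 for t in TA.values()) / (n * b) - CF
ss_B = sum(t ** 2 for t in TB.values()) / (n * a) - CF
ss_sub = sum(sum(v) ** 2 / n for v in data.values()) - CF   # subgroup SS
ss_AB = ss_sub - ss_A - ss_B
ss_error = ss_total - ss_sub

# Mean squares and F-ratios, all tested over the error mean square.
ms_A, ms_B = ss_A / (a - 1), ss_B / (b - 1)
ms_AB = ss_AB / ((a - 1) * (b - 1))
ms_error = ss_error / (N - a * b)
print(ss_A, ss_B, ss_AB, ss_error)                     # 27.0 48.0 0.0 8.0
print(ms_A / ms_error, ms_B / ms_error, ms_AB / ms_error)
```

Note that SSA + SSB + SSA×B + SSError reproduces SSTotal exactly, which is a useful arithmetic check when computing by hand.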

#### Mixed effects model

In field work a more common application of factorial ANOVA is the mixed effects model. This is sometimes used to analyze data from a generalized randomized block design, for example when completely randomized one-factor experiments are repeated in multiple locations (= blocks). This approach is recommended by Underwood (1997) and others, but McKone & Lively (1993) and Shen (1995) argue that such data should instead be analyzed as treatment nested within block. The latter approach correctly deals with the separate randomization within each location, but unfortunately means that one cannot generalize about treatment over all blocks - which of course is usually what one wishes to do!

There appears to be no simple answer to this matter. We would tend to support the treatment nested within block approach, but have to admit that most recent texts seem quite happy to treat it as a fully replicated factorial design. (Obviously if one has only one replicate of each treatment in each block the issue does not arise - you have to analyze it as a randomized complete block with all the necessary assumptions.)

Factor A fixed, Factor B random

 Yijk  =  μ + αi + Bj + (αB)ij + εijk
where:
• Yijk is the kth observation for factor Ai and factor Bj,
• μ is the population (grand) mean,
• αi is the fixed effect for the ith level of factor A,
• Bj is the random effect for the jth level of factor B,
• (αB)ij is the interaction effect between factors A and B,
• εijk is the random error effect.

| Source of variation | df | Expected MS | Variance ratio |
|---|---|---|---|
| 1. Factor A | a-1 | σ² + nσ²αB + nbΣα²/(a-1) | MS1/MS3 |
| 2. Factor B | b-1 | σ² + naσ²B | MS2/MS4 |
| 3. A × B | (a-1)(b-1) | σ² + nσ²αB | MS3/MS4 |
| 4. Error | N-ab | σ² | |
| Total variation | N-1 | | |
where:
• a and b are the number of levels of factor A and B respectively,
• n is the number of replicates for each combination of factor A and factor B,
• N = Total number of observations,
• σ2 is the error variance,
• nbΣα2/(a-1) is the added treatment component for factor A,
• naσ2B is the added variance component for factor B,
• nσ²αB is the added interaction variance component.

The F-ratio for (fixed) factor A is obtained by dividing MSA by MSA×B - this denominator is the main difference between the fixed and mixed effects models. The P-value for this F-ratio is obtained for a − 1 and (a − 1)(b − 1) degrees of freedom.

The F-ratio for (random) factor B is obtained by dividing MSB by MSError. The P-value for this F-ratio is obtained for b − 1 and ab(n − 1) degrees of freedom.

The F-ratio for the A × B interaction is obtained by dividing MSA×B by MSError. The P-value for this F-ratio is obtained for (a− 1)(b− 1) and ab(n − 1) degrees of freedom.

If there is little evidence for interaction (in other words if P > 0.25), then it is permissible to pool the interaction mean square with the error mean square to enable more powerful main-effects tests of both A and B, with the mean square error as the denominator. But if there is any indication of interaction, the F-ratio test for factor A over the A × B interaction is a quasi F-ratio and provides only an approximate test of the main effect.
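The practical consequence of the denominator choice is easy to see with a few invented mean squares: testing the fixed factor over the interaction mean square (mixed model) rather than the error mean square (fixed model) typically gives a much smaller F-ratio with fewer denominator degrees of freedom.

```python
# Hypothetical mean squares from a balanced two-way ANOVA (values invented).
ms_A, ms_B, ms_AB, ms_error = 40.0, 25.0, 5.0, 2.0

# Fixed effects model: every term is tested over the error mean square.
F_A_fixed = ms_A / ms_error          # 20.0

# Mixed model (A fixed, B random): the fixed factor is tested over the
# A x B interaction mean square; B and A x B are still tested over error.
F_A_mixed = ms_A / ms_AB             # 8.0
F_B_mixed = ms_B / ms_error          # 12.5
F_AB_mixed = ms_AB / ms_error        # 2.5

print(F_A_fixed, F_A_mixed)  # the mixed-model test of A is far more conservative
```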

#### Factorial in randomized blocks

Treatment combinations can also be assigned using a randomized block experimental design. Since blocks are assumed to be a random factor, this makes the overall model a mixed effects model.

Factorial design in randomized blocks:
Block I: A2B2 A1B2 A2B1 A1B1
Block II: A1B2 A2B1 A1B1 A2B2
Block III: A1B1 A2B2 A1B2 A2B1

Two models are in common use which differ only in their assumptions about the existence or otherwise of interaction effects:

#### 'Model I'

 Yijk  =  μ + αi + βj + Sk + (αβ)ij + (αS)ik + (βS)jk + [(αβS)ijk] + εijk
where:
• Yijk is the observation in the kth block at the ith level of factor A and the jth level of factor B,
• μ is the population (grand) mean, and Sk is the random effect for the kth block,
• αi and βj are the fixed effects for the ith level of factor A and the jth level of factor B respectively,
• (αβ)ij, (αS)ik and (βS)jk are the interaction effects between factors A and B, A and S, and B and S respectively,
• [(αβS)ijk] is the confounded three-way interaction effect, which is assumed to be zero,
• εijk is the random error effect.

| Source of variation | df | Expected MS | Variance ratio |
|---|---|---|---|
| 1. Blocks (S) | s-1 | σ² + abσ²S | |
| 2. Factor A | a-1 | σ² + bσ²αS + sbΣα²/(a-1) | MS2/MS5 |
| 3. Factor B | b-1 | σ² + aσ²βS + saΣβ²/(b-1) | MS3/MS6 |
| 4. A × B | (a-1)(b-1) | σ² + sΣ(αβ)²/((a-1)(b-1)) | MS4/MS7 |
| 5. S × A | (s-1)(a-1) | σ² + bσ²αS | |
| 6. S × B | (s-1)(b-1) | σ² + aσ²βS | |
| 7. Error | (s-1)(a-1)(b-1) | σ² | |
| Total variation | N-1 | | |
where:
• a, b and s are the number of levels of factor A, factor B and blocks (S) respectively,
• N = total number of observations (abs),
• σ² is the error variance,
• σ²αS and σ²βS are the block × treatment interaction variance components,
• sbΣα²/(a-1) is the added treatment component for factor A,
• saΣβ²/(b-1) is the added treatment component for factor B,
• sΣ(αβ)²/((a-1)(b-1)) is the added interaction component.

The F-ratio for (fixed) factor A is obtained by dividing MSA by MSS × A. The F-ratio for (fixed) factor B is obtained by dividing MSB by MSS × B. The F-ratio for the A × B interaction is obtained by dividing MSA×B by MSError. Note we do not have to assume there is no blocks × treatment interaction, but we do have to assume that there is no three-way interaction. If replication (number of blocks) is inadequate, the model will have low power to identify treatment effects.

#### 'Model II'

 Yijk  =  μ + αi + βj + Sk + (αβ)ij + [(αS)ik] + [(βS)jk] + [(αβS)ijk] + εijk
where:
• Yijk is the observation at the ith level of factor A and the jth level of factor B in the kth block,
• μ is the population (grand) mean,
• Sk is the random effect for the kth block,
• αi is the fixed effect for the ith level of factor A,
• βj is the fixed effect for the jth level of factor B,
• (αβ)ij is the interaction effect between factors A and B,
• [(αS)ik] and [(βS)jk] are the block × treatment interaction effects, which are assumed to be zero,
• [(αβS)ijk] is the confounded three-way interaction effect, which is assumed to be zero,
• εijk is the random error effect.

| Source of variation | df | Expected MS | Variance ratio |
|---|---|---|---|
| 1. Blocks (S) | s-1 | σ² + abσ²S | |
| 2. Factor A | a-1 | σ² + sbΣα²/(a-1) | MS2/MS5 |
| 3. Factor B | b-1 | σ² + saΣβ²/(b-1) | MS3/MS5 |
| 4. A × B | (a-1)(b-1) | σ² + sΣ(αβ)²/((a-1)(b-1)) | MS4/MS5 |
| 5. Error | (s-1)(ab-1) | σ² | |
| Total variation | N-1 | | |
where:
• a, b and s are the number of levels of factor A, factor B and blocks (S) respectively,
• N = total number of observations (abs),
• σ² is the error variance,
• sbΣα²/(a-1) is the added treatment component for factor A,
• saΣβ²/(b-1) is the added treatment component for factor B,
• sΣ(αβ)²/((a-1)(b-1)) is the added interaction component.

The F-ratio for (fixed) factor A is obtained by dividing MSA by MSError. The F-ratio for (fixed) factor B is obtained by dividing MSB by MSError. The F-ratio for the A × B interaction is obtained by dividing MSA×B by MSError. Note we now have to assume there are no blocks × treatment interactions, even though we can estimate those interactions. You will have more power to identify treatment effects, but that power is bought at the cost of adopting the Nelson approach!

#### Nested cross-factored design

In this design a number (n) of evaluation units are nested within each of the sampling or experimental units (factor C).

Nested cross-factored design: twelve experimental units (C1-C12), each containing three evaluation units (+++).

In the diagram we have 12 experimental units (C1 - C12) which are allocated at random to the treatment combinations A1B1, A1B2, A2B1 and A2B2. Observations are then made on each of three evaluation units within each experimental unit.

Factors A & B fixed, factor C random

 Yijkl  =  μ + αi + βj + (αβ)ij + Ck(ij) + εijkl
where:
• Yijkl is the lth observation on the kth experimental unit at the ith level of factor A and the jth level of factor B, and μ is the population (grand) mean,
• αi and βj are the fixed effects for the ith and jth levels of factor A and factor B respectively, and (αβ)ij is the interaction effect between factors A and B,
• Ck(ij) is the random effect for the kth experimental unit within the ijth treatment combination; the notation indicates that the effect of level Ck is nested within the treatment combination,
• εijkl is the random error effect.

| Source of variation | df | Expected MS | Variance ratio |
|---|---|---|---|
| 1. Factor A | a-1 | σ² + nσ²C(αβ) + nbcΣα²/(a-1) | MS1/MS4 |
| 2. Factor B | b-1 | σ² + nσ²C(αβ) + nacΣβ²/(b-1) | MS2/MS4 |
| 3. A × B | (a-1)(b-1) | σ² + nσ²C(αβ) + ncΣ(αβ)²/((a-1)(b-1)) | MS3/MS4 |
| 4. C(A × B) | ab(c-1) | σ² + nσ²C(αβ) | MS4/MS5 |
| 5. Error | N-abc | σ² | |
| Total variation | N-1 | | |
where:
• a and b are the number of levels of factor A and B respectively,
• c is the number of replicates (experimental units) for each combination of factor A and factor B,
• n is the number of evaluation units for each level of C, and N is the total number of observations (abcn),
• σ² is the error variance,
• σ²C(αβ) is the among-replicates (within treatment combinations) variance component,
• nbcΣα²/(a-1) is the added treatment component for factor A,
• nacΣβ²/(b-1) is the added treatment component for factor B,
• ncΣ(αβ)²/((a-1)(b-1)) is the added interaction component.

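A small sketch, with invented mean squares, of how the choice of denominator plays out in this design. The correct test divides by the C(A × B) mean square with its few degrees of freedom; the pseudoreplicated test divides by the among-evaluation-unit error mean square, inflating both the F-ratio and the degrees of freedom.

```python
# Hypothetical mean squares for the nested cross-factored design with
# a = b = 2, c = 3 true replicates and n = 4 evaluation units per replicate
# (all values invented for illustration).
a, b, c, n = 2, 2, 3, 4
N = a * b * c * n
ms_A, ms_C, ms_error = 60.0, 12.0, 3.0   # ms_C is the C(A x B) mean square

# Correct test: factor A over C(A x B), with ab(c - 1) denominator df.
F_A_correct = ms_A / ms_C        # 5.0, on a*b*(c-1) = 8 df
df_correct = a * b * (c - 1)

# Pseudoreplicated test: factor A wrongly tested over the error mean
# square, with N - abc denominator df.
F_A_wrong = ms_A / ms_error      # 20.0, on N - a*b*c = 36 df
df_wrong = N - a * b * c

print(F_A_correct, df_correct, F_A_wrong, df_wrong)
```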
The F-ratios for factors A and B and the A × B interaction are obtained by dividing their respective mean squares by MSC(A×B). This design is a common cause of massive pseudoreplication when the evaluation units are taken as the experimental (or sampling) units - in other words, when the main effects and interaction are wrongly tested over the error mean square rather than the C(A × B) mean square. Unfortunately some authorities argue in favour of pooling the error mean square and the C(A × B) mean square if the P-value for C(A × B) is greater than some prespecified value, say 0.25. The effect of this recommendation is to make respectable those experiments where there is little or no genuine (independent) replication - simply because one has very low power for the test of C(A × B) if the number of true replicates (c) is very small. Hence we follow Hurlbert's point of view that this is simply pseudoreplication in another guise.

### Assumptions

The same assumptions as for a one-factor ANOVA must also hold for factorial ANOVA, namely:

1. Random sampling (equal probability)
2. Independence of errors
Note especially that it is assumed that independent replicated measurements have been made on each combination of levels of the crossed factors.
3. Homogeneity of variances
4. Normal distribution of errors