InfluentialPoints.com
Covariance Analysis (ANCOVA)


Principles

Analysis of covariance (ANCOVA) combines the techniques of analysis of variance and regression by incorporating both nominal variables (factors) and continuous measurement variables (covariates) into a single model. We will only look in detail at its use in the completely randomized design, but it can be used with all the designs that we have covered including the randomized block, Latin square, factorial and repeated measures designs.

The primary use of ANCOVA is to increase precision in randomized experiments. A covariate X is measured on each experimental unit before treatment is applied. That covariate may be the baseline level of the response variable, or it may be some other characteristic of the experimental unit that is expected to affect outcome. The eventual treatment means are then adjusted to remove the initial differences, thus reducing the experimental error and permitting a more precise comparison among treatments. Such adjustment is dependent upon the parallel slopes assumption - namely that the slope of the relationship between the covariate and the response variable is the same for each treatment level.

The other main use of ANCOVA is to model relationships, especially where one wants to compare regression relationships at different levels of some treatment variable, for example growth rates (size against age). In this use the aim is to assess which model best describes the data: separate slopes for each treatment level, a common slope with different intercepts, or a common intercept with different slopes. This use clearly does not depend on the parallel slopes assumption, which is simply one of the model simplifications that can be made if justified. Because of this, some authorities do not include such an analysis under the name 'analysis of covariance'.

A third, somewhat controversial, use of ANCOVA is to adjust for sources of bias in observational studies.

Models

  • One way ANCOVA

    Maximal model (separate regression lines)

    $Y_{ij} = \mu + \alpha_i + b_i (X_{ij} - \bar{X}_i) + \varepsilon_{ij}$
    where:
    • $Y_{ij}$ is the value for the jth individual of group i,
    • $\mu$ is the population (grand) mean,
    • $\alpha_i$ is the fixed treatment effect for group i,
    • $b_i (X_{ij} - \bar{X}_i)$ is the effect explained by the difference of $X_{ij}$ from its group mean value ($\bar{X}_i$), where $b_i$ is the slope of the regression line for group i,
    • $\varepsilon_{ij}$ is a random deviation of the jth individual of group i from its expected value.

     

    Parallel lines model

    $Y_{ij} = \mu + \alpha_i + b_c (X_{ij} - \bar{X}) + \varepsilon_{ij}$
    where:
    • $Y_{ij}$ is the value for the jth individual of group i,
    • $\mu$ is the population (grand) mean,
    • $\alpha_i$ is the fixed treatment effect for group i,
    • $b_c (X_{ij} - \bar{X})$ is the effect explained by the difference of $X_{ij}$ from the overall mean value of X ($\bar{X}$), where $b_c$ is the slope of the common regression within groups,
    • $\varepsilon_{ij}$ is a random deviation of the jth individual of group i from its expected value.
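The difference between the two models can be made concrete with a small sketch. The data below are hypothetical, and the helper names (`cross_products`, `fitted`) are illustrative only. It relies on the fact that, under least squares, the fitted line for each group passes through that group's means of X and Y in both models; only the slope differs.

```python
# Toy data (hypothetical): covariate x and response y for two treatment groups.
groups = {
    "A": {"x": [1, 2, 3, 4], "y": [2, 4, 5, 7]},
    "B": {"x": [2, 3, 4, 5], "y": [5, 6, 8, 9]},
}

def mean(values):
    return sum(values) / len(values)

def cross_products(x, y):
    """Return (SSXY, SSX) computed about the group means."""
    mx, my = mean(x), mean(y)
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    return ss_xy, ss_x

# Maximal model: a separate slope b_i for each group.
slopes = {}
for name, g in groups.items():
    ss_xy, ss_x = cross_products(g["x"], g["y"])
    slopes[name] = ss_xy / ss_x

# Parallel lines model: one common within-group slope b_c.
pooled = [cross_products(g["x"], g["y"]) for g in groups.values()]
b_common = sum(p[0] for p in pooled) / sum(p[1] for p in pooled)

def fitted(name, x_value, slope):
    """Fitted Y for one group; the line passes through the group means."""
    g = groups[name]
    return mean(g["y"]) + slope * (x_value - mean(g["x"]))

for name in groups:
    print(name, fitted(name, 3.0, slopes[name]), fitted(name, 3.0, b_common))
```

With these toy values the separate slopes are close to the common slope, so the two sets of fitted values differ only slightly.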

     

    Computational formulae

    We will take a completely randomized experimental design with 'a' group (= treatment) levels, each replicated n times. A response variable Y and a covariate X are measured on each experimental unit. Group (treatment) totals are denoted as T1 to Ta, and the grand total as G.

    Step 1. Overall Pooled Regression

    The total, treatment, regression and residual sums of squares using a pooled overall regression are calculated as follows:

    $SS_{\text{Total}} = \sum Y_{ij}^2 - G^2/N$
    where:
    • $SS_{\text{Total}}$ is the total sum of squares, $Y_{ij}$ are the individual observations, G is the grand total ($\sum Y$), and N is the total number of observations.

    $SS_{\text{Treatment}} = \sum T_i^2 / n - G^2/N$
    where:
    • $SS_{\text{Treatment}}$ is the treatment sum of squares, $T_i$ are the treatment totals, n is the number of replicates per treatment, G is the grand total ($\sum Y$), and N is the total number of observations.

    $SS_{\text{Regression}} = \dfrac{\left[\sum XY - (\sum X)(\sum Y)/N\right]^2}{\sum X_{ij}^2 - (\sum X)^2/N}$
    where:
    • $SS_{\text{Regression}}$ is the sum of squares explained by the regression, and N is the total number of observations.

    $SS_{\text{Error}} = SS_{\text{Total}} - SS_{\text{Regression}}$
    where:
    • $SS_{\text{Error}}$ is the error (residual) sum of squares.
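As a rough illustration of Step 1, the sums of squares can be computed directly from raw totals. The data are hypothetical (a = 2 treatments, n = 4 replicates), and the variable names are chosen here for readability rather than taken from any package.

```python
# Toy data (hypothetical): a = 2 treatments, n = 4 replicates each.
x = {"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}
y = {"A": [2, 4, 5, 7], "B": [5, 6, 8, 9]}

n = 4                                          # replicates per treatment
all_x = [v for g in x.values() for v in g]     # same group order as all_y
all_y = [v for g in y.values() for v in g]
N = len(all_y)                                 # total number of observations
G = sum(all_y)                                 # grand total of Y

# Total and treatment sums of squares.
ss_total = sum(v ** 2 for v in all_y) - G ** 2 / N
ss_treatment = sum(sum(g) ** 2 for g in y.values()) / n - G ** 2 / N

# Overall pooled regression, ignoring treatment groups.
sum_xy = sum(xi * yi for xi, yi in zip(all_x, all_y))
ss_xy = sum_xy - sum(all_x) * G / N
ss_x = sum(v ** 2 for v in all_x) - sum(all_x) ** 2 / N
ss_regression = ss_xy ** 2 / ss_x
ss_error = ss_total - ss_regression

print(ss_total, ss_treatment, ss_regression, ss_error)
```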

    Step 2. Individual regressions

    Regression statistics for each treatment level are calculated as follows.

    $SS_{\text{Total}(1)} = \sum Y_1^2 - G_1^2/n$
    where:
    • $SS_{\text{Total}(1)}$ is the total sum of squares, $Y_1$ are the individual observations, $G_1$ is the total ($\sum Y_1$), and n is the number of observations for the first treatment level.

    $SS_{XY(1)} = \sum X_1 Y_1 - (\sum X_1)(\sum Y_1)/n$
    where:
    • $SS_{XY(1)}$ is the sum of cross products of X and Y, and n is the number of observations for the first treatment level.

    $SS_{X(1)} = \sum X_1^2 - (\sum X_1)^2/n$
    where:
    • $SS_{X(1)}$ is the sum of squares for $X_1$, and n is the number of observations for the first treatment level.

    $SS_{\text{Regression}(1)} = (SS_{XY(1)})^2 / SS_{X(1)}$
    where:
    • $SS_{\text{Regression}(1)}$ is the regression sum of squares for the first treatment level.

    $SS_{\text{Error}(1)} = SS_{\text{Total}(1)} - SS_{\text{Regression}(1)}$
    where:
    • $SS_{\text{Error}(1)}$ is the error (residual) sum of squares for the first treatment level.

    $b_1 = SS_{XY(1)} / SS_{X(1)}$
    where:
    • $b_1$ is the slope of the regression line for the first treatment level.
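The Step 2 formulae apply unchanged to every treatment level, so they fit naturally into one small function. This is a minimal sketch; `regression_stats` is a hypothetical helper name and the data are invented for illustration.

```python
def regression_stats(x, y):
    """Step 2 statistics for one treatment level, from raw sums."""
    n = len(x)
    ss_total = sum(v ** 2 for v in y) - sum(y) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(v ** 2 for v in x) - sum(x) ** 2 / n
    ss_regression = ss_xy ** 2 / ss_x
    return {
        "ss_total": ss_total,
        "ss_xy": ss_xy,
        "ss_x": ss_x,
        "ss_regression": ss_regression,
        "ss_error": ss_total - ss_regression,
        "slope": ss_xy / ss_x,
    }

# Hypothetical data for the first treatment level.
stats_1 = regression_stats([1, 2, 3, 4], [2, 4, 5, 7])
print(stats_1)
```

Calling the same function once per treatment level gives the quantities that are summed in Step 3.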

    Step 3. Summed regression statistics

    The regression statistics for each treatment level are summed as follows.

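    $SS_{\text{Total}} = SS_{\text{Total}(1)} + SS_{\text{Total}(2)} + \dots + SS_{\text{Total}(a)}$

    $SS_{XY} = SS_{XY(1)} + SS_{XY(2)} + \dots + SS_{XY(a)}$

    $SS_{X} = SS_{X(1)} + SS_{X(2)} + \dots + SS_{X(a)}$

    $SS_{\text{Reg (summed)}} = SS_{\text{Reg}(1)} + SS_{\text{Reg}(2)} + \dots + SS_{\text{Reg}(a)}$

    $SS_{\text{Error}} = SS_{\text{Error}(1)} + SS_{\text{Error}(2)} + \dots + SS_{\text{Error}(a)}$

    $b_{\text{common}} = SS_{XY} / SS_{X}$

    $SS_{\text{Reg (common slope)}} = (SS_{XY})^2 / SS_{X}$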
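A short sketch of Step 3 follows, again on hypothetical data. The helper `group_sums` is an illustrative name; it returns the Step 2 quantities for one treatment level, which are then summed across levels.

```python
# Toy data (hypothetical): covariate x and response y for a = 2 treatment levels.
data = {
    "A": ([1, 2, 3, 4], [2, 4, 5, 7]),
    "B": ([2, 3, 4, 5], [5, 6, 8, 9]),
}

def group_sums(x, y):
    """Return (SSTotal, SSXY, SSX) for one treatment level (Step 2 formulae)."""
    n = len(x)
    ss_total = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    return ss_total, ss_xy, ss_x

per_group = [group_sums(x, y) for x, y in data.values()]

# Step 3: sum each statistic over the treatment levels.
ss_total = sum(g[0] for g in per_group)
ss_xy = sum(g[1] for g in per_group)
ss_x = sum(g[2] for g in per_group)
ss_reg_summed = sum(g[1] ** 2 / g[2] for g in per_group)
ss_error = ss_total - ss_reg_summed

# Common within-group slope and the regression SS it explains.
b_common = ss_xy / ss_x
ss_reg_common = ss_xy ** 2 / ss_x

print(b_common, ss_reg_summed, ss_reg_common)
```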

    Step 4. Assess heterogeneity of slopes

    If the slope of the relationship between X and Y were the same at each treatment level, $SS_{\text{Reg (common slope)}}$ would equal $SS_{\text{Reg (summed)}}$. The difference between the two is therefore a measure of the heterogeneity of slopes.

    Hence $SS_{\text{Heterogeneity of slopes}} = SS_{\text{Reg (summed)}} - SS_{\text{Reg (common slope)}}$
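The heterogeneity sum of squares can be turned into an F-ratio in the usual way. The sketch below starts from hypothetical per-group statistics and assumes the conventional degrees of freedom for this comparison, a − 1 for heterogeneity of slopes and N − 2a for the error from the separate regressions; those degrees of freedom are not stated in the text above, so treat them as an assumption of this example.

```python
# Per-group statistics (SSTotal, SSXY, SSX) from Step 2.
# Hypothetical values for a = 2 treatment levels with n = 4 replicates each.
group_stats = [(13.0, 8.0, 5.0), (10.0, 7.0, 5.0)]
n = 4
a = len(group_stats)
N = a * n

# Regression SS from separate slopes, and the error left after fitting them.
ss_reg_summed = sum(ss_xy ** 2 / ss_x for _, ss_xy, ss_x in group_stats)
ss_error_summed = sum(ss_total for ss_total, _, _ in group_stats) - ss_reg_summed

# Regression SS from the single common within-group slope.
ss_xy_sum = sum(ss_xy for _, ss_xy, _ in group_stats)
ss_x_sum = sum(ss_x for _, _, ss_x in group_stats)
ss_reg_common = ss_xy_sum ** 2 / ss_x_sum

# Step 4: heterogeneity of slopes.
ss_heterogeneity = ss_reg_summed - ss_reg_common

# Assumed degrees of freedom (standard for this test, not given in the text).
df_heterogeneity = a - 1
df_error = N - 2 * a
f_ratio = (ss_heterogeneity / df_heterogeneity) / (ss_error_summed / df_error)

print(ss_heterogeneity, f_ratio)
```

A P value would then come from the F distribution with those degrees of freedom, for example from statistical tables or a statistics library.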

    Mean squares are obtained by dividing sums of squares by their respective degrees of freedom. For a single regression on n observations, the significance test is carried out by dividing the regression mean square by the error mean square. Under a null hypothesis of zero slope, this F-ratio is distributed as F with 1 and n − 2 degrees of freedom.

    Source of variation    df       SS         MS                          F-ratio            P value
    Regression             1        SSReg      MSReg ($s^2$)               MSReg / MSError
    Error                  n − 2    SSError    MSError ($s^2_{Y \cdot X}$)
    Total                  n − 1    SSTotal

    Adjusted means

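    $\bar{Y}'_1 = \bar{Y}_1 - b (\bar{X}_1 - \bar{X})$
    where $\bar{Y}'_1$ is the adjusted mean for the first treatment level, $\bar{Y}_1$ and $\bar{X}_1$ are its observed means of Y and X, $\bar{X}$ is the overall mean of X, and b is the common within-group slope.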
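A minimal sketch of the adjustment, assuming the parallel lines model holds and using hypothetical data. Each group mean of Y is moved along the common slope to the position it would occupy if the group's mean covariate value equalled the overall mean of X.

```python
# Toy data (hypothetical): covariate x and response y for two treatment levels.
x = {"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}
y = {"A": [2, 4, 5, 7], "B": [5, 6, 8, 9]}

def mean(values):
    return sum(values) / len(values)

# Common within-group slope: summed cross products over summed SSX.
ss_xy = 0.0
ss_x = 0.0
for name in x:
    mx, my = mean(x[name]), mean(y[name])
    ss_xy += sum((xi - mx) * (yi - my) for xi, yi in zip(x[name], y[name]))
    ss_x += sum((xi - mx) ** 2 for xi in x[name])
b_common = ss_xy / ss_x

# Overall mean of the covariate across all experimental units.
grand_mean_x = mean([v for g in x.values() for v in g])

# Adjusted mean: observed mean of Y minus slope times the covariate offset.
adjusted = {
    name: mean(y[name]) - b_common * (mean(x[name]) - grand_mean_x)
    for name in x
}
print(adjusted)
```

In this toy example the raw means differ by 2.5 but the adjusted means differ by only 1.0, because part of the raw difference is attributable to the groups' different covariate means.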

     

    Dealing with non-parallel lines

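    Pick-a-point approach
      When slopes differ, the treatment effect depends on the value of the covariate, so it is estimated at one or more chosen values of the covariate. For example, if we have a treatment (A) with two levels and a continuous covariate (B):
      Effect size: $(b_1 \mid B = \theta) = b_1 + b_3 \theta$
      where $b_1$ is the coefficient for (A), $b_3$ is the coefficient for the interaction (A × B), and $\theta$ is the chosen value of the covariate (B). Then estimate the standard error of the effect size:

      Standard error of effect size: $s_{b_1 \mid B = \theta} = \sqrt{s^2_{b_1} + 2 \theta \, s_{b_1 b_3} + \theta^2 s^2_{b_3}}$
      where $s^2_{b_1}$ and $s^2_{b_3}$ are the variances of the two coefficients and $s_{b_1 b_3}$ is their covariance.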
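The pick-a-point calculation is simple arithmetic once the coefficients and their variances and covariance are available from a fitted model. In this sketch every number is hypothetical, standing in for values that would be read from the model's coefficient table and coefficient covariance matrix, and the function name `effect_at` is illustrative.

```python
import math

def effect_at(theta, b1, b3, var_b1, var_b3, cov_b1_b3):
    """Treatment effect and its standard error at covariate value theta."""
    effect = b1 + b3 * theta
    variance = var_b1 + 2 * theta * cov_b1_b3 + theta ** 2 * var_b3
    return effect, math.sqrt(variance)

# Hypothetical estimates (not from any real data set).
b1, b3 = 2.0, 0.5                 # coefficient for A and for the A x B interaction
var_b1, var_b3 = 0.25, 0.01       # variances of the two coefficients
cov_b1_b3 = -0.04                 # covariance between the two coefficients

effect, se = effect_at(3.0, b1, b3, var_b1, var_b3, cov_b1_b3)
print(effect, se)
```

Repeating the call over several values of theta shows how the estimated treatment effect and its precision change across the range of the covariate.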

     

    Assumptions

    1. ANOVA assumptions
      Observations are independent of one another. Residuals are randomly and normally distributed. Variances are homogeneous across groups.

    2. Regression assumptions
      The relationship between Y and X must be linear for each treatment group (although some forms of nonlinearity can be dealt with by including a polynomial term as an extra covariate). In addition, errors (deviations from the fitted lines) must be independent of the values of X and normally distributed.

    3. Specific ANCOVA assumptions
      The model assumes that the covariate is independent of the treatment effect. In other words, the distribution of values of the covariate should be the same at each treatment level or, more importantly, the (parametric) mean value of the covariate should be the same for each group. A further specific (but optional) assumption is homogeneity of slopes. It is optional because it is only required to simplify the model for estimation of adjusted means.