"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




Box-Cox transformation

The Box-Cox transformation is a procedure for obtaining the optimal transformation to normalize data within the following family of power transformations:
$Y' = (Y^{\lambda} - 1) / \lambda$    when λ ≠ 0
$Y' = \ln Y$    when λ = 0
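
As a minimal sketch of the two-branch definition above, the transformation can be written as a small Python function. The function name `box_cox` and the example values are illustrative assumptions, not part of the original text.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of one positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("the transformation is only defined for y > 0")
    if lam == 0:
        return math.log(y)            # the lambda = 0 branch: ln Y
    return (y ** lam - 1.0) / lam     # the lambda != 0 branch

# Made-up values, transformed with a few different lambdas
values = [0.5, 1.0, 2.0, 8.0]
for lam in (1.0, 0.5, 0.0, -1.0):
    print(lam, [round(box_cox(y, lam), 4) for y in values])
```

Note that as λ approaches 0 the power branch approaches ln Y, which is why the logarithm is used for the λ = 0 case.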

The required value of λ is the one that maximizes the following log likelihood function.

Algebraically speaking -

$L = -\frac{\nu}{2} \ln s_T^2 + (\lambda - 1) \frac{\nu}{n} \sum_{i=1}^{n} \ln Y_i$

where
  • L is the log likelihood,
  • ν is the degrees of freedom (n − 1, where n is the number of observations),
  • $s_T^2$ is the variance of the values which have been transformed using $(Y^{\lambda} - 1) / \lambda$,
  • λ is the current estimate of the parameter,
  • $Y_i$ are the original data values

The maximum is found numerically: the log likelihood is evaluated for a series of trial values of λ, and these values are then plotted against λ to locate the maximum.
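
The search over trial values of λ might be sketched as below. This assumes the log likelihood has the form −(ν/2) ln s_T² + (λ − 1)(ν/n) Σ ln Y_i, with ν = n − 1 and s_T² calculated with ν as the divisor. The function names, the grid limits (−2 to 2 in steps of 0.01) and the data are all illustrative assumptions.

```python
import math

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def log_likelihood(data, lam):
    """Log likelihood for one trial lambda (form assumed in the lead-in)."""
    n = len(data)
    v = n - 1                                        # degrees of freedom
    t = [box_cox(y, lam) for y in data]              # transformed values
    mean_t = sum(t) / n
    s2 = sum((x - mean_t) ** 2 for x in t) / v       # variance of transformed values
    sum_log_y = sum(math.log(y) for y in data)
    return -(v / 2.0) * math.log(s2) + (lam - 1.0) * (v / n) * sum_log_y

def grid_search(data, lo=-2.0, hi=2.0, step=0.01):
    """Evaluate the log likelihood over a grid and return the best point."""
    steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    profile = [(lam, log_likelihood(data, lam)) for lam in grid]
    best = max(profile, key=lambda p: p[1])
    return best, profile

# Made-up data: exponentials of evenly spaced values, so a log
# transformation (lambda close to 0) should come out as the best choice.
data = [math.exp(z) for z in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
best, profile = grid_search(data)
print("lambda at maximum:", best[0], "log likelihood:", round(best[1], 4))
```

The `profile` list holds the (λ, L) pairs that would be plotted. A finer step or a wider grid may be needed if the maximum falls at either end of the range.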

Where the data include zeros, a constant (usually either 1 or 0.5) is added to each value of Y, because ln Y and negative powers of Y are undefined at zero. As with other transformations, if there are many zeros this can result in bias.
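
A brief sketch of handling zeros by adding a constant before transforming. The counts are made up, and subtracting the same constant after detransformation is an assumption about how the shift would be undone rather than something stated above.

```python
import math

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

counts = [0, 0, 1, 3, 7, 12]    # made-up data containing zeros
c = 0.5                         # added constant; 1 is the other usual choice

shifted = [y + c for y in counts]                  # every value is now positive
transformed = [box_cox(y, 0) for y in shifted]     # here lambda = 0, i.e. ln(Y + c)

# Undo both steps: detransform, then subtract the constant again
recovered = [math.exp(t) - c for t in transformed]
print([round(r, 6) for r in recovered])
```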

Detransformation is achieved using the following:
$Y = (\lambda Y' + 1)^{1/\lambda}$    when λ ≠ 0
$Y = \exp(Y')$    when λ = 0
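
The detransformation can be checked with a round trip: transform a value, detransform it, and confirm that the original value is recovered. This is only a sketch; `inverse_box_cox` is a hypothetical name and the test values are made up.

```python
import math

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def inverse_box_cox(t, lam):
    """Detransform a Box-Cox value t back to the original scale."""
    if lam == 0:
        return math.exp(t)
    base = lam * t + 1.0
    if base <= 0:
        raise ValueError("lam * t + 1 must be positive to detransform")
    return base ** (1.0 / lam)

# Round trip for several lambdas on made-up values
for lam in (-1.0, -0.5, 0.0, 0.5, 2.0):
    for y in (0.2, 1.0, 3.5, 40.0):
        back = inverse_box_cox(box_cox(y, lam), lam)
        assert abs(back - y) < 1e-9 * max(1.0, y)
print("round trip recovers the original values")
```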

Although you can use the precise value of λ in the transformation, it is more usual to use the standard transformation closest to the value suggested by the Box-Cox procedure, provided it still lies within the 95% confidence interval of λ. This is known as a 'convenient estimator' (although this model can be impossible to interpret in biological terms).

  • A λ of 0 is a log transformation.
  • A λ of 0.5 is equivalent to (but not identical to) a square root transformation (provided Y > 1) - then again a square root transformation cannot cope with negative numbers.
  • A λ of −1 is equivalent to a reciprocal transformation.
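
Choosing a convenient estimator might be sketched as follows. The text above does not say how the 95% confidence interval of λ is obtained. The sketch assumes the usual likelihood-based interval, taking every λ whose log likelihood lies within 1.92 (half the 5% chi-square critical value for 1 degree of freedom, 3.84) of the maximum. The candidate list, function names, grid and data are likewise illustrative assumptions.

```python
import math

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def log_likelihood(data, lam):
    # Assumed form: -(v/2) ln s_T^2 + (lam - 1)(v/n) sum(ln Y), with v = n - 1
    n = len(data)
    v = n - 1
    t = [box_cox(y, lam) for y in data]
    mean_t = sum(t) / n
    s2 = sum((x - mean_t) ** 2 for x in t) / v
    return -(v / 2.0) * math.log(s2) + (lam - 1.0) * (v / n) * sum(math.log(y) for y in data)

def convenient_lambda(data, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0),
                      lo=-2.0, hi=2.0, step=0.01):
    steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    profile = [(lam, log_likelihood(data, lam)) for lam in grid]
    lam_hat, l_max = max(profile, key=lambda p: p[1])

    # Assumed 95% interval: all grid values within 1.92 of the maximum.
    # Taking min and max treats those values as one unbroken range.
    cutoff = l_max - 1.92
    inside = [lam for lam, ll in profile if ll >= cutoff]
    lower, upper = min(inside), max(inside)

    # Closest common transformation that still lies inside the interval;
    # fall back to the precise estimate if none does.
    usable = [c for c in candidates if lower <= c <= upper]
    choice = min(usable, key=lambda c: abs(c - lam_hat)) if usable else lam_hat
    return lam_hat, (lower, upper), choice

# Made-up data for which a log transformation should be favoured
data = [math.exp(z) for z in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
lam_hat, interval, choice = convenient_lambda(data)
print("estimate:", lam_hat, "interval:", interval, "convenient value:", choice)
```

If the interval reaches either end of the grid, the grid should be widened before the limits are trusted.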