Purpose and Assumptions

Transforming a variable re-scales it. For instance, applying a transforming function F (also known as a mapping function) to variable Y produces Y', so Y' = F(Y). A transformation can be any mathematical operation applied to data. A de-transformation (or back-transformation) reverses that process. Although an infinite variety of transformations is possible, the most important are those applied in the same way to every value.

Those that preserve the rank order of the data, or exactly reverse it, are known as monotonic transformations. If the order is maintained, it is termed an ascending transformation; if it is reversed (as in a reciprocal transformation) it is a descending transformation. The transformation is linear if plotting the transformed data against the untransformed data produces a straight line. Linear transformations are mainly used to ease data handling or display. You might, for example, add a fixed amount to make all your measurements positive, or divide by 1000 to save space on a graph. Such linear transformations are sometimes also known as coding.

Non-linear transformations are defined as those mathematical operations that do not give a straight line when the transformed data are plotted against the original data. They are the topic of this More Information page. They are of special value in data analysis for several purposes:

  1. To normalize distributions of data and their statistics

  2. To equalize variances of different groups of observations

  3. To produce an additive model for multiplicative biological processes.

  4. To provide a more appropriate measure of central location

  5. To linearize a curvilinear relationship between two variables.

We will focus here on the first three of these uses - the use of transformations to provide a more appropriate measure of central location has been covered in Unit 1, and their use to linearize relationships is revisited in Unit 12.

Note also that, with the advent of generalized linear models (see Unit 14) where we can deal with non-normal error distributions, it may not be necessary (or even desirable) to transform a variable to meet the objectives listed above.



Choice of transformation

Given the wide range of uses of transformations, one might expect that it would be difficult to come up with a single criterion by which to select a transformation for a given set of data. In some instances this is true, but sometimes a transformation that achieves one of the objectives above will also achieve the others. In particular, transformations that normalize a distribution may also tend to homogenize variances.

There are two ways to obtain a precise power transformation of the form $Y' = Y^{x}$ for a set of data. If the data are in groups, and the mean and variance have been estimated for each group, we can use the relationship between the means and variances of the groups as a rough measure of the best transformation to use for homogenizing variances. This can be quantified using Taylor's Power Law - in other words by plotting the log variance against the log mean, and estimating the slope (b) of the line. An approximate variance stabilizing transformation is then given by the following power relationship:

Algebraically speaking -

$Y' = Y^{1 - (b/2)}$
  • $Y'$ is the transformed value
  • $Y$ is the original untransformed value
  • $b$ is the slope of the regression of $\log s^2$ (log variance) on $\log \bar{Y}$ (log mean)
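
As a minimal sketch of the procedure just described, the slope b can be estimated by ordinary least squares on the logged group means and variances, and the exponent 1 − b/2 read off from it. The function names and the group summaries below are hypothetical, chosen so that the variance equals the mean squared; base-10 logarithms are assumed, although the slope does not depend on the base.

```python
import math

def taylor_slope(means, variances):
    """Least-squares slope b of log(variance) regressed on log(mean)."""
    xs = [math.log10(m) for m in means]
    ys = [math.log10(v) for v in variances]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def power_exponent(b):
    """Exponent 1 - b/2 of the approximate variance stabilizing power transformation."""
    return 1 - b / 2

# Hypothetical group summaries (not from the text): variance = mean squared.
group_means = [2.0, 5.0, 10.0, 40.0]
group_variances = [m ** 2 for m in group_means]

b = taylor_slope(group_means, group_variances)
exponent = power_exponent(b)
print(f"b = {b:.3f}, exponent = {exponent:.3f}")
```

With these made-up summaries the slope comes out close to 2 and the exponent close to 0, which points to the logarithmic transformation.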

If the data are not arranged in groups, then the Box-Cox transformation, detailed in the related topic, can be used to obtain a precise power transformation for the data at hand. The objective for the Box-Cox transformation is usually to select the best normalizing transformation, although one can add the requirement of variance stabilization as well.

Use of a precise power transformation has its advantages, but is seldom done simply because it is hard to justify on biological grounds. It is more usual to use one of the approximate transformations given in the table below.

Appropriate transformations for given values of 1 − (b/2)
  b    1 − b/2    Transformation            Y'
  0     1.0       no transform              $Y$
  1     0.5       square root               $\sqrt{Y}$
  2     0         logarithmic               $\log Y$
  3    −0.5       reciprocal square root    $Y^{-0.5}$
  4    −1.0       reciprocal                $Y^{-1}$

  • If b ≈ 0, there is no relationship between the variance and the mean, so no transformation is required.
  • If b ≈ 1, the variance is directly proportional to the mean, and a square root transformation is used. This may sometimes (but not always!) be found with small whole numbered counts, where the distribution approximates to the Poisson.
  • If b ≈ 2, the variance is proportional to the mean squared and a logarithmic transformation is more appropriate. This is widely used for a whole range of situations, such as parasite counts and antibody titres, where the distribution approximates to log-normal.
  • If b ≈ 4, the variance is proportional to the mean to the fourth power, and a reciprocal transformation can be used. This is sometimes the case for time periods.

Proportions are not covered by this family of power transformations. We will cover one of the transformations used for proportions here - namely the arcsine square root transformation. However, we will keep some other special transformations for proportions - namely the probit and logistic transformations - until Unit 14 since they are mainly used for regression analysis.



Reporting analyses on transformed data

Many researchers (and some textbook writers) are unsure how analyses on transformed data should be reported.

  1. All analysis should be conducted on the transformed data. It is quite permissible to then report means, standard deviations, standard errors and confidence intervals in the transformed scale. However, such transformed data may not be very comprehensible to your readers.
  2. In order to present the data in the original scale of measurement, the means and confidence intervals should be detransformed (or back-transformed) using the appropriate arithmetic operations in reverse (as detailed below). Standard errors and standard deviations should not be detransformed as they are meaningless as detransformed statistics.
  3. You should definitely not report the results of your analysis using the means and associated statistics derived from the raw or untransformed data. This is because your analysis refers only to the data in the transformed scale. Some authorities recommend reporting untransformed means and standard deviations as well as the detransformed values purely for information, but this is generally not good practice as it can lead to endless confusion!
  4. Further complications arise if you are estimating the confidence interval of the treatment effect expressed as a difference or a ratio. Only the log transformation is useful here, as the detransformed limits after a square root or reciprocal transformation do not make sense (squaring negative numbers makes them positive).



Some common transformations

  • The square root transformation

    This transformation is used when the variance is directly proportional to the mean, so that a plot of the log variance against the log mean gives a slope of one. A square root transformation of the original data will generally render the variance independent of the mean. This will apply if the data follow a Poisson distribution (where the variance equals the mean) and the expected frequency is greater than 5. It will also apply if the variance is greater than the mean, providing the two are proportional. If there are any small numbers (<10) and especially if there are any zeros, a half or 3/8 should be added before taking the square root:

    Algebraically speaking -

    $Y' = \sqrt{Y + 0.5}$
    $Y' = \sqrt{Y + 3/8}$

    For very small whole numbers the following is recommended:

    $Y' = \sqrt{Y} + \sqrt{Y + 1}$
    • Y' is the transformed value
    • Y is the original untransformed value

    The mean and associated confidence intervals can be obtained in the detransformed scale by simply reversing the arithmetic operations - although a correction factor should be added to reduce bias:

    Algebraically speaking -

    To obtain the detransformed mean after a (Y+0.5) transformation:

    $\bar{Y}_S = (\bar{Y}')^2 - 0.5 + s'^2 (1 - 1/n)$
    • $\bar{Y}'$ is the mean of the square root transformed data,
    • $\bar{Y}_S$ is the detransformed mean,
    • s'2 is the variance of the square root transformed data,
    • n is the sample size.
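
    The detransformation with its bias correction can be sketched as follows. This is only an illustration: the function names and the counts are hypothetical, and the variance of the transformed data is assumed to be the usual sample variance with an n − 1 divisor, which the text does not state.

```python
import math
import statistics

def sqrt_transform(values, constant=0.5):
    """Square root transformation after adding a constant (0.5 by default)."""
    return [math.sqrt(y + constant) for y in values]

def detransformed_mean(transformed, constant=0.5):
    """Square the transformed mean, remove the constant, add the bias correction."""
    n = len(transformed)
    mean_t = statistics.fmean(transformed)
    var_t = statistics.variance(transformed)  # assumption: n - 1 divisor
    return mean_t ** 2 - constant + var_t * (1 - 1 / n)

# Hypothetical counts, including a zero (not from the text).
counts = [0, 2, 3, 5, 9, 12]
transformed = sqrt_transform(counts)

naive = statistics.fmean(transformed) ** 2 - 0.5   # no bias correction
corrected = detransformed_mean(transformed)
print(f"naive = {naive:.3f}, corrected = {corrected:.3f}")
```

    The uncorrected value is always the smaller of the two whenever the transformed data vary. Under the n − 1 divisor assumed here, the corrected value coincides with the arithmetic mean of the raw counts, which gives a convenient check on the arithmetic.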

    The square root transformation has been criticized by Hurlbert & Lombardi (2003) on the grounds that count data rarely conform to a Poisson distribution, and because it can produce (illogical) negative lower confidence limits for counts that are not possible with a log transformation. On the first point, the transformation is not only valid for a Poisson distribution - it can be used for any distribution providing the slope of the log variance versus log mean plot is close to 1. The second point is, however, a serious constraint to its use - especially if one wishes to attach a confidence interval to a treatment effect.

    One can also criticize the square root transformation on the grounds that it does not provide a simple model for the way in which explanatory variables affect the response variable. If data are untransformed, it is assumed the effects are additive - if they are log transformed, it is assumed they are multiplicative. But there is no such simple model for the square root transformation.


  • The logarithmic transformation

    The logarithmic transformation is the most frequently used transformation in biology. This is partly because the variance is frequently proportional to the mean squared for biological data, so the transformation is optimal on this basis. In addition, detransformed values can never be negative, making it ideal for count data. We have already given many examples in the core text of the effectiveness of the log transformation for normalizing distributions and homogenizing variances. One need only consider this histogram of daily trap catches of tsetse:

{Fig. 1}

    The log transformed data now appear to follow a normal distribution. This of course is why the geometric mean provides a better measure of central location for data with a skewed distribution.

    If there are no zeros, the transformation is achieved by simply taking the logarithm of each observation. If zeros are present, then the usual solution is to add 1 as a constant to each observation:

    Algebraically speaking -

    Y'   =   log (Y + 1)
    • Y' is the transformed value
    • Y is the original untransformed value

    An alternative (possibly better) approach is to choose the constant (c) that minimizes the sum (G) of skewness (g1) and kurtosis (g2) of the data. This can be done by estimating G for a range of values of c from about 0.1 to 2.0, and then plotting them against c. The value of c which gives the minimum value is then estimated from the graph. This approach may serve to improve the normality of the distribution, but it will do nothing to remove the bias which results if a zero indicates a count below your ability to detect it, rather than a true zero. A better solution may be to use a jittering technique to add a small randomly selected quantity between 0 and 1 to each observation.
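
    The search for c might be sketched as a simple grid search in place of reading the minimum from a graph. Several things here are assumptions rather than part of the text: G is taken as the sum of the absolute values of g1 and g2 (so that negative skewness or kurtosis cannot cancel positive values), g1 and g2 are computed from simple population moments, logarithms are base 10, and the counts are hypothetical.

```python
import math

def skew_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def g_statistic(values, c):
    """G for log10(Y + c); assumption: G = |g1| + |g2|."""
    logged = [math.log10(v + c) for v in values]
    g1, g2 = skew_kurtosis(logged)
    return abs(g1) + abs(g2)

# Hypothetical counts containing zeros (not from the text).
counts = [0, 0, 1, 1, 2, 3, 3, 5, 8, 13, 21, 40]

# Candidate constants from 0.1 to 2.0 in steps of 0.1.
candidates = [round(0.1 * k, 1) for k in range(1, 21)]
scores = {c: g_statistic(counts, c) for c in candidates}
best_c = min(scores, key=scores.get)
print(f"best c on this grid = {best_c}, G = {scores[best_c]:.3f}")
```

    Plotting the values in `scores` against c would reproduce the graphical approach described above.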

    The mean and associated confidence intervals can be obtained in the detransformed scale by reversing the arithmetic operations:

    Algebraically speaking -

    To obtain the detransformed (geometric) mean after a log(Y + 1) transformation:

    $\bar{Y}_G = \operatorname{antilog}(\bar{Y}') - 1$
    • $\bar{Y}'$ is the mean of the transformed values;
    • $\bar{Y}_G$ is the detransformed (geometric) mean.
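
    A short sketch of this detransformation, extended to the confidence limits, is given below. It assumes base-10 logarithms (the text does not specify a base), takes the t critical value as an argument rather than computing it, and uses hypothetical catches; the function name is also hypothetical.

```python
import math
import statistics

def log_plus_one_summary(values, t_crit):
    """Detransformed mean and confidence limits after a log10(Y + 1) transformation."""
    logged = [math.log10(y + 1) for y in values]
    n = len(logged)
    mean_log = statistics.fmean(logged)
    se_log = statistics.stdev(logged) / math.sqrt(n)

    def detransform(v):
        # Reverse the operations: antilog first, then subtract the added constant.
        return 10 ** v - 1

    lower = detransform(mean_log - t_crit * se_log)
    upper = detransform(mean_log + t_crit * se_log)
    return detransform(mean_log), lower, upper

# Hypothetical catches (not from the text); logs of Y + 1 are 0, 1, 1, 1, 2.
catches = [0, 9, 9, 9, 99]

# 2.776 is the two-sided 95% critical value of t for 4 degrees of freedom.
geometric_mean, lower_limit, upper_limit = log_plus_one_summary(catches, t_crit=2.776)
print(geometric_mean, lower_limit, upper_limit)
```

    Only the mean and the limits are detransformed; the standard error is used on the log scale and is never itself converted back. The resulting interval is asymmetric about the detransformed mean.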

    Another very good reason for using a log transformation (rather than a square root or precise power transformation) is that transforming to a log scale means that you assume factors work in a multiplicative rather than additive manner - a more logical assumption in many situations.

    We will demonstrate this with an example from using traps to sample Stomoxys flies. The number of flies 'available' for capture varies both from day to day and between sites. In other words there is both a 'day' effect and a 'site' effect. These day and site effects are usually multiplicative - so site two might tend to have twice as many flies as site one on any particular day, rather than 20 more or 50 more.

    Let's make the following assumptions:

    1. There are 100 flies in Position 1 on day I,
    2. There are twice as many on day II as on day I,
    3. There are three times as many in Position 2 as in Position 1.

    We have filled in these numbers in the table.

    Number of flies available
      Position   Day I   Day II
      1          100     200
      2          300     600

    Now let's say we want to compare a new trap design (A2) with our standard design (A1) to see which catches the largest number of flies.

    A suitable design for this experiment would be a 'crossover' design, such as that shown here. It would, of course, have to be replicated over a number of pairs of days or pairs of positions but, for simplicity, we shall consider just one replicate.

    Changes in trap location
      Position   Day I   Day II
      1          A1      A2
      2          A2      A1

    Let us assume that A1 catches 10% of the available flies, whilst A2 catches 20% of the available flies. These are now the catches we would expect:

    [Traps] and catches
      Position   Day I     Day II
      1          [A1] 10   [A2] 40
      2          [A2] 60   [A1] 60

    If we take the arithmetic means of the catches we get A1 = 35 and A2 = 50. In other words A2 is apparently only 1.43 times better than A1, despite the fact that we know it is twice as good!

    If however we log transform our data, we get detransformed means of 24.5 for A1 and 49.0 for A2. In other words the geometric means correctly indicate that Trap A2 catches twice as many flies as Trap A1. In this situation using arithmetic means would quite simply give us the wrong answer - as for that matter would the square root transformation.
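
    The arithmetic behind this example can be checked with a few lines of code. The numbers are those given above; the variable and function names are hypothetical, and base-10 logarithms are assumed (the geometric mean does not depend on the base).

```python
import math

# Flies available by (position, day), and the trap used in each cell.
available = {(1, "I"): 100, (1, "II"): 200, (2, "I"): 300, (2, "II"): 600}
design = {(1, "I"): "A1", (1, "II"): "A2", (2, "I"): "A2", (2, "II"): "A1"}
efficiency = {"A1": 0.10, "A2": 0.20}  # proportion of available flies caught

catches = {"A1": [], "A2": []}
for cell, trap in design.items():
    catches[trap].append(available[cell] * efficiency[trap])

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    """Detransformed mean of the log10 values."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

def sqrt_scale_mean(values):
    """Detransformed mean of the square roots (no bias correction)."""
    return (sum(math.sqrt(v) for v in values) / len(values)) ** 2

for label, mean_fn in [("arithmetic", arithmetic_mean),
                       ("geometric", geometric_mean),
                       ("square root", sqrt_scale_mean)]:
    a1, a2 = mean_fn(catches["A1"]), mean_fn(catches["A2"])
    print(f"{label}: A1 = {a1:.1f}, A2 = {a2:.1f}, ratio = {a2 / a1:.2f}")
```

    Only the geometric means recover the two-fold difference between the traps; the ratio from the square root scale falls between the arithmetic and geometric results.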

    The log transformation is also appropriate if you want to compare levels of variability over time, either within the same species or between different species. In either case we will be looking at variation around quite different mean levels, so it is the proportional variation we are interested in, not the absolute variation.


  • The reciprocal transformation

    The two reciprocal transformations, 1/√Y and 1/Y, are much less commonly used than the other transformations covered here. They are appropriate for data that are very right skewed, where the variance is proportional to the mean cubed or to the fourth power respectively. Such data may include time periods and rates.

    If there are any zeros, 1 should be added before taking the reciprocal:

    Algebraically speaking -

    $Y' = \frac{1}{Y + 1}$

    If you wish to retain the original ordering of the variable use:
    $Y' = -\frac{1}{Y + 1}$

    • Y' is the transformed value
    • Y is the original untransformed value


  • The arcsine transformation

    The variance of a proportion shows a specific curvilinear relationship to the mean, with a maximum at P = 0.5. For proportions between 0.3 and 0.7 the variance only varies between fairly narrow limits, and it is generally considered that such data can be analysed untransformed. If some of the proportions are outside this range, then the data should be subjected to an arcsine (also known as angular) transformation.

    Algebraically speaking -

    Y'   =   arcsin √Y
    • Y' is the transformed value
    • arcsin (or sin⁻¹, but not 1/sin) is measured in either degrees or radians (1 radian = 180/π ≅ 57.3 degrees)
    • Y is the original untransformed value

    We can see how effective this transformation is at stabilizing the variance in the graph below. This shows the variance of angular transformed proportions for n=1000, 100 and 30 observations.

{Fig. 2}

    The transformation is effective at stabilizing the variance at 1/(4n) (with the angle measured in radians) providing PQn is more than about 5, where Q = 1 − P. Below this the variance becomes erratic.
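
    This behaviour can be checked with an exact calculation over the binomial distribution. The sketch below is an illustration only: the function name and the particular combinations of n and P are hypothetical choices, and the angle is measured in radians.

```python
import math

def angular_variance(n, p):
    """Exact variance of arcsin(sqrt(f/n)) in radians when f is Binomial(n, p)."""
    mean = 0.0
    mean_sq = 0.0
    for f in range(n + 1):
        prob = math.comb(n, f) * p ** f * (1 - p) ** (n - f)
        angle = math.asin(math.sqrt(f / n))
        mean += prob * angle
        mean_sq += prob * angle ** 2
    return mean_sq - mean ** 2

# Compare the exact variance with 1/(4n) for several values of PQn.
for n in (30, 100):
    for p in (0.5, 0.2, 0.02):
        pqn = p * (1 - p) * n
        exact = angular_variance(n, p)
        print(f"n={n}, P={p}, PQn={pqn:.2f}: "
              f"variance={exact:.5f}, 1/(4n)={1 / (4 * n):.5f}")
```

    Comparing the two columns shows how closely the variance tracks 1/(4n) when PQn is comfortably above 5, and how far it drifts when PQn is small.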

    Below are the exact distributions of angular transformed sample proportions for a series of population proportions, estimated using the binomial expansion.

{Fig. 3}

    The transformation is not very effective at normalizing the distribution where PQn ≤ 5. This is because there are insufficient classes to 'describe' the shape of these highly skewed distributions. Transforming these distributions is ineffective because all the information about the most extreme classes is lost by being pooled within a single class. Larger samples suffer correspondingly less, because the number of available classes is directly proportional to the sample size (i.e. n+1 classes per sample).

    If your data include many proportions where pqn is less than five, there are modifications to the arcsin transformation that enable it to perform somewhat better. These include:

    Algebraically speaking -

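    (1)  $p' = \frac{1}{2} \left\{ \arcsin\sqrt{\frac{f}{n + 1}} + \arcsin\sqrt{\frac{f + 1}{n + 1}} \right\}$

    (2)  $p' = \arcsin\sqrt{\frac{f + 3/8}{n + 3/4}}$

    (3)  $p' = 2\sqrt{n} \left[ \arcsin\sqrt{\frac{f + 3/8}{n + 3/4}} - \arcsin\sqrt{p} \right]$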

    • p' is the transformed proportion,
    • f is the number of individuals with the character of interest,
    • n is the sample size,
    • p is the proportion with the character of interest.
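
    The three modifications might be coded along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, each square root is taken to cover the whole fraction that follows it, and angles are returned in radians.

```python
import math

def modification_1(f, n):
    """Average of the angular transforms of f/(n+1) and (f+1)/(n+1)."""
    return 0.5 * (math.asin(math.sqrt(f / (n + 1)))
                  + math.asin(math.sqrt((f + 1) / (n + 1))))

def modification_2(f, n):
    """Angular transform of (f + 3/8) / (n + 3/4)."""
    return math.asin(math.sqrt((f + 3 / 8) / (n + 3 / 4)))

def modification_3(f, n, p):
    """Difference between modification 2 and arcsin(sqrt(p)), scaled by 2*sqrt(n)."""
    return 2 * math.sqrt(n) * (modification_2(f, n) - math.asin(math.sqrt(p)))

# Hypothetical example: f individuals with the character out of n = 10.
n = 10
for f in (0, 1, 5, 10):
    print(f, round(modification_1(f, n), 4), round(modification_2(f, n), 4),
          round(modification_3(f, n, p=0.5), 4))
```

    Unlike the plain transformation, the first two versions never return exactly 0 for f = 0, and the second never returns exactly π/2 for f = n, so the extreme classes remain distinguishable.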

    However, for such data, it may be better to use Monte Carlo methods rather than attempting to use parametric analyses.

    Although the arcsine transformation was heavily used for analyzing proportions in the past, it is now used much less, and many consider it largely obsolete. This is because of the availability of logistic regression, a form of the generalized linear model that is designed to work with data in the form of counts (Y out of a total of N) rather than with the proportions.

Related topics :

Box-Cox transformation

Log normal confidence intervals