"It has long been an axiom of mine that the little things are infinitely the most important"
The square root transformation
Where counts are small, the usual transformation is the square root transformation. However, you should never assume that a particular transformation is suitable - always examine the data first! Our worked example is taken from one that has been quoted in many statistical texts in the past - an experiment on weed control in
The experiment was arranged in a randomised block design with one of each treatment per block. Treatment A was the control.
Examining the standard deviation for each treatment shows that it tends to increase with the mean. We may therefore be able to use Taylor's Power Law ($s^2 = a\bar{Y}^{b}$, so that $b$ is the slope of the regression of $\log s^2$ on $\log \bar{Y}$) to select the best transformation.
The slope (b) is 0.7466. The most appropriate power transformation ($Y' = Y^{p}$) is then obtained from:

$$p = 1 - \frac{b}{2} = 1 - \frac{0.7466}{2} \approx 0.63$$
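The Taylor's Power Law step can be sketched in a few lines of Python. This is a minimal illustration, assuming each treatment is supplied as a list of counts with a positive mean and variance; the function names `taylor_slope` and `suggested_power` are hypothetical, and base-10 logs are used (the slope is the same for any base).

```python
import math

def taylor_slope(groups):
    """Slope b of log10(variance) on log10(mean) across groups (Taylor's power law).

    Each group is a list of counts with a positive mean and positive variance.
    """
    xs, ys = [], []
    for g in groups:
        n = len(g)
        mean = sum(g) / n
        var = sum((y - mean) ** 2 for y in g) / (n - 1)  # sample variance s^2
        xs.append(math.log10(mean))
        ys.append(math.log10(var))
    # Ordinary least-squares slope of log variance on log mean
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def suggested_power(b):
    """Power p for the transformation Y' = Y**p; p = 0 corresponds to log Y."""
    return 1 - b / 2
```

For instance, `suggested_power(0.7466)` gives about 0.63. With the made-up groups `[[1, 3], [2, 6], [4, 12]]`, whose variance is exactly half the squared mean, `taylor_slope` returns 2 (up to rounding), and the suggested power is 0, i.e. a log transformation.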
This is reasonably close to a square root transformation ($Y' = Y^{0.5}$), so in the adjacent table we have square-root transformed the data and recalculated the means ($\bar{Y}'$) and standard deviations ($s'$). The standard deviations are now more similar between the treatments. The detransformed means (D) are also given.
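As a rough sketch of how the transformed summaries can be computed, the function below takes square roots, finds their mean and standard deviation, and detransforms the mean by squaring it. Squaring the mean of the roots is an assumption about how the detransformed mean is obtained here; the function name is hypothetical.

```python
import math
import statistics

def sqrt_summary(counts):
    """Mean and SD on the square-root scale, plus the detransformed mean."""
    roots = [math.sqrt(y) for y in counts]
    mean_t = statistics.mean(roots)   # transformed mean, Y-bar'
    sd_t = statistics.stdev(roots)    # transformed standard deviation, s'
    detransformed = mean_t ** 2       # mean back on the original count scale
    return mean_t, sd_t, detransformed
```

For the made-up counts `[1, 4, 9, 16]` the transformed mean is 2.5 and the detransformed mean is 6.25, below the arithmetic mean of 7.5. A detransformed mean obtained this way can never exceed the arithmetic mean of the raw counts.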
You may wonder what transformation the Box-Cox method would suggest. When there is a strong treatment effect (as here), you cannot simply pool all the data to find the appropriate power transform, because the differences between treatment means would then be treated as part of the random variation. Instead you need to first model the relationship using the general linear model as shown in
The log-likelihood plot for the Box-Cox transformation of the weed data gives a maximum likelihood estimate of λ of 0.4646. The confidence interval for λ includes 0.5, so a square root transformation would be perfectly acceptable.
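One way to obtain such a profile is sketched below. It fits the linear model (treatment plus block, as in a randomised block design) to the Box-Cox transformed response for each candidate λ, using NumPy least squares, and adds the Jacobian term so that log-likelihoods for different λ are comparable. This is our own minimal sketch, not the software used for the plot. By contrast, `scipy.stats.boxcox` with no λ supplied estimates λ from a single pooled sample, which is the pooling the text warns against. Responses must be strictly positive, and the helper names are hypothetical.

```python
import numpy as np

def design_matrix(treatment, block):
    """Intercept plus 0/1 dummy columns for treatment and block (first level dropped)."""
    cols = [np.ones(len(treatment))]
    for factor in (treatment, block):
        for level in sorted(set(factor))[1:]:
            cols.append(np.array([1.0 if f == level else 0.0 for f in factor]))
    return np.column_stack(cols)

def boxcox_profile(y, X, lambdas):
    """Profile log-likelihood (up to a constant) of the Box-Cox lambda for y ~ X."""
    y = np.asarray(y, dtype=float)  # responses must be strictly positive
    n = y.size
    sum_log_y = np.log(y).sum()
    out = []
    for lam in lambdas:
        # Box-Cox transform; lambda = 0 is the log transform
        z = np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1) / lam
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        rss = float(np.sum((z - X @ beta) ** 2))
        # Normal log-likelihood maximised over coefficients and variance,
        # plus the Jacobian of the transformation
        out.append(-0.5 * n * np.log(rss / n) + (lam - 1) * sum_log_y)
    return np.array(out)

def approx_ci(lambdas, loglik, drop=1.92):
    """Lambdas whose log-likelihood is within `drop` of the maximum.

    1.92 is about half the 95% point of chi-square with 1 df (3.84).
    """
    lambdas = np.asarray(lambdas, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    inside = lambdas[loglik >= loglik.max() - drop]
    return inside.min(), inside.max()
```

Typical use would be `ll = boxcox_profile(counts, design_matrix(treatment, block), grid)` with `grid = np.linspace(-1, 2, 301)`. Then `grid[ll.argmax()]` is the estimate of λ and `approx_ci(grid, ll)` gives an approximate interval to check against 0.5.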
The log transformation
Our worked example for a log transformation is taken from some of our own research on optimising trap design for tsetse flies.
We used a Latin square design in order to control for the effects of environmental factors - namely site and day. For each replicate, each of the four different designs was rotated around four sites over four days to give the balanced design shown below. The different trap designs are colour coded. Pink denotes the control (a) (the standard NGU trap) and green (b), yellow (c) and blue (d) denote three different modifications to the basic design.
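The rotation of designs around sites and days is the defining property of a Latin square: each design appears once at every site and once on every day. The sketch below builds a cyclic square and then shuffles whole rows and columns, which keeps that property. This is a common way to construct and randomise a Latin square. It is not necessarily how the allocation in this study was made, and the function names are hypothetical.

```python
import random

def cyclic_latin_square(treatments):
    """Row = site, column = day; each treatment shifts one day per site."""
    k = len(treatments)
    return [[treatments[(site + day) % k] for day in range(k)] for site in range(k)]

def randomised_latin_square(treatments, seed=None):
    """Shuffle whole rows (sites) and whole columns (days) of a cyclic square."""
    rng = random.Random(seed)
    square = cyclic_latin_square(treatments)
    rng.shuffle(square)              # permute sites
    days = list(range(len(treatments)))
    rng.shuffle(days)                # permute days
    return [[row[d] for d in days] for row in square]

def is_latin(square):
    """True if every row and every column contains each treatment exactly once."""
    k = len(square)
    rows_ok = all(len(set(row)) == k for row in square)
    cols_ok = all(len({square[r][c] for r in range(k)}) == k for c in range(k))
    return rows_ok and cols_ok

for site_row in randomised_latin_square(list("abcd"), seed=1):
    print(" ".join(site_row))
```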
A brief look at the data (which consist of small whole numbers) might suggest that a square root transformation would be best. The standard deviation is clearly not independent of the mean, so Taylor's Power Law can again be used to choose a transformation:
The slope (b) is 1.5873. The most appropriate power transformation ($Y' = Y^{p}$) is then obtained from:

$$p = 1 - \frac{b}{2} = 1 - \frac{1.5873}{2} \approx 0.21$$
This is fairly close to zero, and a power of zero corresponds to a log transformation ($Y' = \log Y$), so a log transformation would probably be appropriate. Moreover, a multiplicative model best describes the way in which site and day affect the catch. In other words, a particular site tended to catch (say) twice as many flies as another site, rather than always catching (say) 50 more flies, and taking logs converts such multiplicative effects into additive ones. The only other problem is that there were a few zero catches, for which the log is undefined. In this case it was considered acceptable to use a log(Y + 1) transformation, despite the risk of bias in adding one to each data point.
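The argument for a multiplicative model can be checked numerically. In the sketch below, a better site always catches twice as many flies as a poorer one. On the raw scale the gap between the sites grows with the catch, but on the log scale it stays constant, which is what an additive model of the transformed data assumes. The base catches and function names are made up for illustration. `log_plus_one` shows how adding one gives zero catches a defined value.

```python
import math

def log_plus_one(counts):
    """log10(Y + 1): defined for zero catches, unlike log10(Y)."""
    return [math.log10(y + 1) for y in counts]

def site_gaps(base_catches, factor=2.0):
    """Gap between a site catching `factor` times as many flies and a poorer site,
    on the raw scale and on the log10 scale."""
    raw_gaps = [factor * y - y for y in base_catches]
    log_gaps = [math.log10(factor * y) - math.log10(y) for y in base_catches]
    return raw_gaps, log_gaps

raw, logged = site_gaps([5, 20, 40])
print(raw)      # raw gaps grow with the catch
print(logged)   # every log gap is log10(2), about 0.301
print(log_plus_one([0, 9, 99]))
```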
In the transformed scale the standard deviations are much more similar for all trap types - although the standard deviation for trap type d (blue) is still somewhat lower. Note also that geometric mean catches in trap types (b) (green) and (c) (yellow) are now similar to the control (a) (pink) - the arithmetic mean catch in (a) was unduly inflated by an unusually high catch of 61 flies.
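To see why a single large catch inflates the arithmetic mean far more than the mean obtained on the log scale, compare the two for a small sample. The sketch back-transforms the mean of log10(Y + 1) as 10^m - 1, which we assume matches the detransformation used here. Apart from the catch of 61, the counts are made up.

```python
import math
import statistics

def arithmetic_and_detransformed_mean(counts):
    """Arithmetic mean, and the mean of log10(Y + 1) back-transformed as 10**m - 1."""
    arithmetic = statistics.mean(counts)
    m = statistics.mean([math.log10(y + 1) for y in counts])
    return arithmetic, 10 ** m - 1

arith, detrans = arithmetic_and_detransformed_mean([3, 5, 4, 6, 61])
print(round(arith, 2), round(detrans, 2))  # about 15.8 versus about 7.8
```

Removing the 61 would roughly quarter the arithmetic mean but change the detransformed mean much less. This is why the geometric-type mean gives a fairer comparison between trap types here.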