The square root transformation
Where counts are large, the usual transformation is the log transformation. However, you should never assume that a particular transformation is suitable: always examine the data first! Our worked example has been quoted in many statistical texts in the past: an experiment on weed control in cereals.
Worked example
Number of weeds in cereal plots

Treatment   Block I   Block II   Block III   Block IV   x̄      s
A           438       442        319         380        395    57.9
B           538       422        377         315        413    94.2
C           77        61         157         52         86.8   48.0
D           17        31         87          16         37.8   33.5
E           18        26         77          20         35.3   28.0
F           115       57         100         45         79.3   33.5
The experiment was arranged in a randomised block design with one of each treatment per block. Treatment A was the control. Examination of the standard deviation for each treatment shows that the standard deviation tends to increase with the mean. Hence we may be able to use Taylor's Power Law to select the best transformation. The figure below shows a plot of log variance against log mean for each of the treatments.

{Fig. 4}
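The regression behind this plot can be sketched in a few lines of Python. This is a minimal illustration rather than the original analysis: it assumes base-10 logs, the sample variance (n − 1 divisor) and an ordinary least-squares fit, and the names `weeds` and `taylor_slope` are our own.

```python
import math
from statistics import mean, variance

# Weed counts per treatment (blocks I to IV), copied from the table above.
weeds = {
    "A": [438, 442, 319, 380],
    "B": [538, 422, 377, 315],
    "C": [77, 61, 157, 52],
    "D": [17, 31, 87, 16],
    "E": [18, 26, 77, 20],
    "F": [115, 57, 100, 45],
}

def taylor_slope(groups):
    """Least-squares slope of log10(variance) on log10(mean) across groups."""
    xs = [math.log10(mean(v)) for v in groups.values()]
    ys = [math.log10(variance(v)) for v in groups.values()]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Means and standard deviations per treatment, as in the table.
for name, counts in weeds.items():
    print(f"{name}: mean = {mean(counts):.1f}, s = {math.sqrt(variance(counts)):.1f}")

b = taylor_slope(weeds)
print(f"slope b = {b:.4f}")
```

The slope does not depend on the base of the logarithm, so natural logs would give the same value of b.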
Number of weeds in cereal plots (square root transformed)

Treatment   x̄'     s'     D
A           19.8   1.49   392
B           20.2   2.29   408
C           9.1    2.39   82.8
D           5.8    2.49   33.6
E           5.6    2.12   31.4
F           8.7    1.92   75.7
The slope (b) is 0.7466. The most appropriate power transformation is then obtained from:
Y' = Y^{[1 − (b/2)]} = Y^{0.6267}
This is close to a square root transformation (Y' = Y^{0.5}), so in the adjacent table we have square root transformed the data and recalculated the means (x̄') and standard deviations (s'). The standard deviations are now more similar between the treatments. The detransformed means (D) are also given.
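As a rough check on these figures, the sketch below computes the suggested power from the quoted slope and then summarises the square-root-transformed counts. The variable names are ours, and the detransformed mean is taken simply as the square of the transformed mean. Small differences from the tabulated D values are to be expected, since those appear to have been obtained by squaring the already rounded means.

```python
import math
from statistics import mean, stdev

# Weed counts per treatment (blocks I to IV), copied from the first table.
weeds = {
    "A": [438, 442, 319, 380],
    "B": [538, 422, 377, 315],
    "C": [77, 61, 157, 52],
    "D": [17, 31, 87, 16],
    "E": [18, 26, 77, 20],
    "F": [115, 57, 100, 45],
}

b = 0.7466           # slope quoted in the text
power = 1 - b / 2    # power suggested by Taylor's Power Law
print(f"suggested power = {power:.4f}")

summary = {}
for name, counts in weeds.items():
    roots = [math.sqrt(y) for y in counts]
    m, s = mean(roots), stdev(roots)
    summary[name] = (m, s, m ** 2)  # (transformed mean, transformed SD, detransformed mean)
    print(f"{name}: mean' = {m:.1f}, s' = {s:.2f}, D = {m ** 2:.1f}")
```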

You may wonder what power the Box-Cox procedure would suggest. When there is a strong treatment effect (as here), you cannot just pool all the data to find the appropriate power transform. Instead you need to first model the relationship using the general linear model as shown in Unit 11. You can then use R to find the optimal Box-Cox transformation.
{Fig. 5}
This is a log-likelihood plot for the Box-Cox transformation of the weed data. In this case the maximum likelihood estimate of λ is 0.4646, but the confidence interval encloses 0.5, so a square root transformation would be perfectly acceptable.
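For readers who want to see what lies behind such a plot, here is a minimal sketch of the profile log-likelihood calculation in plain Python rather than R. It assumes the fitted model is additive in treatment and block (our reading of the randomised block design, not something the text states explicitly), drops constant terms from the log-likelihood, and uses a simple grid search. The function names are ours.

```python
import math

# Rows are treatments A to F, columns are blocks I to IV (weed counts from the first table).
counts = [
    [438, 442, 319, 380],
    [538, 422, 377, 315],
    [77, 61, 157, 52],
    [17, 31, 87, 16],
    [18, 26, 77, 20],
    [115, 57, 100, 45],
]

def boxcox(y, lam):
    """Box-Cox transform of a single positive value."""
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1) / lam

def profile_loglik(table, lam):
    """Profile log-likelihood of lambda (up to a constant) for treatment + block."""
    z = [[boxcox(y, lam) for y in row] for row in table]
    n_rows, n_cols = len(z), len(z[0])
    n = n_rows * n_cols
    row_means = [sum(row) / n_cols for row in z]
    col_means = [sum(z[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    # In a balanced design the fitted value is row mean + column mean - grand mean.
    rss = sum(
        (z[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    # Jacobian term that puts different values of lambda on a comparable scale.
    log_jacobian = (lam - 1) * sum(math.log(y) for row in table for y in row)
    return -0.5 * n * math.log(rss / n) + log_jacobian

grid = [i / 100 for i in range(-100, 201)]  # lambda from -1 to 2 in steps of 0.01
lam_hat = max(grid, key=lambda lam: profile_loglik(counts, lam))
print(f"lambda maximising the profile log-likelihood: {lam_hat:.2f}")
```

An approximate 95% confidence interval contains every λ whose log-likelihood lies within about 1.92 (half the 5% critical value of chi-square with one degree of freedom) of the maximum.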


The log transformation
Our worked example for a log transformation is taken from some of our own research on optimizing trap design for tsetse flies.
Worked example
Number of tsetse flies caught in 4 different trap types

                         Periods (B)
Areas (G)   Positions    I     II    III   IV
1           I            5     9     0     4
            II           0     4     0     1
            III          0     1     6     3
            IV           5     4     4     6

2           V            3     5     10    5
            VI           4     2     8     6
            VII          17    11    15    29
            VIII         14    5     20    4

3           IX           10    29    61    26
            X            17    12    17    13
            XI           14    9     6     7
            XII          10    11    14    8
We used a Latin square design in order to control for the effects of two environmental factors, namely site and day. For each replicate, each of the four different designs was rotated around four sites over four days to give the balanced design shown in the table above. The different trap designs are colour coded. Pink denotes the control (a) (the standard NGU trap), and green (b), yellow (c) and blue (d) denote three different modifications to the basic design.
A brief look at the data (all small whole numbers) might suggest that a square root transform would be best. Clearly the standard deviation is not independent of the mean:
Arithmetic means:      14.3   9.4   10.3   5.4
Standard deviations:   16.5   7.8   7.8    4.9

The figure below shows a plot of log variance against log mean for each of the treatment/area combinations.
{Fig. 6}
The slope (b) is 1.5873. The most appropriate power transformation is then obtained from:
Y' = Y^{[1 − (b/2)]} = Y^{0.2064}
This is close to zero, so a log transformation (Y' = log Y) would probably be appropriate. Moreover, a multiplicative model is the most appropriate for the way in which site and day affect the catch. In other words, a particular site tended to be (say) twice as good as another site, rather than always catching (say) 50 more flies. The only other problem is that there were a few zero catches. In this case it was considered acceptable to use a log (Y + 1) transformation, despite the risk of bias in adding one to each data point.
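The arithmetic here is easy to check. The sketch below computes the suggested power from the quoted slope, applies a log (Y + 1) transformation to one row of catches that includes a zero, and illustrates why a multiplicative effect becomes additive on the log scale. It assumes base-10 logs, and the `poor_site` and `good_site` values are hypothetical numbers invented purely for illustration.

```python
import math

b = 1.5873           # slope quoted in the text
power = 1 - b / 2    # power suggested by Taylor's Power Law
print(f"suggested power = {power:.5f}")

def log_plus_one(count):
    """log10(Y + 1): defined for zero catches, unlike log10(Y)."""
    return math.log10(count + 1)

catches = [5, 9, 0, 4]  # Position I, periods I to IV, from the table above
transformed = [log_plus_one(y) for y in catches]
print([round(t, 3) for t in transformed])

# Hypothetical catches: the good site always catches twice as many flies.
poor_site = [10, 20, 40]
good_site = [2 * y for y in poor_site]
# On the log scale the doubling becomes a constant difference of log10(2).
log_differences = [math.log10(g) - math.log10(p) for g, p in zip(good_site, poor_site)]
print([round(d, 3) for d in log_differences])
```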
Log (Y+1) transformed means:       0.997   0.873   0.977   0.661
Log (Y+1) transformed SDs:         0.448   0.416   0.259   0.405
Detransformed (geometric) means:   8.9     6.5     8.5     3.6
On the transformed scale the standard deviations are much more similar for all trap types, although the standard deviation for trap type d (blue) is still somewhat lower. Note also that geometric mean catches in trap types (b) (green) and (c) (yellow) are now similar to the control (a) (pink). The arithmetic mean catch in (a) was unduly inflated by an unusually high catch of 61 flies.
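The detransformed means follow directly from the transformed means by reversing the transformation: take the antilog and subtract the one that was added. A minimal sketch, assuming base-10 logs (the base is not stated in the text, but base 10 reproduces the quoted values):

```python
# Log (Y + 1) transformed means for the four trap types, as quoted above.
transformed_means = [0.997, 0.873, 0.977, 0.661]

# Reverse the transformation: antilog (base 10 assumed), then subtract 1.
geometric_means = [10 ** m - 1 for m in transformed_means]

print([round(g, 1) for g in geometric_means])  # prints [8.9, 6.5, 8.5, 3.6]
```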

