"The officer and the office, the doer and the thing done, seldom fit so exactly that we can say they were almost made for each other."
Sydney Smith (1771–1845).
Sketches of Moral Philosophy (1850). (This is generally accepted as the origin of the phrase "a square peg in a round hole".)

Testing homogeneity
Up to now, both for the 2×2 tables we looked at in Unit 9 and for the goodness-of-fit tests given above, we have assumed that we are dealing with a single sample or trial result. In practice, of course, life is not usually like that. Mendel's data on peas came from pooling the results of many different trials, and if we are carrying out an experiment we usually need to replicate it several times.
So can we just pool the results and test the final frequency distribution?
The simple answer to this question is ... no! The point to remember here is that variability between replicates is not just an annoying feature of data, to be buried as soon as possible in the search for support for one's hypothesis. As we have shown in some of the examples, variability may tell us more about what is really happening than the 'average' result does. Nevertheless, we can expect a certain amount of random variability among replicates in their goodness of fit to an expected distribution, as in anything else. So how do we distinguish random variability from 'real' differences between our replicates?
Fortunately, we can also use the chi-square test or the G-test to assess first whether such variability can be accepted as random, so that the data can be pooled, or whether it is too great and pooling is unjustified.
Let's return to our example with the skunks. Suppose we surveyed the area for denning sites over five breeding seasons. There were no significant changes in land use over the period, so our expected distribution remains the same throughout. But are our data homogeneous? In other words, does the proportion of the total number of sites in each habitat remain roughly constant from season to season?
Fig. 1. Observed frequencies of denning sites of skunks over five breeding seasons (expected habitat proportions in parentheses)

Breeding   Wetlands  Farmsteads  Bird nesting  Woodland  Others  G
season     (25%)     (2%)        areas (26%)   (33%)     (14%)   (df=4)
1          13        19          4             8         3       92.6
2          20        37          8             56        9       171.2
3          5         7           2             4         2       30.9
4          29        34          5             28        7       159.8
5          21        29          5             14        3       145.3
Pooled     88        126         24            110       24

G_{homogeneity (df=16)} = 24.8, P = 0.073
G_{pooled (df=4)} = 575.0, P < 0.001
G_{total (df=20)} = 599.8

The test of homogeneity is best viewed as a test of association between breeding season (here taken as the replicate) and habitat type. We therefore calculate G for the 5×5 table of seasonal counts (the body of the table, excluding the Pooled row and the G column). The margin totals are used to estimate the expected frequencies assuming no association, which gives (5−1)(5−1) = 16 degrees of freedom. The P-value we obtain just fails to reach the 0.05 level of significance, so (rather hesitantly!) we accept that the data are homogeneous.
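As a check on the arithmetic, here is a minimal Python sketch (standard library only) of the homogeneity test carried out as a G-test of association on the 5×5 table. The function names are our own, chosen for illustration. The columns are assumed to follow the table's order: Wetlands, Farmsteads, Bird nesting areas, Woodland, Others. The P-value uses the closed-form upper tail of χ^{2}, which is exact only for an even number of degrees of freedom (df = 16 here). It is therefore a sketch for this example, not a replacement for a statistics library.

```python
import math

# Observed counts of denning sites, from the table.
# Rows: breeding seasons 1-5.
# Columns: Wetlands, Farmsteads, Bird nesting areas, Woodland, Others.
OBSERVED = [
    [13, 19, 4, 8, 3],
    [20, 37, 8, 56, 9],
    [5, 7, 2, 4, 2],
    [29, 34, 5, 28, 7],
    [21, 29, 5, 14, 3],
]


def chi2_upper_tail_even_df(x, df):
    """P(chi-square >= x), valid ONLY for an even number of degrees of freedom.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # term is now (x/2)**k / k!
        total += term
    return math.exp(-half) * total


def g_test_of_association(table):
    """G-test of homogeneity/association for an r x c table of counts.

    Expected frequencies come from the margins: E_ij = row_i * col_j / N.
    Returns (G, degrees of freedom, P-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # an empty cell contributes 0 to G
                expected = row_totals[i] * col_totals[j] / grand_total
                g += obs * math.log(obs / expected)
    g *= 2.0
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g, df, chi2_upper_tail_even_df(g, df)


g_hom, df_hom, p_hom = g_test_of_association(OBSERVED)
print(f"G_homogeneity = {g_hom:.1f}, df = {df_hom}, P = {p_hom:.3f}")
```

The result should agree with the G_{homogeneity} line of the table to within rounding: about 24.8 on 16 degrees of freedom, with P a little above 0.07.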
We then test the goodness of fit of the pooled frequencies (the Pooled row) to the expected frequencies. Here the expected proportion of each habitat type (the percentage given in its column heading) is used to estimate the expected frequencies. As before, we find a highly significant deviation of observed from expected.
We can also work out the G-values (the G column) for the goodness of fit of each individual season to the expected frequencies, assuming random selection of sites. If we add these individual G-values together to give G_{total}, we get 599.8. You can now see a very important feature of G-values, namely that they are precisely additive.
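The pooled goodness-of-fit test can be sketched the same way. This sketch assumes that the expected proportions are fixed in advance rather than estimated from the data, so df = 5 − 1 = 4. For df = 4 the upper-tail χ^{2} probability has the closed form exp(−G/2)(1 + G/2), which the sketch uses in place of a library call. The function name is illustrative only.

```python
import math

# Expected proportions of each habitat, in the table's column order:
# Wetlands, Farmsteads, Bird nesting areas, Woodland, Others.
PROPORTIONS = [0.25, 0.02, 0.26, 0.33, 0.14]
# Pooled (column-total) counts over the five breeding seasons.
POOLED = [88, 126, 24, 110, 24]


def g_goodness_of_fit(observed, proportions):
    """G-test of counts against expected proportions fixed in advance.

    Returns (G, degrees of freedom); df = k - 1 because no parameters
    are estimated from the data.
    """
    n = sum(observed)
    g = 2.0 * sum(
        obs * math.log(obs / (n * p))
        for obs, p in zip(observed, proportions)
        if obs > 0  # a zero count contributes nothing to G
    )
    return g, len(observed) - 1


g_pooled, df_pooled = g_goodness_of_fit(POOLED, PROPORTIONS)
# Closed-form upper-tail chi-square probability, valid for df = 4 only.
p_pooled = math.exp(-g_pooled / 2.0) * (1.0 + g_pooled / 2.0)
print(f"G_pooled = {g_pooled:.1f}, df = {df_pooled}, P = {p_pooled:.2e}")
```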
Hence:
G_{homogeneity} + G_{pooled} = G_{total}
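The additivity can be confirmed numerically with a short sketch. It computes the G-value for each season, sums them to give G_{total}, and compares that sum with G_{homogeneity} + G_{pooled}. The helper names are hypothetical, and the expected proportions are assumed fixed as in the table headings.

```python
import math

PROPORTIONS = [0.25, 0.02, 0.26, 0.33, 0.14]  # expected habitat proportions
OBSERVED = [  # rows: seasons 1-5; columns in the same habitat order
    [13, 19, 4, 8, 3],
    [20, 37, 8, 56, 9],
    [5, 7, 2, 4, 2],
    [29, 34, 5, 28, 7],
    [21, 29, 5, 14, 3],
]


def g_fit(observed, proportions):
    """G for goodness of fit to fixed expected proportions."""
    n = sum(observed)
    return 2.0 * sum(o * math.log(o / (n * p))
                     for o, p in zip(observed, proportions) if o > 0)


def g_association(table):
    """G for association in an r x c table, expected values from the margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    # O * ln(O / E) with E = row_i * col_j / n, written as ln(O * n / (row_i * col_j))
    return 2.0 * sum(o * math.log(o * n / (rows[i] * cols[j]))
                     for i, r in enumerate(table)
                     for j, o in enumerate(r) if o > 0)


per_season = [g_fit(row, PROPORTIONS) for row in OBSERVED]
g_total = sum(per_season)
g_pooled = g_fit([sum(col) for col in zip(*OBSERVED)], PROPORTIONS)
g_homogeneity = g_association(OBSERVED)

print("per-season G:", [round(g, 1) for g in per_season])
print(f"G_total = {g_total:.1f}")
print(f"G_homogeneity + G_pooled = {g_homogeneity + g_pooled:.1f}")
```

The two printed totals should agree to floating-point precision, and both should match the figures in the text to within rounding.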
We could have used X^{2} as our statistic rather than G for this combined test of homogeneity and goodness of fit.
X^{2}_{homogeneity (df=16)} = 24.5, P = 0.078
X^{2}_{pooled (df=4)} = 1960.7, P < 0.001
X^{2}_{total (df=20)} = 358.2 (season 1) + 488.3 (season 2) + 112.1 (season 3) + 518.2 (season 4) + 547.0 (season 5) = 2023.8

But, as you can see, the X^{2} values are no longer precisely additive:
X^{2}_{pooled} + X^{2}_{homogeneity} = 1985.2, whereas X^{2}_{total} = 2023.8.
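Repeating the calculation with Pearson's X^{2} shows the lack of additivity directly. This minimal sketch makes the same assumptions as the G versions: fixed expected proportions for the goodness-of-fit parts, and margin-based expected frequencies for the homogeneity part. The helper names are illustrative.

```python
PROPORTIONS = [0.25, 0.02, 0.26, 0.33, 0.14]  # expected habitat proportions
OBSERVED = [  # rows: seasons 1-5; columns in the same habitat order
    [13, 19, 4, 8, 3],
    [20, 37, 8, 56, 9],
    [5, 7, 2, 4, 2],
    [29, 34, 5, 28, 7],
    [21, 29, 5, 14, 3],
]


def x2_fit(observed, proportions):
    """Pearson X^2 for goodness of fit to fixed expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))


def x2_association(table):
    """Pearson X^2 for association in an r x c table, expected values from the margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, r in enumerate(table):
        for j, o in enumerate(r):
            e = rows[i] * cols[j] / n  # expected count under no association
            total += (o - e) ** 2 / e
    return total


x2_total = sum(x2_fit(row, PROPORTIONS) for row in OBSERVED)
x2_pooled = x2_fit([sum(col) for col in zip(*OBSERVED)], PROPORTIONS)
x2_homogeneity = x2_association(OBSERVED)

print(f"X2_pooled + X2_homogeneity = {x2_pooled + x2_homogeneity:.1f}")
print(f"X2_total = {x2_total:.1f}")
```

Unlike the G version, the two printed values differ by several tens of units.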
Nevertheless, many (but not all) statisticians prefer Pearson's X^{2}, on the grounds that its distribution is better approximated by χ^{2} than that of G when expected frequencies are small.
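This example contains some small expected frequencies. The sketch below lists every season × habitat cell whose expected count under the fixed proportions falls below 5. That threshold is a commonly quoted rule of thumb, used here as an assumption rather than something stated in this unit.

```python
# Habitat names, expected proportions and observed counts, as in the table.
HABITATS = ["Wetlands", "Farmsteads", "Bird nesting areas", "Woodland", "Others"]
PROPORTIONS = [0.25, 0.02, 0.26, 0.33, 0.14]
OBSERVED = [
    [13, 19, 4, 8, 3],
    [20, 37, 8, 56, 9],
    [5, 7, 2, 4, 2],
    [29, 34, 5, 28, 7],
    [21, 29, 5, 14, 3],
]
SMALL = 5  # assumed rule-of-thumb threshold for a "small" expected count

small_cells = [
    (season, habitat, sum(row) * p)  # expected = season total * proportion
    for season, row in enumerate(OBSERVED, start=1)
    for habitat, p in zip(HABITATS, PROPORTIONS)
    if sum(row) * p < SMALL
]

for season, habitat, expected in small_cells:
    print(f"season {season}: {habitat:<10} expected {expected:.2f}")
```

With these data it flags all five Farmsteads cells, whose expected proportion is only 2%, plus the Others cell in season 3. The Farmsteads cells are also where the G and X^{2} contributions differ most in this example.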
What we have done here is known as partitioning the G-values in a contingency table, as discussed in Unit 9.
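The following derivation shows why the G-values partition exactly. It uses notation introduced here for illustration:

- O_{ij} is the count for season i and habitat j.
- n_i is the season total and C_j is the pooled total for habitat j.
- N is the grand total and p_j is the expected proportion for habitat j.

The key step splits each ratio O_{ij}/(n_i p_j) into [O_{ij}N/(n_i C_j)] × [C_j/(N p_j)]. It then uses the fact that summing O_{ij} over seasons gives C_j.

```latex
\begin{aligned}
G_{total} &= 2\sum_{i}\sum_{j} O_{ij}\ln\frac{O_{ij}}{n_i p_j} \\
          &= 2\sum_{i}\sum_{j} O_{ij}\ln\frac{O_{ij}N}{n_i C_j}
           + 2\sum_{i}\sum_{j} O_{ij}\ln\frac{C_j}{N p_j} \\
          &= G_{homogeneity} + 2\sum_{j} C_j\ln\frac{C_j}{N p_j} \\
          &= G_{homogeneity} + G_{pooled}
\end{aligned}
```

The degrees of freedom partition in the same way (16 + 4 = 20). Pearson's X^{2} is only a quadratic approximation to G, so for X^{2} this split holds only approximately.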