"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




Testing homogeneity

"The officer and the office, the doer and the thing done, seldom fit so exactly that we can say they were almost made for each other."
Sydney Smith 1771-1845.
Sketches of Moral Philosophy (1850). (This is generally accepted as the origin of the phrase: A square peg in a round hole)


Up till now, both for the 2×2 tables we looked at in Unit 9 and for the goodness of fit tests given above, we have assumed that we are dealing with a single sample or trial result. In practice, of course, life is not usually like that. Mendel's data on peas resulted from pooling the results of many different trials, and if we are carrying out an experiment we usually need to replicate it several times.

So can we just pool the results and test the final frequency distribution?

The simple answer to this question is... no! The point to remember here is that variability between replicates is not just an annoying feature of data, to be buried as soon as possible in the search for support for one's hypothesis. As we have shown in some of the examples, variability may tell us more about what is really happening than the 'average' result does. Nevertheless, we should expect a certain amount of random variability among replicates in their goodness of fit to an expected distribution, just as in anything else. So how do we distinguish this random variability from 'real' differences between our replicates?

Fortunately, we can also use the chi-square test or the G-test to assess first whether such variability can be accepted as random, so that the data can be pooled, or whether it is too great for pooling to be justified.

Let's return to our example with the skunks. Say we surveyed the area for denning sites over five breeding seasons. There were no significant changes in land use during that time, so our expected distribution remains the same throughout. But are our data homogeneous? In other words, does the proportion of the total number of sites in each habitat remain roughly constant from season to season?

{Fig. 1}

Observed frequencies of denning sites of skunks over five breeding seasons. Columns: breeding season; wetlands (25%); farmsteads (2%); bird nesting areas (26%); woodland (33%); others (14%); G for each season.
$G_{\text{homogeneity}}$ (df = 16) = 24.8, P = 0.073
$G_{\text{total}}$ (df = 20) = 599.8
$G_{\text{pooled}}$ (df = 4) = 575.0, P < 0.001
The test of homogeneity is best viewed as a test of association between breeding season (here taken as the replicate) and habitat type. We therefore calculate G for the 5×5 table (in yellow), using the margin totals to estimate the expected frequencies under the assumption of no association. The P-value we obtain (0.073) just fails to reach the 0.05 level of significance, so (rather hesitantly!) we accept that the data are homogeneous.
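As a check on the P-value quoted above, here is a minimal Python sketch that converts a reported statistic into an upper-tail χ² probability. It relies on the closed form of the χ² survival function, which holds only for even degrees of freedom (true for df = 16 and df = 4 here). The function name `chi2_sf_even_df` is our own; for general df a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom >= x).

    Valid only for even df = 2k, where the survival function has the
    closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut requires a positive, even df")
    half = x / 2.0
    term = 1.0   # current term (x/2)**i / i!, starting at i = 0
    total = 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)  # advance to (x/2)**(i+1) / (i+1)!
    return math.exp(-half) * total


# Statistics as reported in the skunk example
p_homogeneity = chi2_sf_even_df(24.8, 16)  # G_homogeneity, df = 16
p_pooled = chi2_sf_even_df(575.0, 4)       # G_pooled, df = 4

print(f"G_homogeneity: P = {p_homogeneity:.3f}")  # P = 0.073
print(f"G_pooled:      P = {p_pooled:.1e}")
# Homogeneity is not rejected at the 0.05 level, so pooling is accepted
print("pool the replicates?", p_homogeneity >= 0.05)
```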

We then test the goodness of fit of the pooled frequencies (in blue) to the expected frequencies, this time using the relative frequencies of the habitat types to estimate the expected frequencies. As before, we find a highly significant deviation of observed from expected ($G_{\text{pooled}}$ = 575.0, df = 4, P < 0.001).
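The goodness-of-fit G used for the pooled frequencies can be sketched as follows, assuming the usual definition $G = 2\sum O \ln(O/E)$ with $E = Np$ for each habitat. The proportions are the habitat percentages given with Fig. 1; the counts are hypothetical, made up for illustration, because the survey data themselves are not reproduced here.

```python
import math

# Expected habitat proportions from the skunk example, in the order
# wetlands, farmsteads, bird nesting areas, woodland, others.
HABITAT_PROPS = [0.25, 0.02, 0.26, 0.33, 0.14]


def g_goodness_of_fit(observed, props):
    """Return (G, df) for observed counts against expected proportions.

    G = 2 * sum(O * ln(O / E)) with E = N * p. Categories with O = 0
    contribute nothing, since O * ln(O) -> 0 as O -> 0.
    df = number of categories - 1 (no parameters estimated from the data).
    """
    n = sum(observed)
    g = 0.0
    for o, p in zip(observed, props):
        if o > 0:
            g += o * math.log(o / (n * p))
    return 2.0 * g, len(observed) - 1


# Hypothetical pooled counts (NOT the survey data).
pooled_counts = [60, 30, 40, 45, 25]
g_pooled, df = g_goodness_of_fit(pooled_counts, HABITAT_PROPS)
print(f"G_pooled = {g_pooled:.1f}, df = {df}")
```

Counts that exactly match the expected proportions give G = 0; any departure makes G positive.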

We can also work out the G-values (in pink) for the goodness of fit of each individual season to the expected frequencies, assuming random selection of sites. Adding these individual G-values together gives $G_{\text{total}}$ = 599.8. This illustrates a very important feature of G-values: they are precisely additive.


$$G_{\text{homogeneity}} + G_{\text{pooled}} = G_{\text{total}} \qquad (24.8 + 575.0 = 599.8;\ \text{df: } 16 + 4 = 20)$$
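The additivity can be checked numerically. The Python sketch below uses hypothetical counts for five seasons together with the habitat proportions from Fig. 1. It computes each season's goodness-of-fit G, the pooled G, and the homogeneity G (expected values from the margin totals), then confirms that $G_{\text{homogeneity}} + G_{\text{pooled}}$ equals $G_{\text{total}}$ up to floating-point rounding. The function names are illustrative, not taken from any library.

```python
import math


def g_stat(observed, expected):
    """G = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def partition_g(table, props):
    """Split G_total for a replicates x categories table into its parts.

    table: one row of counts per replicate (here, per breeding season).
    props: expected proportion of each category (summing to 1).
    Returns (per-replicate G list, G_total, G_pooled, G_homogeneity).
    """
    # Goodness of fit of each season on its own: E = row total * p
    per_season = [g_stat(row, [sum(row) * p for p in props]) for row in table]
    g_total = sum(per_season)

    # Goodness of fit of the pooled (column) totals: E = grand total * p
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(col_totals)
    g_pooled = g_stat(col_totals, [grand * p for p in props])

    # Homogeneity = association between season and habitat:
    # E = row total * column total / grand total
    observed = [o for row in table for o in row]
    expected = [sum(row) * c / grand for row in table for c in col_totals]
    g_homogeneity = g_stat(observed, expected)

    return per_season, g_total, g_pooled, g_homogeneity


props = [0.25, 0.02, 0.26, 0.33, 0.14]
# Hypothetical counts: 5 seasons (rows) x 5 habitats (columns).
table = [
    [20, 8, 10, 12, 6],
    [18, 9, 12, 10, 7],
    [25, 5, 14, 15, 5],
    [15, 10, 9, 11, 8],
    [22, 7, 11, 14, 6],
]
per_season, g_total, g_pooled, g_hom = partition_g(table, props)
r, c = len(table), len(props)
print(f"G_total       = {g_total:.3f}  (df = {r * (c - 1)})")
print(f"G_pooled      = {g_pooled:.3f}  (df = {c - 1})")
print(f"G_homogeneity = {g_hom:.3f}  (df = {(r - 1) * (c - 1)})")
print(f"G_homogeneity + G_pooled = {g_hom + g_pooled:.3f}")
```

The identity is algebraic rather than a property of these particular counts: expanding the three log-ratios, the terms in $N$ and in the column totals cancel, leaving exactly the per-season sum.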

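Pearson's chi-square statistics for the same data (season number in parentheses):
$X^2_{\text{homogeneity}}$ (df = 16) = 24.5, P = 0.078
$X^2_{\text{total}}$ = 358.2 (1) + 488.3 (2) + 112.1 (3) + 518.2 (4) + 547.0 (5) = 2023.8
$X^2_{\text{pooled}}$ (df = 4) = 1960.7, P < 0.001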

We could have used Pearson's chi-square statistic, $X^2$, rather than G for this combined test of homogeneity and goodness of fit.

But, as you can see from the table, the values are no longer precisely additive:

$X^2_{\text{pooled}} + X^2_{\text{homogeneity}} = 1960.7 + 24.5 = 1985.2$, whereas $X^2_{\text{total}} = 2023.8$.
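To see the lack of additivity on a scale small enough to follow by hand, the sketch below partitions Pearson's $X^2 = \sum (O-E)^2/E$ in the same way as G. It uses a hypothetical table of two seasons by two habitats and assumes an expected 50:50 split between the habitats. The three pieces do not add up, whereas the corresponding G-values would.

```python
def x2_stat(observed, expected):
    """Pearson's X^2 = sum((O - E)**2 / E); assumes every E > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def partition_x2(table, props):
    """Return (X2_total, X2_pooled, X2_homogeneity), built as for G."""
    # Sum of per-season goodness-of-fit X^2 values: E = row total * p
    x2_total = sum(x2_stat(row, [sum(row) * p for p in props]) for row in table)
    # Pooled goodness of fit: E = grand total * p
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(col_totals)
    x2_pooled = x2_stat(col_totals, [grand * p for p in props])
    # Homogeneity: E = row total * column total / grand total
    observed = [o for row in table for o in row]
    expected = [sum(row) * c / grand for row in table for c in col_totals]
    x2_homogeneity = x2_stat(observed, expected)
    return x2_total, x2_pooled, x2_homogeneity


# Hypothetical counts: 2 seasons (rows) x 2 habitats, expected split 50:50.
table = [[30, 10], [10, 10]]
x2_total, x2_pooled, x2_hom = partition_x2(table, [0.5, 0.5])
print(f"X2_total                   = {x2_total:.3f}")            # 10.000
print(f"X2_pooled + X2_homogeneity = {x2_pooled + x2_hom:.3f}")  # 10.417
```

In this small example the parts exceed the total, whereas in the skunk data they fall short of it, so the discrepancy has no fixed sign.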

Nevertheless, many (but not all) statisticians prefer Pearson's chi-square on the grounds that, when expected frequencies are small, $X^2$ approximates the $\chi^2$ distribution better than G does.

What we have done here is known as partitioning the G-values in a contingency table, as discussed in Unit 9.