"It has long been an axiom of mine that the little things are infinitely the most important"
The G-test of independence is a likelihood ratio test. It assesses how well the observed frequencies fit the frequencies that would be expected if the row and column classifications were independent. The method is based on the multinomial distribution, in which both row and column totals are random rather than fixed.
Two likelihoods are estimated: the likelihood of the observed frequencies under a multinomial distribution, and the likelihood if the row and column classifications are assumed to be independent. Twice the natural logarithm of the ratio of these two likelihoods is equal to G, which for a 2 × 2 table is approximately distributed as χ² with one degree of freedom (in general, (r − 1)(c − 1) degrees of freedom for a table with r rows and c columns).
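In its usual textbook form, the statistic described above can be written as follows. The symbols are assumptions introduced here for illustration: O_ij is the observed frequency in row i and column j, E_ij the frequency expected under independence, R_i and C_j the row and column totals, and n the grand total.

```latex
G = 2 \sum_{i} \sum_{j} O_{ij} \, \ln\!\left( \frac{O_{ij}}{E_{ij}} \right),
\qquad
E_{ij} = \frac{R_i \, C_j}{n}
```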
The formula for G can also be used for goodness-of-fit tests and for contingency tables with more than two rows or columns.
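As a minimal sketch of the general calculation, the function below computes G and its degrees of freedom for a table of any size, taking expected frequencies as (row total × column total) / n. The function name `g_statistic` and the counts are hypothetical and are not data from this page. Cells with an observed frequency of zero are assumed to contribute nothing to the sum.

```python
import math


def g_statistic(table):
    """Return (G, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # assumption: 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / n
            total += observed * math.log(observed / expected)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * total, df


# Hypothetical 2 x 3 table, for illustration only
counts = [[30, 70, 50], [20, 40, 90]]
g, df = g_statistic(counts)
print(f"G = {g:.3f} with {df} degrees of freedom")
```

A table whose rows are exactly proportional, such as `[[10, 20], [20, 40]]`, gives G = 0 because every observed frequency equals its expected frequency.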
As with Pearson's chi-square test, there is an alternative computational formula for 2 × 2 contingency tables that is preferred because it is not subject to rounding errors (although it is rather more complicated).
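One common way to write the computational form, given here as a sketch rather than as this page's own formula, works directly from the cell frequencies and marginal totals without calculating expected frequencies: G = 2 [Σ f ln f − Σ R ln R − Σ C ln C + n ln n]. The cell labels a, b (first row) and c, d (second row) and the helper names are assumptions.

```python
import math


def xlogx(x):
    """Return x * ln(x), with the convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def g_2x2(a, b, c, d):
    """G for a 2 x 2 table [[a, b], [c, d]] from cell and marginal totals."""
    n = a + b + c + d
    cells = xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
    margins = xlogx(a + b) + xlogx(c + d) + xlogx(a + c) + xlogx(b + d)
    return 2.0 * (cells - margins + xlogx(n))


# Hypothetical counts, for illustration only
print(f"G = {g_2x2(12, 5, 7, 9):.4f}")
```

Algebraically this is the same quantity as 2 Σ O ln(O/E), so the two forms should agree apart from floating-point error.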
Williams' Correction for a 2 × 2 table
Williams' correction is the preferred continuity correction for the G likelihood ratio test, although it cannot be used if there are any zeros in the table. It is not as conservative as Yates' correction.
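The sketch below shows the form of Williams' correction usually given in textbooks: G is divided by q = 1 + (n/(a+b) + n/(c+d) − 1)(n/(a+c) + n/(b+d) − 1) / (6n). This formula, the cell labels a, b, c, d and the function names are assumptions, since the page's own formula is not reproduced here.

```python
import math


def xlogx(x):
    """Return x * ln(x), with the convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def g_2x2(a, b, c, d):
    """Uncorrected G for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    cells = xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
    margins = xlogx(a + b) + xlogx(c + d) + xlogx(a + c) + xlogx(b + d)
    return 2.0 * (cells - margins + xlogx(n))


def williams_q(a, b, c, d):
    """Williams' divisor q for a 2 x 2 table (always greater than 1)."""
    n = a + b + c + d
    row_term = n / (a + b) + n / (c + d) - 1
    col_term = n / (a + c) + n / (b + d) - 1
    return 1 + row_term * col_term / (6 * n)


def g_williams(a, b, c, d):
    """G divided by Williams' q."""
    return g_2x2(a, b, c, d) / williams_q(a, b, c, d)


# Hypothetical counts, for illustration only
print(f"q = {williams_q(12, 5, 7, 9):.4f}")
print(f"adjusted G = {g_williams(12, 5, 7, 9):.4f}")
```

Because q shrinks towards 1 as n increases, the adjustment matters most for small samples.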
Yates' Correction for a 2 × 2 table
Yates' correction can also be used with the G-test, as with Pearson's chi-square test, although again it tends to be too conservative. As well as correcting for continuity, it resolves the problem that arises when one of the observed frequencies is zero. Here a and b are the cell frequencies in the first row of the table, and c and d those in the second row. If (ad − bc) is positive, subtract 0.5 from a and d and add 0.5 to b and c. If (ad − bc) is negative, add 0.5 to a and d and subtract 0.5 from b and c. For this table (ad − bc) is positive. The Yates-corrected G-statistic is then calculated in the normal way.
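The adjustment rule above can be sketched as follows. The function names and counts are hypothetical. The case where (ad − bc) is exactly zero is not covered by the text, so leaving the table unchanged in that case is an assumption.

```python
import math


def xlogx(x):
    """Return x * ln(x), with the convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def g_2x2(a, b, c, d):
    """Uncorrected G for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    cells = xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
    margins = xlogx(a + b) + xlogx(c + d) + xlogx(a + c) + xlogx(b + d)
    return 2.0 * (cells - margins + xlogx(n))


def yates_adjust(a, b, c, d):
    """Shift each cell by 0.5 towards independence; margins are unchanged."""
    diff = a * d - b * c
    if diff > 0:
        return a - 0.5, b + 0.5, c + 0.5, d - 0.5
    if diff < 0:
        return a + 0.5, b - 0.5, c - 0.5, d + 0.5
    return a, b, c, d  # assumption: no adjustment when ad - bc == 0


def g_yates(a, b, c, d):
    """G calculated in the normal way on the Yates-adjusted frequencies."""
    return g_2x2(*yates_adjust(a, b, c, d))


# Hypothetical table with one zero cell, for illustration only
print(yates_adjust(20, 0, 5, 15))
print(f"uncorrected G = {g_2x2(20, 0, 5, 15):.4f}")
print(f"Yates-corrected G = {g_yates(20, 0, 5, 15):.4f}")
```

The zero cell becomes 0.5 after adjustment, so every term in the sum is defined.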
How to do it
We will first take the same example of a cross-sectional design that we used for Pearson's chi-square test. A random sample of 2000 men aged 18-25 is taken and each individual is classified as married/single and HIV positive/negative. The aim is to determine whether there is an association between marital status and infection status.
None of the expected frequencies is less than 5, so the continuity correction is not used.
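A check of this kind might be sketched as below, using the threshold of 5 mentioned above. The function name and the counts are hypothetical and are not the survey data.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts, for illustration only
counts = [[12, 5], [7, 9]]
expected = expected_frequencies(counts)
smallest = min(min(row) for row in expected)
needs_correction = smallest < 5

print(f"smallest expected frequency = {smallest:.2f}")
print("continuity correction needed" if needs_correction else "no correction needed")
```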
This value of G is evaluated using the probability calculator in your software package, or by reference to tables of χ² for 1 degree of freedom. It is significant at P = 0.0001.
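If no probability calculator is to hand, the upper-tail probability of χ² with 1 degree of freedom can be obtained from the complementary error function, because a χ² variable with 1 degree of freedom is the square of a standard normal variable. The sketch below relies on that identity. The function name is hypothetical, and the values of G are standard critical values rather than the result of the example.

```python
import math


def p_value_1df(g):
    """Upper-tail probability of chi-square with 1 df: erfc(sqrt(G / 2))."""
    return math.erfc(math.sqrt(g / 2.0))


# Standard critical values of chi-square for 1 degree of freedom
for g in (3.841, 6.635, 10.828):
    print(f"G = {g:6.3f}  P = {p_value_1df(g):.4f}")
```

This shortcut applies only to 1 degree of freedom. Larger tables need a general χ² distribution function from a statistics library.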
You can therefore conclude that a significantly higher proportion of young single men are positive for HIV (0.0949) than of married men (0.0497) (P = 0.0001).