"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




The G likelihood-ratio test

The G-test of independence is a likelihood ratio test which tests the goodness of fit of observed frequencies to their expected frequencies if row and column classifications were independent. The method is based on the multinomial distribution where both row and column totals are random, not fixed.

Two likelihoods are estimated: the likelihood of the observed frequencies under a multinomial distribution, and the likelihood if it is assumed that row and column classifications are independent. Twice the natural logarithm of the ratio of these two likelihoods is equal to G, which for a 2 × 2 table is approximately distributed as χ² with one degree of freedom.

Algebraically speaking -

$$
G = 2 \sum_i f_i \ln\left( \frac{f_i}{\hat{f}_i} \right)
$$

  • G is the likelihood ratio statistic, approximating to χ² for large samples
  • f_i and \hat{f}_i are the observed and expected frequencies for each cell (where i = 1 to 4, or a to d). Expected frequencies are calculated in the same way as for Pearson's chi square test.

This formula can also be used for goodness of fit tests and for contingency tables with more than two rows or columns.
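As a minimal sketch of the general formula, the function below computes G for a table with any number of rows and columns, taking the expected frequency of each cell as (row total × column total) / N. The function name `g_statistic` is hypothetical, and two things are assumptions rather than statements from the text above: a zero cell is treated as contributing nothing to the sum, and the degrees of freedom are taken as (rows − 1)(columns − 1).

```python
import math


def g_statistic(observed):
    """Return (G, degrees of freedom) for an r x c table of observed counts.

    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, f in enumerate(row):
            # Expected frequency if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / n
            # Assumption: a zero cell contributes 0 (the limit of f ln f).
            if f > 0:
                total += f * math.log(f / expected)

    # Assumption: the usual (rows - 1) x (columns - 1) degrees of freedom.
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * total, df
```

Calling `g_statistic([[a, b], [c, d]])` on a 2 × 2 table returns the statistic together with one degree of freedom.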

As with Pearson's chi square test, there is an alternative computational formula for 2 × 2 contingency tables that is preferred because it is less prone to rounding errors (although it is rather more complicated):

Algebraically speaking -

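$$
\begin{aligned}
G &= 2 \ln\left[ \frac{a^{a}\, b^{b}\, c^{c}\, d^{d}\, N^{N}}{(a+b)^{a+b}\,(a+c)^{a+c}\,(b+d)^{b+d}\,(c+d)^{c+d}} \right] \\
  &= 2\,\bigl[\, a\ln(a) + b\ln(b) + c\ln(c) + d\ln(d) + N\ln(N) \\
  &\qquad - (a+b)\ln(a+b) - (a+c)\ln(a+c) - (b+d)\ln(b+d) - (c+d)\ln(c+d) \,\bigr]
\end{aligned}
$$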


  • G is the likelihood ratio statistic, approximating to χ² for large samples,
  • a, b, c, & d are the observed frequencies for each cell,
  • N is the total number of observations.
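The computational formula translates almost term for term into code. The sketch below is one way to write it; the names `xlogx` and `g_2x2` are hypothetical, and treating 0 · ln(0) as 0 is an assumption (it is the limiting value) so that a zero cell does not raise an error.

```python
import math


def xlogx(x):
    """Return x * ln(x), taking 0 * ln(0) as 0 (assumed limiting value)."""
    return x * math.log(x) if x > 0 else 0.0


def g_2x2(a, b, c, d):
    """G statistic for a 2 x 2 table with cells a, b (top row) and c, d."""
    n = a + b + c + d
    return 2.0 * (
        xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d) + xlogx(n)
        - xlogx(a + b) - xlogx(a + c) - xlogx(b + d) - xlogx(c + d)
    )
```

Because the logarithms are carried at full floating-point precision, the result can differ slightly in the last decimal place from a hand calculation that rounds each term.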



Continuity corrections

Williams' Correction for a 2 × 2 table

Williams' correction is the preferred continuity correction for the G likelihood ratio test, although it cannot be used if there are any zeros in the table. It is less conservative than Yates' correction.

Algebraically speaking -

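$$
q = 1 + \frac{\left\{ \dfrac{N}{a+b} + \dfrac{N}{c+d} - 1 \right\} \left\{ \dfrac{N}{a+c} + \dfrac{N}{b+d} - 1 \right\}}{6N}
$$

$$
\text{Corrected G statistic: } G_c = \frac{G}{q}
$$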


  • q is Williams' correction,
  • all other symbols are as above
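The correction can be sketched in a few lines. This assumes the standard form of Williams' correction for a 2 × 2 table, in which the product of the row term and the column term is divided by 6N before 1 is added. The function names are hypothetical, and raising an error for a zero cell simply mirrors the restriction stated above.

```python
def williams_q(a, b, c, d):
    """Williams' correction factor q for a 2 x 2 table (assumed standard form)."""
    if min(a, b, c, d) <= 0:
        # The text above rules the correction out when any cell is zero.
        raise ValueError("Williams' correction is not used when a cell is zero")
    n = a + b + c + d
    row_term = n / (a + b) + n / (c + d) - 1
    col_term = n / (a + c) + n / (b + d) - 1
    return 1 + row_term * col_term / (6 * n)


def williams_corrected_g(g, a, b, c, d):
    """Divide an uncorrected G statistic by q to obtain the corrected G."""
    return g / williams_q(a, b, c, d)
```

Since q is always greater than 1, the corrected statistic is always a little smaller than the uncorrected G, and the difference shrinks as N grows.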

Yates' Correction for a 2 × 2 table

Yates' correction can also be used with the G test, as with Pearson's chi square test, although again it tends to be too conservative. As well as correcting for continuity, it also avoids the problem that arises if one of the observed frequencies is zero. If (ad-bc) is positive, subtract 0.5 from a and d and add 0.5 to b and c. If (ad-bc) is negative, add 0.5 to a and d and subtract 0.5 from b and c. For this table (ad-bc) is positive. The Yates corrected G-statistic is then calculated in the normal way.
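The adjustment rule can be sketched as follows. The names `yates_adjust`, `xlogx` and `yates_corrected_g` are hypothetical; leaving the cells unchanged when (ad-bc) is exactly zero is an assumption, as the rule above does not cover that case. G is then computed from the adjusted cells with the 2 × 2 computational formula.

```python
import math


def yates_adjust(a, b, c, d):
    """Move each cell 0.5 towards its expected value, following the sign of ad - bc."""
    cross = a * d - b * c
    if cross > 0:
        return a - 0.5, b + 0.5, c + 0.5, d - 0.5
    if cross < 0:
        return a + 0.5, b - 0.5, c - 0.5, d + 0.5
    # Assumption: no adjustment when ad - bc is exactly zero.
    return float(a), float(b), float(c), float(d)


def xlogx(x):
    """Return x * ln(x), taking 0 * ln(0) as 0."""
    return x * math.log(x) if x > 0 else 0.0


def yates_corrected_g(a, b, c, d):
    """G statistic computed from the Yates-adjusted cell frequencies."""
    a, b, c, d = yates_adjust(a, b, c, d)
    n = a + b + c + d  # the adjustments cancel, so N is unchanged
    return 2.0 * (
        xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d) + xlogx(n)
        - xlogx(a + b) - xlogx(a + c) - xlogx(b + d) - xlogx(c + d)
    )
```

The row and column totals are unaffected by the adjustment, which is why only the four cell terms change in the formula.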



How to do it

Worked example

We will first take the same example of cross-sectional design that we used for Pearson's chi square. A random sample of 2000 men aged 18-25 is taken and each individual is classified as married/single and HIV positive/negative. You wish to determine if there is an association between marital status and infection status.

Marital status    HIV positive    HIV negative    Totals
Single                 58              553           611
Married                69             1320          1389
Totals                127             1873          2000

None of the expected frequencies is less than 5, so the continuity correction is not used.

Applying the appropriate formula to calculate the G statistic:
G  =  2[58 ln(58) + 553 ln(553) + 69 ln(69) + 1320 ln(1320)
        + 2000 ln(2000) − 611 ln(611) − 127 ln(127)
        − 1873 ln(1873) − 1389 ln(1389)]
   =  2[235.51 + 3492.39 + 292.15 + 9484.71
        + 15201.80 − 3919.62 − 615.21
        − 14113.61 − 10051.28]
   =  13.68

This value of G is entered into the probability calculator of your software package, or compared with tables of χ² for 1 degree of freedom. It is highly significant (P ≈ 0.0002).

You can therefore conclude that a significantly higher proportion of young single men are positive for HIV (0.0949) than of married men (0.0497) (P ≈ 0.0002).
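The probability lookup in the worked example can be reproduced without a statistics package. The sketch below relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function. The function name `chi2_sf_1df` is hypothetical, and the shortcut applies to 1 degree of freedom only.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(chi-square > x) for 1 degree of freedom.

    Uses P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


# P value for the G statistic obtained in the worked example.
p_value = chi2_sf_1df(13.68)
print(f"P = {p_value:.5f}")
```

For tables with more than 1 degree of freedom a general χ² routine from a statistics library would be needed instead.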