The G-test of independence is a likelihood ratio test of how well the observed frequencies fit the frequencies expected if the row and column classifications were independent. The method is based on the multinomial distribution, in which both row and column totals are random rather than fixed.
Two likelihoods are estimated: the likelihood of the observed frequencies under a multinomial distribution, and the likelihood assuming that the row and column classifications are independent. Twice the natural logarithm of the ratio of these two likelihoods is equal to G, which for a 2 × 2 table is approximately distributed as χ^{2} with one degree of freedom.
This formula can also be used for goodness of fit tests and for contingency tables with more than two rows or columns.
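For reference, the likelihood ratio statistic is commonly written in the general form below. The symbols O_{i} and E_{i} (observed and expected frequency in cell i) are notation assumed here rather than taken from the text; for a contingency table each E_{i} is the row total times the column total divided by N, and the sum runs over all cells.

```latex
G = 2 \sum_{i} O_{i} \ln\left(\frac{O_{i}}{E_{i}}\right)
```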
As with Pearson's chi square test, there is an alternative computational formula for 2 × 2 contingency tables. It is preferred because it avoids the rounding errors that arise when expected frequencies are calculated first (although it is rather more complicated!):
Algebraically speaking:
\[
\begin{aligned}
G &= 2 \ln\left[\frac{a^{a}\,b^{b}\,c^{c}\,d^{d}\,N^{N}}{(a+b)^{a+b}\,(a+c)^{a+c}\,(b+d)^{b+d}\,(c+d)^{c+d}}\right] \\
  &= 2\,[\,a\ln(a) + b\ln(b) + c\ln(c) + d\ln(d) + N\ln(N) \\
  &\qquad - (a+b)\ln(a+b) - (a+c)\ln(a+c) - (b+d)\ln(b+d) - (c+d)\ln(c+d)\,]
\end{aligned}
\]
Where:
 G is the likelihood ratio statistic, which approximates to χ^{2} for large samples,
 a, b, c and d are the observed frequencies in the four cells (a and b in the first row, c and d in the second),
 N is the total number of observations.
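The computational formula above can be sketched in Python as follows. The function names are illustrative, and treating 0 × ln(0) as 0 (its limiting value) is an assumption added here so that a table containing a zero cell does not raise an error.

```python
import math


def xlogx(x: float) -> float:
    """Return x * ln(x), using the limiting value 0 when x is 0 (assumption)."""
    return x * math.log(x) if x > 0 else 0.0


def g_statistic_2x2(a: float, b: float, c: float, d: float) -> float:
    """G for a 2 x 2 table with rows (a, b) and (c, d), using the sum-of-logs form."""
    n = a + b + c + d
    return 2.0 * (
        xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d) + xlogx(n)
        - xlogx(a + b) - xlogx(a + c) - xlogx(b + d) - xlogx(c + d)
    )


# Hypothetical counts, chosen only to exercise the function.
print(round(g_statistic_2x2(12, 30, 25, 18), 3))
```

Working with sums of x ln(x) terms rather than the ratio of powers avoids forming very large numbers such as N^{N}.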

Continuity corrections
Williams' Correction for a 2 × 2 table
Williams' correction is the preferred continuity correction for the G likelihood ratio test, although it cannot be used if there are any zeros in the table. It is less conservative than Yates' correction.
Algebraically speaking:
\[
q = 1 + \frac{\{N/(a+b) + N/(c+d) - 1\}\,\{N/(a+c) + N/(b+d) - 1\}}{6N}
\]
Corrected G statistic (G_{c}) = G/q
Where:
 q is Williams' correction factor,
 all other symbols are as above
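A minimal Python sketch of the correction factor, assuming the same cell layout as above (a and b in the first row, c and d in the second). The function names are illustrative only.

```python
def williams_q(a: float, b: float, c: float, d: float) -> float:
    """Williams' correction factor q for a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    row_term = n / (a + b) + n / (c + d) - 1  # built from the two row totals
    col_term = n / (a + c) + n / (b + d) - 1  # built from the two column totals
    return 1 + row_term * col_term / (6 * n)


def williams_corrected_g(g: float, a: float, b: float, c: float, d: float) -> float:
    """Corrected statistic G_c = G / q."""
    return g / williams_q(a, b, c, d)


# Hypothetical counts and G value, used only to show the call pattern.
print(williams_corrected_g(4.2, 12, 30, 25, 18))
```

Because q is always greater than 1 for a table with positive marginal totals, the corrected statistic is always slightly smaller than G.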

Yates' Correction for a 2 × 2 table
Yates' correction can also be used with the G-test, as with Pearson's chi squared, although again it tends to be too conservative. As well as correcting for continuity, it also resolves the problem that arises if one of the observed frequencies is zero. If (ad − bc) is positive, subtract 0.5 from a and d and add 0.5 to b and c. If (ad − bc) is negative, add 0.5 to a and d and subtract 0.5 from b and c. For this table (ad − bc) is positive. The Yates-corrected G-statistic is then calculated in the normal way.
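The adjustment rule can be sketched in Python as below. The function name is illustrative, and leaving the counts unchanged when ad equals bc is an assumption, since the text does not cover that case.

```python
def yates_adjust(a: float, b: float, c: float, d: float) -> tuple:
    """Move each cell 0.5 towards its expected value, keeping all marginal totals fixed."""
    diff = a * d - b * c
    if diff > 0:
        return (a - 0.5, b + 0.5, c + 0.5, d - 0.5)
    if diff < 0:
        return (a + 0.5, b - 0.5, c - 0.5, d + 0.5)
    # ad == bc: counts left unchanged (assumption, not stated in the text).
    return (float(a), float(b), float(c), float(d))


# Hypothetical table with a zero cell; after adjustment no cell is zero.
print(yates_adjust(0, 5, 7, 3))
```

The adjusted counts are then substituted for a, b, c and d in the formula for G.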
How to do it
Worked example
We will first take the same example of a cross-sectional design that we used for Pearson's chi square. A random sample of 2000 men aged 18-25 is taken, and each individual is classified as married/single and HIV positive/negative. You wish to determine whether there is an association between marital status and infection status.
Marital status | HIV positive | HIV negative | Totals
Single | 58 | 553 | 611
Married | 69 | 1320 | 1389
Totals | 127 | 1873 | 2000

None of the expected frequencies is less than 5, so the continuity correction is not used.
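The check on expected frequencies can be reproduced with a short Python sketch. Each expected frequency is the row total times the column total divided by N; the function name is illustrative.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from the worked example (rows: single, married).
observed = [[58, 553], [69, 1320]]
expected = expected_frequencies(observed)
smallest = min(min(row) for row in expected)
print(expected)
print("all expected frequencies >= 5:", smallest >= 5)
```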
Applying the appropriate formula to calculate the G statistic:
\[
\begin{aligned}
G &= 2 \times [\,58\ln(58) + 553\ln(553) + 69\ln(69) + 1320\ln(1320) \\
  &\qquad + 2000\ln(2000) - 611\ln(611) - 127\ln(127) \\
  &\qquad - 1873\ln(1873) - 1389\ln(1389)\,] \\
  &= 2 \times [\,235.51 + 3492.39 + 292.15 + 9484.71 \\
  &\qquad + 15201.80 - 3919.62 - 615.21 \\
  &\qquad - 14113.61 - 10051.28\,] = 13.68
\end{aligned}
\]

This value of G is referred to the probability calculator in your software package, or to tables of χ^{2} for 1 degree of freedom. It is significant at P = 0.0002.
You can therefore conclude that a significantly higher proportion of young single men are positive for HIV (0.0949) than of married men (0.0497) (P = 0.0002).
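For readers without a statistics package, the whole worked example can be checked with the Python standard library. This is a sketch under two assumptions: the function names are illustrative, and the P value uses the identity that, for 1 degree of freedom, the upper tail of χ^{2} at x equals erfc(√(x/2)). Because the hand calculation rounds each term to two decimal places, the computed G may differ from it slightly in the second decimal place.

```python
import math


def g_statistic_2x2(a: float, b: float, c: float, d: float) -> float:
    """G for a 2 x 2 table with rows (a, b) and (c, d); all cells must be positive."""
    n = a + b + c + d
    positive = sum(x * math.log(x) for x in (a, b, c, d, n))
    negative = sum(x * math.log(x) for x in (a + b, a + c, b + d, c + d))
    return 2.0 * (positive - negative)


def chi2_upper_tail_1df(x: float) -> float:
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


# Observed counts from the worked example.
g = g_statistic_2x2(58, 553, 69, 1320)
p = chi2_upper_tail_1df(g)
print(f"G = {g:.2f}, P = {p:.4f}")
print(f"single: {58 / 611:.4f}, married: {69 / 1389:.4f}")
```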

