Models & study designs
This test is used to assess whether paired observations on two (usually nominal) variables are independent of each other. It thus enables us to determine whether there is a significant difference between two independent proportions. The frequencies in each category are arranged in a contingency table. The test statistic is Pearson's chi square statistic (X^{2}), as defined below. Its precise distribution depends on the sampling model.
Multinomial model
The original Pearson's chi square statistic assumes a multinomial model with only the total number of observations fixed. This can arise from two possible sampling designs:
Characteristic B     Characteristic A      Totals
                     Present    Absent
Present              a          b          a + b
Absent               c          d          c + d
Totals               a + c      b + d      n = a + b + c + d
A single random sample is taken (analytical survey) and individuals are classified according to two characteristics. For example, we might take a random sample of 2000 adult men aged 18–25 and determine whether each is married or single, and whether each is positive or negative for the HIV virus. We then compare the proportion of married men with the virus with the proportion of single men with the virus.
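As a minimal illustration of comparing two proportions from such a table, the sketch below uses the 2000-man sample size from the example but entirely hypothetical cell counts (Python is used here purely for illustration):

```python
# Hypothetical counts for the survey example: rows are marital status,
# columns are HIV status (Present, Absent). The cell values are invented.
a, b = 30, 970   # married: positive, negative
c, d = 50, 950   # single:  positive, negative

n = a + b + c + d            # total sample size
p_married = a / (a + b)      # proportion of married men with the virus
p_single = c / (c + d)       # proportion of single men with the virus
print(n, p_married, p_single)  # 2000 0.03 0.05
```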

                     Characteristic A      Totals
                     Present    Absent
Treatment 1          a          b          a + b
Treatment 2          c          d          c + d
Totals               a + c      b + d      n = a + b + c + d
Individuals are randomly allocated to two treatment groups (completely randomized experimental design) and in each group the frequencies with and without a particular characteristic are recorded. For example, individuals with malaria are randomly allocated to two treatment groups in which patients are given either drug A or drug B. The proportion of patients suffering neuropsychiatric side effects is compared between drug A and drug B.
Note that in practice most experiments use some form of restricted randomization so that numbers in each treatment group are (more or less) fixed (see below).

Independent binomial model
In the second model either row or column totals are fixed (giving a double binomial model), but the other marginal totals are free to vary.
                     Characteristic A      Totals
                     Present    Absent
Sample 1             a          b          a + b
Sample 2             c          d          c + d
Totals               a + c      b + d      n = a + b + c + d
Two random samples are taken (comparative area observational design) and in each sample the frequencies with and without a particular characteristic are recorded. For example, we take two random samples, one from a rural area and the other from an urban area, each of 1000 adult men. We then compare the proportion of infected men from rural areas with the proportion of infected men from urban areas. The same model applies for cohort or case-control designs, and for randomized trials where restricted randomization is used to equalize group sizes for each treatment.

The exact distributions of X^{2} obtained under the two models differ somewhat. However, the asymptotic distribution of the statistic is chi square with (r − 1)(c − 1) degrees of freedom under both models, and hence this is used for the large sample test.
Important point
In Unit 8 we analysed proportions in the situation where we had taken replicated samples from a population and calculated a percentage from each sample. That is the correct approach for handling replicated proportions (no transformation was used since p was close to 0.5).
We are dealing here with a quite different situation: proportions calculated either from a single random sample or from two independent samples. Under these circumstances variability cannot be measured directly, but can only be estimated using the binomial distribution. Pearson's chi square test should not be used for analysing replicated proportions.

Large sample tests
General formula
The test statistic X^{2}, known as Pearson's chi square, can be calculated from the following general formula:
Algebraically speaking:

X^{2} = Σ (f_{i} − f̂_{i})^{2} / f̂_{i}

Where:
- X^{2} is Pearson's chi square statistic,
- f_{i} is the observed frequency in cells a to d,
- f̂_{i} is the expected frequency in cells a to d, calculated from the marginal totals.

For example, f̂_{a} = ((a + b)/n)(a + c).

This formula can also be used for goodness of fit tests and for contingency tables with more than two rows or columns.
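To make the general formula concrete, here is a minimal Python sketch applied to a 2 × 2 table with hypothetical counts; the expected frequency of each cell is its row total times its column total, divided by n:

```python
# Pearson's X^2 from the general formula, using hypothetical cell counts.
a, b, c, d = 30, 970, 50, 950
n = a + b + c + d

observed = [a, b, c, d]
row_totals = [a + b, a + b, c + d, c + d]
col_totals = [a + c, b + d, a + c, b + d]
expected = [r * k / n for r, k in zip(row_totals, col_totals)]

# Sum of (observed - expected)^2 / expected over cells a to d
x2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
print(round(x2, 4))  # 5.2083
```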
For 2 × 2 contingency tables there is an alternative computational formula that is preferred as it is less subject to rounding errors:
Algebraically speaking:

X^{2} = n(ad − bc)^{2} / [(a + b)(c + d)(a + c)(b + d)]

Where:
- X^{2} is Pearson's chi square statistic,
- a, b, c, and d are the frequencies in each cell of the table as shown above,
- n is the total number of observations.

Note that for the 2 × 2 table the formulations given above are mathematically identical to the square of the statistic obtained in the z-test for independent proportions.
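This identity is easy to check numerically; with hypothetical counts, the computational formula and the square of the pooled z statistic for two independent proportions give the same value:

```python
import math

# Hypothetical counts
a, b, c, d = 30, 970, 50, 950
n = a + b + c + d

# Computational formula for the 2 x 2 table
x2_comp = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# z statistic for two independent proportions, pooled under H0
p1, p2 = a / (a + b), c / (c + d)
p_pool = (a + c) / n
se = math.sqrt(p_pool * (1 - p_pool) * (1 / (a + b) + 1 / (c + d)))
z = (p1 - p2) / se

print(round(x2_comp, 4), round(z ** 2, 4))  # 5.2083 5.2083
```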
The value of X^{2} is referred to the probability calculator on your software package, or to a table of χ^{2} values at (r − 1)(c − 1) degrees of freedom, where r is the number of rows in the contingency table, and c is the number of columns. Thus for a two by two contingency table there is always just one degree of freedom.
When the chi square test is used as a test of association it is naturally two-sided, since the null hypothesis of no association is tested against the alternative of some association. However, when it is being used to compare two proportions (in other words, for a 2 × 2 table), a one-sided test might be required. This is obtained by simply halving the P-value given by Pearson's chi square statistic.
Correction for continuity
For small sample sizes, many (but not all) statisticians feel that a correction for continuity should be applied. This is because a continuous distribution (chi square) is being used to represent the discrete distribution of sample frequencies. The Yates correction to the general formula is achieved by subtracting 0.5 from the modulus of each difference between observed and expected values:
Algebraically speaking:

X^{2}_{c} = Σ (|f_{i} − f̂_{i}| − 0.5)^{2} / f̂_{i}

Where:
- X^{2}_{c} is the corrected Pearson's chi square statistic,
- f_{i} is the observed frequency in cells a to d,
- f̂_{i} is the expected frequency in cells a to d, calculated as above.

The Yates correction for the computational formula is shown below:
Algebraically speaking:

X^{2}_{c} = n(|ad − bc| − n/2)^{2} / [(a + b)(c + d)(a + c)(b + d)]

Where:
- X^{2}_{c} is the corrected Pearson's chi square statistic,
- a, b, c, and d are the frequencies in each cell of the table as shown above,
- n is the total number of observations.

The correction is usually only recommended if the smallest expected frequency is less than 5. Note that the correction should not be applied if |ad − bc| is less than n/2.
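A small worked sketch of the corrected computational formula, with hypothetical counts chosen so that the smallest expected frequency (here 4) falls below 5:

```python
# Yates-corrected X^2 via the computational formula. Hypothetical counts;
# the smallest expected frequency is 20 * 8 / 40 = 4, so the correction applies.
a, b, c, d = 1, 19, 7, 13
n = a + b + c + d

# The correction should only be applied when |ad - bc| >= n/2
assert abs(a * d - b * c) >= n / 2

numerator = (abs(a * d - b * c) - n / 2) ** 2
x2_c = n * numerator / ((a + b) * (c + d) * (a + c) * (b + d))
print(x2_c)  # 3.90625
```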
The 'n − 1' chi-square test
In 1947 Pearson recommended a third version of the chi-square test, in which n in the computational formula for the 2 × 2 table is replaced by n − 1.
Algebraically speaking:

X^{2} = (n − 1)(ad − bc)^{2} / [(a + b)(c + d)(a + c)(b + d)]

Where:
- X^{2} is Pearson's chi square statistic,
- a, b, c, and d are the frequencies in each cell of the table as shown above,
- n is the total number of observations.

We note that one statistical package (EpiInfo) describes this as the Mantel-Haenszel chi-square test, although this usage of the term is not recommended.
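For comparison, a sketch of the 'n − 1' version alongside the ordinary computational formula, on the same kind of hypothetical table; the two differ only by the factor (n − 1)/n:

```python
# The 'n - 1' chi-square for a 2 x 2 table, hypothetical counts.
a, b, c, d = 1, 19, 7, 13
n = a + b + c + d

denom = (a + b) * (c + d) * (a + c) * (b + d)
x2_plain = n * (a * d - b * c) ** 2 / denom        # ordinary computational formula
x2_n1 = (n - 1) * (a * d - b * c) ** 2 / denom     # 'n - 1' version
print(round(x2_plain, 4), round(x2_n1, 4))  # 5.625 5.4844
```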
Exact tests using the X^{2} statistic
Multinomial
For the model where neither row nor column totals are fixed, the exact distribution is obtained from the multinomial distribution. However, it is computationally demanding because of the many possible tables. For example, even with n = 4 there are 35 different contingency tables. Nonetheless it is possible to carry out an exact test for very small tables (see Conover (1999), pp. 206–209) and a few statistical packages provide this.
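The count of possible tables can be verified directly; a brief Python sketch enumerating all 2 × 2 tables with a given grand total (only n fixed, as in the multinomial model):

```python
from itertools import product

# Number of distinct 2 x 2 tables with grand total n: choose three cells
# freely; the fourth is then d = n - a - b - c (must be non-negative).
def count_tables(n):
    return sum(1 for a, b, c in product(range(n + 1), repeat=3)
               if a + b + c <= n)

print(count_tables(4))  # 35, as stated in the text
```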
Independent binomial
For a model with either row or column totals fixed, the position is somewhat easier since there are fewer possible tables. For example, with 2 observations in each population (again n = 4) there are only 9 different contingency tables. The exact distribution of X^{2} is given by multiplying together the probabilities for each population obtained from the binomial distribution (see Conover (1999), pp. 185–187).
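Again this is easy to verify: with row totals fixed at m = 2 per sample, a table is determined by the two 'Present' counts alone, and the null probabilities are products of binomial terms (illustrated here with an arbitrary common p = 0.5):

```python
from math import comb

m = 2  # observations per sample (row totals fixed at m)
tables = [(a, c) for a in range(m + 1) for c in range(m + 1)]
print(len(tables))  # 9 possible tables, as stated in the text

# Null probability of each table: product of two binomial probabilities
p = 0.5  # arbitrary common proportion under H0
probs = [comb(m, a) * p**a * (1 - p)**(m - a)
         * comb(m, c) * p**c * (1 - p)**(m - c)
         for a, c in tables]
print(round(sum(probs), 10))  # 1.0 -- the probabilities sum to one
```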
Monte Carlo solutions
An alternative approach is to use Monte Carlo simulation of the distribution of the statistic under each of the two models. Base R's chisq.test function only provides a Monte Carlo simulation test for X^{2} using the third model, where both row and column totals are fixed. This is not very useful, as this model is rarely used and is anyway covered by Fisher's exact test. Hence we have provided a set of 4 functions below which enable R to give you P-values of X^{2} tests, where the distribution of the X^{2} statistic is estimated from random samples from either one or two null populations.
Doing an exact X^{2} test with R


We used this approach to compare our result with that given by Ludbrook (2008) for the independent binomial model (termed by Ludbrook the comparative trial, or singly conditioned, 2×2 table). Group 1 had 14 dead and 9 alive, so p_{1} = 0.6087. Group 2 had 17 dead and 2 alive, so p_{2} = 0.8947. Ludbrook considered that the P-value of 0.044 obtained by the package Testimate with the singly conditioned option for exact tests on an odds ratio of 1, a risk ratio of 1 and a risk difference of 0 was acceptable. The P-value of 0.0391 obtained by StatXact for exact tests on a risk ratio of 1 and a risk difference of 0 was also deemed acceptable. Our exact Monte Carlo X^{2} test for these data with one million replicates gave a similar P-value of 0.0381.
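The functions referred to above are written in R; as an illustration only, here is a minimal Python sketch of the same Monte Carlo idea for the independent binomial model, applied to the Ludbrook data. This is our sketch, not the original code, and it uses far fewer replicates, so the P-value is only approximate:

```python
import random

# X^2 via the computational formula; degenerate tables (a zero marginal
# total) are assigned X^2 = 0.
def x2_stat(a, b, c, d):
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

random.seed(1)                      # reproducible run
a, b, c, d = 14, 9, 17, 2           # dead/alive in groups 1 and 2
n1, n2 = a + b, c + d               # fixed group sizes (23 and 19)
p0 = (a + c) / (n1 + n2)            # pooled proportion under H0
x2_obs = x2_stat(a, b, c, d)

reps, hits = 20000, 0
for _ in range(reps):
    sa = sum(random.random() < p0 for _ in range(n1))  # simulated group 1 deaths
    sc = sum(random.random() < p0 for _ in range(n2))  # simulated group 2 deaths
    if x2_stat(sa, n1 - sa, sc, n2 - sc) >= x2_obs - 1e-9:
        hits += 1
print(round(hits / reps, 3))        # close to the 0.0381 quoted in the text
```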
Assumptions
Sampling or allocation is random
For model 1 (multinomial):
- A simple random sample is taken and each observation is classified into one of two different categories for each of two characteristics, or
- Individuals in a group are (completely) randomly allocated to receive either treatment A_{1} or treatment A_{2}, and are then classified into one of two different categories for characteristic B.
For model 2 (double binomial):
- Two independent random samples are taken and each observation is classified into one of two different categories for characteristic A, or
- Individuals in a group are allocated to receive either treatment A_{1} or treatment A_{2} using restricted randomization to equalize group sizes, and are then classified into one of two different categories for characteristic B.
Observations are independent
Observations are assumed to be independent of each other. This assumption is not met if (for example) samples are obtained from clusters, or cluster randomization is used, and the test is then applied to results at an individual level. However, there is an approximate correction which can be applied to the chi square test for use with cluster samples, which we cover in Unit 10. Nor is the test appropriate for analysing contingency tables derived from pooled samples. The analysis of sets of 2 × 2 contingency tables is dealt with in a related topic.
Errors are normally distributed
Both models assume errors are normally distributed. Provided the cell frequencies are reasonably large, cell values in a 2 × 2 table will be distributed approximately normally about their expected values. If any expected frequency is less than 5, then, providing you want a conventional P-value, the continuity correction should be applied. Omission of the continuity correction will give you a mid-P-value. For very small sample sizes the conventional wisdom has been to use Fisher's exact test, although use of an exact test based on the correct model is now preferred.
Mutual exclusivity
A given case may fall only in one class.
Related topics:
- G likelihood ratio test
- r × c tables & partitioning
- Chi square test for trend
- Comparing survival rates
- Multiple 2×2 tables
- Measuring agreement