"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




Pearson's chi square test of independence

Worked example I

We will first take an example using a cross-sectional design as specified above. A random sample of 2000 men aged 18-25 is taken and each individual is classified as married/single and HIV positive/negative. You wish to determine whether the proportion of married men with HIV differs significantly from the proportion of single men with HIV. The first step is to calculate the expected frequency for each cell, assuming that the cell frequencies simply reflect the marginal totals: expected frequency = (row total × column total) / grand total.

Observed frequencies, with expected frequencies in brackets:

Marital status  HIV infection status                  Totals  Proportion
                Positive          Negative
Single          58 (38.7985)      553 (572.2015)      611     0.0949
Married         69 (88.2015)      1320 (1300.7985)    1389    0.0497
Totals          127               1873                2000

None of the expected frequencies is less than 5, so the continuity correction is not used.
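The expected frequencies can be checked with a few lines of code. This is a minimal sketch in Python (an assumption; the page's own code is in R), and the function name `expected_frequencies` is hypothetical. It uses the observed counts from the table above.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: single, married. Columns: HIV positive, HIV negative.
observed = [[58, 553], [69, 1320]]
expected = expected_frequencies(observed)
smallest = min(min(row) for row in expected)

for row in expected:
    print([round(e, 4) for e in row])
print("smallest expected frequency:", round(smallest, 4))
```

The smallest expected frequency is the value compared against the threshold of 5 mentioned above.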

Applying the general formula to calculate the Pearson's X2 statistic:

\[
\begin{aligned}
X^2 &= \frac{(69 - 88.202)^2}{88.202} + \frac{(58 - 38.799)^2}{38.799} + \frac{(1320 - 1300.799)^2}{1300.799} + \frac{(553 - 572.202)^2}{572.202} \\
    &= 4.180 + 9.502 + 0.283 + 0.644 \\
    &= 14.61
\end{aligned}
\]

This value of X2 is referred to the probability calculator on your software package, or to tables of χ2 for 1 degree of freedom. It is significant at P = 0.000132.
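If no software package or table is to hand, the statistic and its P-value can be reproduced with the Python standard library. This is a sketch only, and the function names are hypothetical. It relies on the fact that a χ2 variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(√(x/2)); that shortcut does not apply to other degrees of freedom.

```python
import math

def pearson_x2(table):
    """Pearson's X2: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            x2 += (obs - exp) ** 2 / exp
    return x2

def chi2_upper_tail_1df(x):
    """Upper-tail probability of chi-square with 1 df (valid for 1 df only)."""
    return math.erfc(math.sqrt(x / 2))

x2 = pearson_x2([[58, 553], [69, 1320]])
p_value = chi2_upper_tail_1df(x2)
print(round(x2, 2), p_value)
```

The printed values should agree closely with the statistic and P-value quoted above.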

You can therefore conclude that a significantly higher proportion of young single men are positive for HIV (0.0949) than of married men (0.0497) (P = 0.0001).


Worked example II

Our second example is the same as one we used for the z-test. Individuals with falciparum malaria are randomly allocated to two treatment groups - one group receives drug A, the other drug B. The proportion of patients suffering neuropsychiatric side effects is compared between drug A and drug B:

Drug    Side effects              Totals  Propn affected
        Affected    Not affected
A       3 (a)       22 (b)        25      0.12
B       9 (c)       16 (d)        25      0.36
Totals  12          38            50

As the smallest expected frequency is only 6 (25 × 12 / 50) we will use the continuity correction to obtain a conventional P-value (although a mid-P-value would be perfectly acceptable).

For this example we will use the simpler computational formula:

\[
X^2_c = \frac{50\,(|3 \times 16 - 22 \times 9| - 25)^2}{25 \times 12 \times 38 \times 25} = 2.7412
\]

This has a P-value of 0.098, so we can conclude that the proportions are not significantly different at the conventionally accepted level of P = 0.05.

Note that if we take the square root of the chi square statistic in this test (2.7412) we get the z-value we obtained when we used the z-test for independent proportions on the same data (1.6556).
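The computational formula and its link to the z-value can be sketched as follows. This is an illustrative Python version, not the page's own code; `yates_x2` is a hypothetical name, and the cells a, b, c, d are labelled as in the table above. The `max(0, ...)` guard is an added assumption that stops the correction from overshooting when |ad − bc| is smaller than N/2.

```python
import math

def yates_x2(a, b, c, d):
    """Continuity-corrected X2 for a 2x2 table using the computational formula."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

x2c = yates_x2(3, 22, 9, 16)             # drug A: 3 affected, 22 not; drug B: 9, 16
z = math.sqrt(x2c)                       # square root of X2 gives the z-value
p_value = math.erfc(math.sqrt(x2c / 2))  # upper tail of chi-square, 1 df only
print(round(x2c, 4), round(z, 4), round(p_value, 3))
```

Because the statistic has 1 degree of freedom, the same P-value is obtained as the two-tailed normal probability for z.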


Worked example III

Our third example is from a study we have looked at previously - a multiple group study comparing behaviour of game mammals inside a protected area with that of animals outside a protected area. The proportion of animals fleeing on approach of a vehicle is compared between the two areas.

Location      Behaviour              Totals  Propn affected
              Flee       Not flee
Inside park   6 (a)      2 (b)       8       23.1
Outside park  20 (c)     0 (d)       20      0
Totals        26         2           28

The smallest expected frequency is very low at only 0.57. We will use the continuity correction with the simpler computational formula but anticipate an inaccurate test because the assumptions of chi square will not be met.

\[
X^2_c = \frac{28\,(|6 \times 0 - 20 \times 2| - 14)^2}{8 \times 26 \times 2 \times 20} = 2.275
\]

This has a P-value of 0.1315 which is not even close to significance.

However, this result is unsafe as some of the expected frequencies are so low. If we use the Monte Carlo simulation method for Pearson's chi square statistic in R, we obtain markedly smaller P-values (usually around 0.07), close to the conventional value for significance. If we use Fisher's exact test (not shown here) we get a similar P-value (0.074) to the Monte Carlo simulation.
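Both fixed-margin approaches can be illustrated with the Python standard library. This is a sketch under stated assumptions rather than the R procedure itself: the function names are hypothetical, the simulation shuffles the "flee" labels among animals so that row and column totals stay fixed, the uncorrected Pearson statistic is used for the simulated tables, and the P-value is taken as (hits + 1) / (replicates + 1).

```python
import math
import random

def pearson_x2(a, b, c, d):
    """Uncorrected Pearson X2 for a 2x2 table (0 if any margin is empty)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    with the same margins that are no more probable than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)
    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / total
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def monte_carlo_p(a, b, c, d, reps=20000, seed=1):
    """Monte Carlo P-value for Pearson's X2 with both sets of margins fixed."""
    rng = random.Random(seed)
    r1, c1, n = a + b, a + c, a + b + c + d
    labels = [1] * c1 + [0] * (n - c1)   # 1 = flee, 0 = not flee
    obs = pearson_x2(a, b, c, d)
    hits = 0
    for _ in range(reps):
        rng.shuffle(labels)
        x = sum(labels[:r1])             # animals fleeing in the first row
        if pearson_x2(x, r1 - x, c1 - x, n - r1 - c1 + x) >= obs - 1e-9:
            hits += 1
    return (hits + 1) / (reps + 1)

p_fisher = fisher_two_sided(6, 2, 20, 0)
p_mc = monte_carlo_p(6, 2, 20, 0)
print(round(p_fisher, 3), round(p_mc, 3))
```

With margins this small only three tables are possible, which is why the simulated P-value settles so close to the exact one.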

However, both of the latter two methods assume that both row and column totals are fixed. This is not the case here, so their results may be misleading. Hence we carry out an exact two sample independent binomial X2 test with Monte Carlo simulation using the R code given above.


This gives a mid-P-value of around 0.017, and a conventional P-value of 0.030, both of which are significant at the 0.05 level. In this case using the exact test has shifted the result of the test from 'non significant' to 'significant' - but with other data sets it may shift in the other direction. The important thing is that one is using the correct test based on the design of the study - rather than an inadequate approximation. Note also:

  • The conventional P-value is still rather close to 0.05 - the only sensible conclusion is that larger samples are required.
  • It is assumed that observations are independent (in the study reported this was unclear).
  • We can only make inferences about the two areas - not about the treatment factor (that is 'protected' versus 'not protected') since we only have one replicate (= area) of each level.