 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Pearson's chi square test of independence

#### Worked example I

We will first take an example using a cross-sectional design as specified above. A random sample of 2000 men aged 18-25 is taken and each individual is classified as married/single and HIV positive/negative. We wish to determine whether the proportion of married men with HIV differs significantly from the proportion of single men with HIV. The first step is to calculate the expected frequencies under the null hypothesis of independence, so that each expected cell frequency is (row total × column total) / grand total:

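| Marital status | HIV positive (observed) | HIV positive (expected) | HIV negative (observed) | HIV negative (expected) | Totals | Proportion infected |
|---|---|---|---|---|---|---|
| Single | 58 | 38.7985 | 553 | 572.2015 | 611 | 0.0949 |
| Married | 69 | 88.2015 | 1320 | 1300.7985 | 1389 | 0.0497 |
| Totals | 127 | | 1873 | | 2000 | |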

None of the expected frequencies is less than 5, so the continuity correction is not used.
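The expected frequencies in the table can be checked with a few lines of code. The following is a minimal Python sketch, not the site's own code; the function name `expected_frequencies` is a hypothetical helper introduced only for illustration.

```python
# Observed counts from Worked example I.
# Rows: single, married. Columns: HIV positive, HIV negative.
observed = [[58, 553],
            [69, 1320]]


def expected_frequencies(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 4) for e in row])

# The smallest expected frequency is what the rule of thumb above is applied to.
smallest_expected = min(min(row) for row in expected)
print(round(smallest_expected, 4))
```

The same helper works for any r × c table of counts, since it only uses the marginal totals.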

Applying the general formula to calculate Pearson's $X^2$ statistic:

$$
X^2 = \frac{(69 - 88.202)^2}{88.202} + \frac{(58 - 38.799)^2}{38.799} + \frac{(1320 - 1300.799)^2}{1300.799} + \frac{(553 - 572.202)^2}{572.202} = 4.180 + 9.502 + 0.283 + 0.644 = 14.61
$$

This value of $X^2$ is referred to the probability calculator on your software package, or to tables of $\chi^2$ for 1 degree of freedom. It is significant at P = 0.000132.
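As a cross-check on the arithmetic, the statistic and its P-value can be computed directly. This is a small Python sketch using only the standard library; the function names are hypothetical. For 1 degree of freedom the upper-tail probability of the chi square distribution equals `erfc(sqrt(x / 2))`, which avoids the need for a statistics package.

```python
import math

# Observed counts from Worked example I.
# Rows: single, married. Columns: HIV positive, HIV negative.
observed = [[58, 553],
            [69, 1320]]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
    return statistic


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of chi square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


x2 = pearson_chi_square(observed)
p_value = chi_square_p_value_1df(x2)
print(round(x2, 2), round(p_value, 6))
```

The `erfc` shortcut is valid only for 1 degree of freedom; larger tables would need a general chi square distribution function.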

You can therefore conclude that a significantly higher proportion of young single men are positive for HIV (0.0949) than of married men (0.0497) (P = 0.0001).

#### Worked example II

Our second example is the same as one we used for the z-test. Individuals with falciparum malaria are randomly allocated to two treatment groups: one group receives drug A, the other drug B. The proportion of patients suffering neuropsychiatric side effects is compared between drug A and drug B:

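| Antimalarial drug | Neuropsychiatric reactions present | Neuropsychiatric reactions absent | Totals | Proportion affected |
|---|---|---|---|---|
| A | 3 (a) | 22 (b) | 25 | 0.12 |
| B | 9 (c) | 16 (d) | 25 | 0.36 |
| Totals | 12 | 38 | 50 | |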

As the smallest expected frequency is only 6.0 (25 × 12 / 50) we will use the continuity correction to obtain a conventional P-value (although a mid-P-value would be perfectly acceptable).

For this example we will use the simpler computational formula:

$$
X^2_c = \frac{50 \times (|3 \times 16 - 22 \times 9| - 25)^2}{25 \times 12 \times 38 \times 25} = 2.7412
$$

This has a P-value of 0.098, so we can conclude that the proportions are not significantly different at the conventionally accepted level of P = 0.05.
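The computational formula for a 2 × 2 table, with the continuity correction of n/2, can be sketched in Python as follows. The function name is hypothetical, and the cell labels a, b, c, d follow the table above. The code mirrors the formula as written in the text, so it does not guard against the case where |ad − bc| is smaller than n/2.

```python
import math


def continuity_corrected_chi_square(a, b, c, d):
    """X^2_c = n (|ad - bc| - n/2)^2 / (r1 * r2 * c1 * c2) for a 2x2 table."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Worked example II: drug A (3 affected, 22 not), drug B (9 affected, 16 not).
x2_c = continuity_corrected_chi_square(3, 22, 9, 16)

# Upper-tail chi square probability for 1 degree of freedom.
p_value = math.erfc(math.sqrt(x2_c / 2))
print(round(x2_c, 4), round(p_value, 3))
```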

Note that if we take the square root of the chi square statistic in this test (2.7412) we get the z-value we obtained when we used the z-test for independent proportions on the same data (1.6556).

#### Worked example III

Our third example is from a study we have looked at previously - a multiple group study comparing behaviour of game mammals inside a protected area with that of animals outside a protected area. The proportion of animals fleeing on approach of a vehicle is compared between the two areas.

| Location | Flee | Not flee | Totals | Proportion affected |
|---|---|---|---|---|
| Inside park | 6 (a) | 2 (b) | 8 | 23.1 |
| Outside park | 20 (c) | 0 (d) | 20 | 0 |
| Totals | 26 | 2 | 28 | |

The smallest expected frequency is very low at only 0.57. We will use the continuity correction with the simpler computational formula, but anticipate an inaccurate test because the assumptions of chi square will not be met.

$$
X^2_c = \frac{28 \times (|6 \times 0 - 20 \times 2| - 14)^2}{8 \times 26 \times 2 \times 20} = 2.275
$$

This has a P-value of 0.1315 which is not even close to significance.

However, this result is unsafe as some of the expected frequencies are so low. If we use the Monte Carlo simulation method for Pearson's chi square statistic in R, we obtain markedly smaller P-values (usually around 0.07), close to the conventional value for significance. If we use Fisher's exact test (not shown here) we get a similar P-value (0.074) to the Monte Carlo simulation.
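The Fisher's exact P-value quoted above can be reproduced from the hypergeometric distribution. The sketch below is an illustration in Python rather than the site's own code; the function name is hypothetical, and it assumes one common two-sided convention, namely summing the probabilities of all tables (with the same margins) that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given that all row and column totals are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = table_probability(a)
    # Small relative tolerance so that ties are not lost to rounding error.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-7)
    )


# Worked example III: inside park (6 flee, 2 not), outside park (20 flee, 0 not).
fisher_p = fisher_exact_two_sided(6, 2, 20, 0)
print(round(fisher_p, 3))
```

With these margins only three tables are possible, and the observed table is the least probable of them, so the two-sided P-value is simply its own probability.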

However, both the latter two methods assume that both row and column totals are fixed. This is not the case here, so the test results may be misleading. Hence we carry out an exact two sample independent binomial $X^2$ test with Monte Carlo simulation using the R code given above.

This gives a mid-P-value of around 0.017 and a conventional P-value of 0.030, both of which are significant at the 0.05 level. In this case using the exact test has shifted the result of the test from 'non significant' to 'significant', but with other data sets it may shift in the other direction. The important thing is to use the correct test for the design of the study rather than an inadequate approximation. Note also:

• The conventional P-value is still rather close to 0.05 - the only sensible conclusion is that larger samples are required.
• It is assumed that observations are independent (in the study reported this was unclear).
• We can only make inferences about the two areas - not about the treatment factor (that is 'protected' versus 'not protected') since we only have one replicate (= area) of each level.
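
The idea behind the two sample independent binomial test in Worked example III, where only the two sample sizes are fixed, can be illustrated with a short Python sketch. This is not the R code referred to in the text, and it makes several assumptions that are ours: it enumerates every possible pair of binomial outcomes instead of simulating them, it uses the pooled proportion as a plug-in estimate of the common probability of fleeing, it uses the uncorrected $X^2$ as the test statistic, and it scores tables with an empty margin as 0. Its results therefore need not match the Monte Carlo figures quoted above.

```python
from math import comb


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson X^2 for a 2x2 table; 0.0 if any margin is empty."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def unconditional_p_values(a, b, c, d, tol=1e-9):
    """Return (conventional P, mid-P) with only the two sample sizes fixed."""
    n1, n2 = a + b, c + d
    pooled = (a + c) / (n1 + n2)  # assumption: plug-in pooled proportion
    observed = chi_square_2x2(a, b, c, d)
    greater = equal = 0.0
    for flee_in in range(n1 + 1):
        for flee_out in range(n2 + 1):
            prob = (binomial_pmf(flee_in, n1, pooled)
                    * binomial_pmf(flee_out, n2, pooled))
            stat = chi_square_2x2(flee_in, n1 - flee_in,
                                  flee_out, n2 - flee_out)
            if stat > observed + tol:
                greater += prob
            elif abs(stat - observed) <= tol:
                equal += prob
    return greater + equal, greater + 0.5 * equal


conventional_p, mid_p = unconditional_p_values(6, 2, 20, 0)
print(round(conventional_p, 4), round(mid_p, 4))
```

The mid-P-value counts only half the probability of tables that tie with the observed statistic, which is why it is always smaller than the conventional P-value.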
 Except where otherwise specified, all text and images on this page are copyright InfluentialPoints, all rights reserved. Images not copyright InfluentialPoints credit their source on web-pages attached via hypertext links from those images.