Pearson's chi square test of independence
Worked example I
We will first take an example using a cross-sectional design as specified above. A random sample of 2000 men aged 18-25 is taken, and each individual is classified as married/single and HIV positive/negative. We wish to determine whether the proportion of married men with HIV differs significantly from the proportion of single men with HIV. The first step is to calculate the expected frequencies under the null hypothesis of no association: each cell's expected frequency is its row total multiplied by its column total, divided by the grand total.
None of the expected frequencies is less than 5, so the continuity correction is not used.
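The calculation of expected frequencies and of Pearson's X² can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and the counts in `example` are hypothetical, chosen for easy arithmetic. They are not the HIV survey data.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson's X^2: sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical counts for illustration only (NOT the survey data).
# Rows: group 1, group 2; columns: positive, negative.
example = [[10, 90], [20, 80]]
expected = expected_frequencies(example)   # [[15.0, 85.0], [15.0, 85.0]]
x2 = pearson_x2(example)                   # about 3.92
smallest_expected = min(min(row) for row in expected)
```

Checking `smallest_expected` before interpreting X² is the same check made in the text: the large-sample approximation becomes doubtful when any expected frequency is small.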
This value of X² is referred to the probability calculator on your software package, or to tables of χ² for 1 degree of freedom. It is significant at P = 0.000132.
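If no package or table is to hand, the P-value for 1 degree of freedom can be obtained directly, because the upper tail of χ² with 1 degree of freedom equals the two-sided tail of a standard normal deviate. The sketch below uses only Python's standard library; the function name is our own.

```python
import math


def chi2_p_value_1df(x2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x2 / 2.0))


# 3.841 is the usual tabulated 5% critical value for 1 degree of freedom,
# so the result should be very close to 0.05.
p_at_critical = chi2_p_value_1df(3.841)
```

This shortcut applies only to 1 degree of freedom, that is, to 2 × 2 tables.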
You can therefore conclude that a significantly higher proportion of young single men are positive for HIV (0.0949) than of married men (0.0497) (P = 0.0001).
Worked example II
Our second example is the same as one we used for the z-test. Individuals with falciparum malaria are randomly allocated to two treatment groups: one group receives drug A, the other drug B. The proportion of patients suffering neuropsychiatric side effects is compared between drug A and drug B.
This has a P-value of 0.098, so we can conclude that the proportions are not significantly different at the conventionally accepted level of P = 0.05.
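Because this example was also analysed with the z-test, it is worth seeing why the two approaches agree: for a 2 × 2 table, Pearson's X² (without continuity correction) is the square of the z statistic computed with a pooled variance. The sketch below demonstrates this with hypothetical counts, not the malaria trial data; the function names are our own.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for two independent proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_x2_2x2(a, b, c, d):
    """Shortcut formula for Pearson's X^2 on the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts (NOT the trial data): 10 of 100 affected versus 20 of 100.
z = two_proportion_z(10, 100, 20, 100)
x2 = pearson_x2_2x2(10, 90, 20, 80)
```

Here `z ** 2` and `x2` agree to rounding error, so the two tests give the same two-sided P-value.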
Worked example III
Our third example is from a study we have looked at previously - a multiple group study comparing behaviour of game mammals inside a protected area with that of animals outside a protected area. The proportion of animals fleeing on approach of a vehicle is compared between the two areas.
The smallest expected frequency is very low at only
This has a P-value of 0.1315 which is not even close to significance.
However, this result is unsafe as some of the expected frequencies are so low. If we use the Monte Carlo simulation method for Pearson's chi square statistic in R, we obtain markedly smaller P-values (usually around 0.07), close to the conventional value for significance. If we use Fisher's exact test (not shown here) we get a similar P-value (0.074) to the Monte Carlo simulation.
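The idea behind a Monte Carlo P-value for Pearson's statistic can be sketched as follows. This is a minimal Python illustration of the principle, not the R routine itself: it generates random tables with the same row and column totals as the observed table by shuffling column labels, and counts how often the simulated X² is at least as large as the observed one. The function names, the number of simulations and the seed are our own choices.

```python
import random


def pearson_x2(table):
    """Pearson's X^2 for a table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )


def monte_carlo_p_fixed_margins(table, n_sim=2000, seed=1):
    """Monte Carlo P-value with BOTH row and column totals held fixed."""
    rng = random.Random(seed)
    # One row label and one column label per individual, in matching order.
    row_labels = [i for i, row in enumerate(table) for _ in range(sum(row))]
    col_labels = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    x2_obs = pearson_x2(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)  # re-pair labels; all margins stay the same
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Small tolerance so that ties are not lost to rounding error.
        if pearson_x2(sim) >= x2_obs * (1 - 1e-9):
            hits += 1
    # The observed table is counted as one of the simulated tables.
    return (hits + 1) / (n_sim + 1)
```

Because the simulated tables are random, repeated runs with different seeds give slightly different P-values, which is why the text quotes the result as "usually around" a value.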
However, both of the latter two methods assume that both row and column totals are fixed. That is not the case here, so their results may be misleading. Hence we carry out an exact two sample independent binomial X² test with Monte Carlo simulation using the R code given
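To make the logic of such a test concrete, here is a minimal Python sketch (it is not the R code referred to above). It assumes a simplification that we state plainly: each group's count is simulated as an independent binomial with its own fixed sample size and the pooled observed proportion, rather than by a full search over the unknown common proportion. Function names, simulation count and seed are our own, and the sketch is not applied to the study data.

```python
import random


def x2_two_groups(s1, n1, s2, n2):
    """Pearson's X^2 for s1 of n1 versus s2 of n2 'successes' (0 if undefined)."""
    successes, n = s1 + s2, n1 + n2
    if successes == 0 or successes == n:
        return 0.0  # one column is empty, so there is no evidence of a difference
    return n * (s1 * (n2 - s2) - s2 * (n1 - s1)) ** 2 / (
        n1 * n2 * successes * (n - successes)
    )


def binomial_draw(rng, n, p):
    """One binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def monte_carlo_binomial_test(s1, n1, s2, n2, n_sim=5000, seed=1):
    """Return (conventional P, mid-P) with only the group sizes held fixed."""
    rng = random.Random(seed)
    pooled = (s1 + s2) / (n1 + n2)  # estimate of the common proportion under H0
    obs = x2_two_groups(s1, n1, s2, n2)
    tol = 1e-9 * max(obs, 1.0)
    greater = equal = 0
    for _ in range(n_sim):
        sim = x2_two_groups(
            binomial_draw(rng, n1, pooled), n1, binomial_draw(rng, n2, pooled), n2
        )
        if sim > obs + tol:
            greater += 1
        elif sim >= obs - tol:
            equal += 1
    conventional = (greater + equal) / n_sim
    mid_p = (greater + 0.5 * equal) / n_sim  # ties count half
    return conventional, mid_p
```

The mid-P-value counts outcomes exactly as extreme as the observed one with half weight, so it can never exceed the conventional P-value.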
This gives a mid-P-value of around 0.017 and a conventional P-value of 0.030, both of which are significant at the 0.05 level. In this case using the exact test has shifted the result from 'non-significant' to 'significant', but with other data sets it may shift in the other direction. The important thing is to use the correct test for the design of the study, rather than an inadequate approximation. Note also: