"It has long been an axiom of mine that the little things are infinitely the most important"
Fisher's exact test: Use & misuse
(2x2 contingency table, fixed factors, test of association)
Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse
As with Pearson's chi square test, the purpose of Fisher's exact test is to determine whether there is a significant difference between two proportions, or to test for association between two characteristics. However, Fisher's exact test assumes a quite different model. As before, the frequencies in each category are arranged in a 2 × 2 contingency table, but in this case both the row and the column totals are assumed to be fixed, not random. Fixing the marginal totals greatly simplifies the mathematics: once the margins are known, only one cell is free to vary, and the probability of any particular arrangement of the four cell frequencies can be calculated exactly from the hypergeometric distribution.
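To make the fixed-margins model concrete, here is a minimal Python sketch (standard library only) of the hypergeometric calculation. The function names `table_probability` and `fisher_two_sided` are hypothetical, and the two-sided P-value uses one common convention: summing the probabilities of all tables with the same margins that are no more probable than the observed one. Other conventions exist.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2 x 2 table [[a, b], [c, d]],
    conditional on its row totals (a+b, c+d) and column totals (a+c, b+d)."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_two_sided(a, b, c, d):
    """Two-sided P-value: total probability of all tables with the same
    margins that are no more probable than the observed table."""
    r1, c1, n = a + b, a + c, a + b + c + d
    p_obs = table_probability(a, b, c, d)
    # With every margin fixed, the top-left cell x determines the whole table
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = table_probability(x, r1 - x, c1 - x, n - r1 - c1 + x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance so floating-point ties count
            total += p
    return total

# Illustrative table with every margin equal to 4
print(table_probability(3, 1, 1, 3))  # 16/70, about 0.2286
print(fisher_two_sided(3, 1, 1, 3))   # 34/70, about 0.4857
```

Because all margins are fixed, only five tables are possible here (top-left cell 0 to 4), with probabilities 1, 16, 36, 16 and 1 out of 70.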
Fisher's exact test is used widely in all disciplines, but nearly always because the sample is small rather than because the design is appropriate. This should arguably be classified as a misuse of the test. It is true that under such circumstances Fisher's exact test gives a better approximation to the correct probability than Pearson's chi square test, but it is nearly always too conservative and may be misleading. This conservatism is a particular problem when carrying out initial univariate analyses to identify variables for subsequent inclusion in a multivariate model, because genuinely associated variables may be screened out. The preferred approach in the small sample situation is to use an exact (Monte Carlo) test based on the correct model: the multinomial model for analytical surveys, and the independent binomial model for randomized trials.
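The conservatism can be checked directly. The sketch below (hypothetical function names, standard library only) assumes the independent binomial model of a randomized trial, with group sizes fixed but success totals random, and computes the exact probability that Fisher's two-sided test rejects at the 5% level when the null hypothesis is true. The group sizes and common success probabilities are illustrative choices, not values from any study.

```python
from math import comb

def table_probability(a, b, c, d):
    # Hypergeometric probability of [[a, b], [c, d]] given its margins
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_two_sided(a, b, c, d):
    # Sum of probabilities of same-margin tables no more probable than observed
    r1, c1, n = a + b, a + c, a + b + c + d
    p_obs = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, c1 - (n - r1)), min(r1, c1) + 1):
        p = table_probability(x, r1 - x, c1 - x, n - r1 - c1 + x)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return total

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def actual_size(n1, n2, p, alpha=0.05):
    """Exact rejection rate of Fisher's two-sided test when each group is an
    independent binomial sample with the SAME success probability p, i.e. the
    null hypothesis is true but the column totals are random."""
    size = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            # Rows are the two groups: [successes, failures]
            if fisher_two_sided(x1, n1 - x1, x2, n2 - x2) <= alpha:
                size += binom_pmf(x1, n1, p) * binom_pmf(x2, n2, p)
    return size

for p in (0.2, 0.5):
    print(p, actual_size(10, 10, p))
```

Because Fisher's test is valid conditional on every possible set of margins, this unconditional rejection rate cannot exceed the nominal level; with small samples it usually falls below it, which is what "too conservative" means in practice.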
As with Pearson's chi square test, lack of independence of outcomes is the commonest factor invalidating the test. This applies whenever repeated observations are made on the same animals, which is a form of pseudoreplication. The same issue arises if the experimental unit is changed so that the frequencies in the contingency table no longer correspond to the number of units randomly allocated to treatment. In general you cannot simply transfer the statistical approaches used for randomized trials on patients to trials on mosquitoes unless the individual mosquitoes are randomized to treatment! Pooling frequencies from different blocks or replicates in experimental designs is also invalid. There is a clear risk of bias when categories are collapsed to create 2 × 2 tables from r × c tables; this may have occurred in a paper on financial sponsorship of nutrition-related scientific articles.
The fact that Fisher's exact test is generally used when sample sizes are small leads inevitably to another problem: irrespective of the result of a statistical test, one cannot have much confidence in results based on very small samples. Any test on such data will have low power to reject the null hypothesis, and the chance of such a sample being representative of a population is low. As always, convenience sampling means that inference cannot be extended beyond the sample. Sometimes, however, a permutation test restricted to the sample is perfectly adequate, as in the test of the hypothesis that species invading Florida are more likely to be herbivorous or omnivorous than native species.
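The low power of small samples can be quantified in the same exact way. This hypothetical sketch assumes two independent binomial groups of equal size with true proportions of 0.2 and 0.6 (illustrative values only) and sums the probabilities of every outcome for which Fisher's two-sided test rejects at the 5% level.

```python
from math import comb

def table_probability(a, b, c, d):
    # Hypergeometric probability of [[a, b], [c, d]] given its margins
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_two_sided(a, b, c, d):
    # Sum of probabilities of same-margin tables no more probable than observed
    r1, c1, n = a + b, a + c, a + b + c + d
    p_obs = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, c1 - (n - r1)), min(r1, c1) + 1):
        p = table_probability(x, r1 - x, c1 - x, n - r1 - c1 + x)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return total

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_power(n1, n2, p1, p2, alpha=0.05):
    """Probability that Fisher's two-sided test rejects at level alpha when
    the true proportions in the two groups really differ (p1 != p2)."""
    power = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if fisher_two_sided(x1, n1 - x1, x2, n2 - x2) <= alpha:
                power += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return power

# Hypothetical true proportions of 0.2 and 0.6
for n in (5, 10, 20, 40):
    print(n, round(exact_power(n, n, 0.2, 0.6), 3))
```

Running it lets you see how power grows with group size for a difference of this magnitude.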
What the statisticians say
Armitage & Berry (2002) cover exact tests for contingency tables in Chapter 15. Woodward (2004) gives a rather superficial treatment of Fisher's exact test in Chapter 3. Agresti (2002) provides information on all aspects of categorical data analysis. Fleiss et al. (2003) give only a brief coverage of exact methods in Chapter 2. Conover (1999) looks at Fisher's exact test for 2 × 2 tables in Chapter 4, as well as exact methods for larger r × c tables. Sokal & Rohlf (1995) give a good coverage of the different models for 2 × 2 tables, together with some ecological examples of Model III designs suitable for analysis by Fisher's exact test.
Ludbrook (2008) cautions against the use of Fisher's exact test for anything other than the fixed rows and columns model. Campbell (2007), however, still advocates the use of Fisher's exact test when any expected value is less than 1. Agresti (2001) examines continuing controversies in the use of exact inference for categorical data. Agresti (1992) reviews previous work on exact inference for contingency tables. Upton (1992) reverses his earlier opposition to the use of Fisher's exact test, but then advocates the use of mid-P-values. Fay (2010) gives confidence intervals that match Fisher's exact or Blaker's exact tests. McKinney et al. (1989) comment on the failure of many authors to specify whether they are using the one-tailed or two-tailed version of Fisher's exact test.
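Since authors often fail to state which version they used, it helps to see how far the versions can differ for the same table. The sketch below (hypothetical function name, standard library only) computes both one-tailed P-values, the two-tailed P-value by the "sum of tables no more probable than observed" convention, and the one-tailed mid-P-value, which counts only half the probability of the observed table.

```python
from math import comb

def fisher_p_values(a, b, c, d):
    """One-tailed, two-tailed and mid-P values for the 2 x 2 table
    [[a, b], [c, d]] under the fixed-margins (hypergeometric) model."""
    r1, r2, c1 = a + b, c + d, a + c
    # Null distribution of the top-left cell given all margins
    pmf = {x: comb(r1, x) * comb(r2, c1 - x) / comb(r1 + r2, c1)
           for x in range(max(0, c1 - r2), min(r1, c1) + 1)}
    p_obs = pmf[a]
    upper = sum(p for x, p in pmf.items() if x >= a)  # one-tailed: a larger than expected
    lower = sum(p for x, p in pmf.items() if x <= a)  # one-tailed: a smaller than expected
    return {
        "one_tailed_upper": upper,
        "one_tailed_lower": lower,
        "two_tailed": sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7)),
        # mid-P: only half the probability of the observed table is counted
        "mid_p_upper": upper - 0.5 * p_obs,
    }

for name, value in fisher_p_values(3, 1, 1, 3).items():
    print(f"{name}: {value:.4f}")
```

For the illustrative table [[3, 1], [1, 3]] this prints 0.2429 and 0.9857 for the two one-tailed P-values, 0.4857 for the two-tailed P-value and 0.1286 for the upper-tail mid-P-value.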
Barnard (1945) also produced an exact test for 2 × 2 tables which was bitterly criticized by Fisher. Fisher (1935) first suggested his 'exact test' in the same year as Irwin (1935) - hence the suggested naming of the test as the Fisher-Irwin test.
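For comparison, a Barnard-style unconditional test fixes only the group sizes and treats the common success probability as a nuisance parameter. The sketch below is an assumption-laden illustration, not necessarily Barnard's original procedure: it orders tables by a pooled-variance z statistic (one common choice) and approximates the maximum over the nuisance parameter with a grid search. The function names are hypothetical.

```python
from math import comb, sqrt

def z_pooled(x1, n1, x2, n2):
    """Pooled-variance z statistic for the difference between two proportions."""
    p = (x1 + x2) / (n1 + n2)
    if p in (0.0, 1.0):
        return 0.0  # no variation at all: treat as no evidence of a difference
    return (x1 / n1 - x2 / n2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def unconditional_p_value(x1, n1, x2, n2, grid=1000):
    """Barnard-style unconditional exact test (two-sided).

    Only the group sizes n1, n2 are fixed. Tables are ordered by |z|, and the
    P-value is the largest probability of a table at least as extreme as the
    observed one, maximized over the common success probability pi, which is
    searched on a grid (so the true supremum may be slightly higher)."""
    z_obs = abs(z_pooled(x1, n1, x2, n2))
    extreme = [(i, j) for i in range(n1 + 1) for j in range(n2 + 1)
               if abs(z_pooled(i, n1, j, n2)) >= z_obs - 1e-9]
    best = 0.0
    for k in range(1, grid):
        pi = k / grid
        prob = sum(comb(n1, i) * comb(n2, j)
                   * pi ** (i + j) * (1 - pi) ** (n1 + n2 - i - j)
                   for i, j in extreme)
        best = max(best, prob)
    return best

print(unconditional_p_value(7, 10, 2, 10))
```

Because the maximum is only taken over grid points, the result can slightly understate the exact supremum; a finer grid or a numerical optimizer would reduce this.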
Wikipedia has articles on Fisher's exact test and Barnard's test. Ian Campbell reviews what you need to know (and probably more than you want to know) about analyzing a 2 × 2 table, including details of the various ways to calculate the two-sided P-value for Fisher's exact test. The R-statistics blog by Tal Galili gives code for Barnard's test in R. Cyrus R. Mehta & Pralay Senchaudhuri compare Fisher's and Barnard's tests for the 2 × 2 table.
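Two common conventions for the two-sided P-value of Fisher's exact test can give different answers when the null distribution is asymmetric. This hypothetical sketch computes both for the illustrative table [[2, 1], [1, 6]]: summing the probabilities of all tables no more probable than the observed one, and doubling the smaller one-tailed P-value (capped at 1).

```python
from math import comb

def pmf_top_left(a, b, c, d):
    """Hypergeometric null distribution of the top-left cell, all margins fixed."""
    r1, r2, c1 = a + b, c + d, a + c
    return {x: comb(r1, x) * comb(r2, c1 - x) / comb(r1 + r2, c1)
            for x in range(max(0, c1 - r2), min(r1, c1) + 1)}

def two_sided_methods(a, b, c, d):
    pmf = pmf_top_left(a, b, c, d)
    p_obs = pmf[a]
    upper = sum(p for x, p in pmf.items() if x >= a)
    lower = sum(p for x, p in pmf.items() if x <= a)
    return {
        # Sum of all tables no more probable than the observed one
        "probability_method": sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7)),
        # Twice the smaller one-tailed P-value, capped at 1
        "doubling_method": min(1.0, 2 * min(upper, lower)),
    }

print(two_sided_methods(2, 1, 1, 6))
```

Here the null distribution of the top-left cell has probabilities 35, 63, 21 and 1 (out of 120) for values 0 to 3, so the first method gives 22/120 ≈ 0.183 while doubling gives 44/120 ≈ 0.367.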