Fisher's exact test
Worked example I
Let us assume you have 10 captive-reared and 12 wild-caught birds, each of which has to choose between 11 high and 11 low nest sites.
Origin | High nest site | Low nest site | Totals |
Captive-reared | 2 | 8 | 10 |
Wild-caught | 9 | 3 | 12 |
Totals | 11 | 11 | N = 22 |
The probability of the observed result is given by:
$$P = \frac{10!\;12!\;11!\;11!}{22!\;2!\;8!\;9!\;3!}$$
Most of these cancel out leaving:
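$$P = \frac{[12\times11\times10\times9\times8\times7\times6\times5\times4\times3]\,[11\times10\times9]\,[11\times10]}{[22\times21\times20\times19\times18\times17\times16\times15\times14\times13\times12\times11]\,[3\times2]} = \frac{2.608163712\times10^{13}}{1.858466812\times10^{15}} = 0.014034$$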
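The cancelled-down arithmetic above is easy to get wrong by a power of ten, so it can be worth checking by machine. The following is a minimal sketch, assuming Python 3.8 or later; the variable names are illustrative only.

```python
from math import prod

# Numerator: 12!/2! x (11!/8!) x (11!/9!), written as the bracketed products
numerator = prod(range(3, 13)) * (11 * 10 * 9) * (11 * 10)

# Denominator: 22!/10! x 3!
denominator = prod(range(11, 23)) * (3 * 2)

p_observed = numerator / denominator
print(f"{numerator:.9e} / {denominator:.9e} = {p_observed:.6f}")
```

Python integers have no upper size limit, so the products are exact until the final division.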
Only two more extreme tables are possible.
Firstly:
Origin | High nest site | Low nest site | Totals |
Captive-reared | 1 | 9 | 10 |
Wild-caught | 10 | 2 | 12 |
Totals | 11 | 11 | N = 22 |
The probability of observing this by chance is:
$$P = \frac{10!\;12!\;11!\;11!}{22!\;1!\;9!\;10!\;2!} = 0.0009356$$
Secondly:
Origin | High nest site | Low nest site | Totals |
Captive-reared | 0 | 10 | 10 |
Wild-caught | 11 | 1 | 12 |
Totals | 11 | 11 | N = 22 |
The probability of observing this by chance is:
$$P = \frac{10!\;12!\;11!\;11!}{22!\;0!\;10!\;11!\;1!} = 0.0000170$$
Summing the probabilities of the observed table and the two more extreme tables gives the one-tailed P-value:
P = 0.014034 + 0.0009356 + 0.0000170 = 0.0149866
This probability is normally doubled to give the two-tailed P-value, P = 0.02997.
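The whole worked example can be reproduced in a few lines. This is a sketch rather than a definitive implementation: it writes each table probability in the equivalent binomial-coefficient form, and the function name `table_probability` is hypothetical.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of one 2x2 table (a b / c d) given its margin totals.

    Algebraically the same as (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Observed table, then the two more extreme tables from the example
tables = [(2, 8, 9, 3), (1, 9, 10, 2), (0, 10, 11, 1)]
probs = [table_probability(*t) for t in tables]

one_tailed = sum(probs)
two_tailed = 2 * one_tailed  # the simple doubling rule used in the text
print(probs, one_tailed, two_tailed)
```

Doubling is only one convention for the two-tailed value; statistical packages may use a different rule and so report a slightly different figure.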

Hints and shortcuts
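For any given set of margin totals the term [(a+b)! (c+d)! (a+c)! (b+d)!] / N! remains constant, where a and b are the cell frequencies in the first row of the table, c and d are those in the second row, and N is the overall total. This is a useful time-saver when combining a number of probabilities in a tail.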
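As an illustration of this shortcut, the sketch below computes the constant term once and reuses it for every table in the tail. The function and argument names are hypothetical, and exact fractions are used only to avoid rounding until the last step.

```python
from fractions import Fraction
from math import factorial

def tail_probabilities(a_values, row1, row2, col1, col2):
    """Table probabilities for several top-left cell values with fixed margins."""
    n = row1 + row2
    # Constant term: depends only on the margin totals, so compute it once
    constant = Fraction(
        factorial(row1) * factorial(row2) * factorial(col1) * factorial(col2),
        factorial(n),
    )
    probs = []
    for a in a_values:
        b = row1 - a          # rest of the first row
        c = col1 - a          # rest of the first column
        d = row2 - c          # remaining cell
        cells = factorial(a) * factorial(b) * factorial(c) * factorial(d)
        probs.append(float(constant / cells))
    return probs

# Margins from the bird nest-site example: rows 10 and 12, columns 11 and 11
tail = tail_probabilities([2, 1, 0], row1=10, row2=12, col1=11, col2=11)
print(tail, sum(tail))
```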
Unfortunately, for anything other than a very small sample, these numbers tend to become unmanageably large. For example, 40! ≅ 8.159 × 10^47, a number with 48 digits.
There are two ways of coping with this problem:
Even for very large samples, provided the smallest cell frequency is less than 40, you can calculate your results in a conventional manner.
For example, suppose we have these results:
 | Column 1 | Column 2 | Total |
Row 1 | 3 | 997 | 1000 |
Row 2 | 5 | 1995 | 2000 |
Total | 8 | 2992 | 3000 |
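Then there are only 9 possible tables, because the top-left cell can only take values from 0 to 8. Five of these (a top-left cell of 4 or more) depart further from expectation in the same direction as this one. From the formula above, the probability (P) of finding the observed cell frequencies is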
$$P = \frac{1000!\;2000!\;8!\;2992!}{3000!\;3!\;997!\;5!\;1995!}$$
Most of this cancels out, leaving
$$P = \frac{1000\times999\times998 \times 2000\times1999\times1998\times1997\times1996 \times 8\times7\times6}{3000\times2999\times2998\times2997\times2996\times2995\times2994\times2993 \times 3\times2\times1}$$
which is rather more straightforward, if rather tedious, to work out.
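The tedium can be handed to a short script. The sketch below evaluates the cancelled-down expression for the 3, 997, 5, 1995 table and cross-checks it against the binomial-coefficient form of the same probability; the variable names are illustrative.

```python
from math import comb, prod

# Cancelled-down numerator: 1000!/997! x 2000!/1995! x 8!/5!
numerator = prod([1000, 999, 998]) * prod(range(1996, 2001)) * (8 * 7 * 6)

# Cancelled-down denominator: 3000!/2992! x 3!
denominator = prod(range(2993, 3001)) * (3 * 2 * 1)

p_cancelled = numerator / denominator

# Cross-check: the same probability written with binomial coefficients
p_check = comb(1000, 3) * comb(2000, 5) / comb(3000, 8)

print(p_cancelled, p_check)
```

None of the four full factorials (1000!, 2000!, 2992!, 3000!) is ever formed, which is the point of cancelling first.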
Where the smallest cell frequency is greater than 40, the only way to handle these factorials is as logarithms. Because the terms within these equations are all multiplied and divided, you simply add and subtract their logarithms.
So,
$$\log\{P\} = [\log\{(a+b)!\} + \log\{(c+d)!\} + \log\{(a+c)!\} + \log\{(b+d)!\}] - [\log\{N!\} + \log\{a!\} + \log\{b!\} + \log\{c!\} + \log\{d!\}]$$
But if these factorials are so colossal, how do you find their logarithms directly?
There are two methods of finding logs of factorials that avoid working out the factorials themselves.
If you are writing a computer program you can use this formula:
log(N!) = log(N) + log(N-1) + log(N-2) + log(N-3)... + log(2)
For example, log(3!) = log(3) + log(2) = 0.4771 + 0.3010 = 0.7781
If you are using a calculator this is clearly impractical.
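In a program, the summation is only a few lines. The sketch below uses base-10 logs to match the log(3!) example and then applies the log form of the probability formula. The function names and the table with cells larger than 40 are made up for illustration.

```python
from math import comb, log10

def log10_factorial(n):
    """log10(n!) as a sum of logs, without ever forming n! itself."""
    return sum(log10(k) for k in range(2, n + 1))

def log10_table_probability(a, b, c, d):
    """log10 of the probability of the 2x2 table (a b / c d)."""
    n = a + b + c + d
    added = (log10_factorial(a + b) + log10_factorial(c + d)
             + log10_factorial(a + c) + log10_factorial(b + d))
    subtracted = (log10_factorial(n) + log10_factorial(a) + log10_factorial(b)
                  + log10_factorial(c) + log10_factorial(d))
    return added - subtracted

# The observed bird table from the worked example
p_birds = 10 ** log10_table_probability(2, 8, 9, 3)

# A hypothetical table in which every cell exceeds 40
p_large = 10 ** log10_table_probability(50, 60, 70, 45)
p_large_check = comb(110, 50) * comb(115, 70) / comb(225, 120)

print(log10_factorial(3), p_birds, p_large, p_large_check)
```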
A less accurate, but much quicker, formula (a form of Stirling's approximation) is:
ln(N!) ≅ (N + 0.5) × ln(N) - N + 0.92
where ln(N!) is the natural log (log_e) of N! and 0.92 approximates 0.5 × ln(2π).
If N > 20 the error is less than 0.01%, and if N > 100 it is less than 0.002%.
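The quoted error bounds can be checked numerically. The sketch below assumes the Stirling form ln(N!) ≅ (N + 0.5) × ln(N) - N + 0.92 and compares it with `math.lgamma(N + 1)`, which returns ln(N!) to floating-point accuracy. The function names are hypothetical.

```python
from math import lgamma, log

def ln_factorial_approx(n):
    """Stirling-type shortcut; 0.92 stands in for 0.5 * ln(2 * pi)."""
    return (n + 0.5) * log(n) - n + 0.92

def percent_error(n):
    """Absolute error of the shortcut as a percentage of ln(n!)."""
    exact = lgamma(n + 1)  # ln(n!) via the log-gamma function
    return 100 * abs(ln_factorial_approx(n) - exact) / exact

for n in (21, 50, 101):
    print(n, ln_factorial_approx(n), percent_error(n))
```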