 InfluentialPoints.com
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Fisher's exact test

#### Worked example I

Let us assume you have 10 captive-reared and 12 wild-caught birds, which have to choose between 11 high and 11 low sites.

| Origin | Nest site: High | Nest site: Low | Totals |
|---|---|---|---|
| Captive-reared | 2 | 8 | 10 |
| Wild-caught | 9 | 3 | 12 |
| Totals | 11 | 11 | N = 22 |

The probability of the observed result is given by:

$$P = \frac{10!\;12!\;11!\;11!}{22!\;2!\;8!\;9!\;3!}$$

Most of these cancel out leaving:

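$$P = \frac{[12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3]\;[11 \times 10 \times 9]\;[11 \times 10]}{[22 \times 21 \times 20 \times 19 \times 18 \times 17 \times 16 \times 15 \times 14 \times 13 \times 12 \times 11]\;[3 \times 2]} = \frac{2.608163712 \times 10^{13}}{1.858466812 \times 10^{15}} = 0.014034$$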
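The same arithmetic can be checked in a few lines of Python. This is only a minimal sketch: the function name `table_probability` and the cell labels `a`, `b` (top row) and `c`, `d` (bottom row) are assumptions made for illustration, chosen to match the (a+b)!, (c+d)! notation used later on this page.

```python
from math import factorial

def table_probability(a, b, c, d):
    """Probability of one 2x2 table with cells a, b (top row) and c, d (bottom row)."""
    n = a + b + c + d
    # Product of the four margin factorials.
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    # N! times the four cell factorials.
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

p_observed = table_probability(2, 8, 9, 3)
print(round(p_observed, 6))  # 0.014034
```

Python integers have no upper limit, so the factorials can be formed directly here; that would not hold in a language with fixed-width numbers.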

Only two more extreme tables are possible.

Firstly:

| Origin | Nest site: High | Nest site: Low | Totals |
|---|---|---|---|
| Captive-reared | 1 | 9 | 10 |
| Wild-caught | 10 | 2 | 12 |
| Totals | 11 | 11 | N = 22 |

The probability of observing this by chance is:

$$P = \frac{10!\;12!\;11!\;11!}{22!\;1!\;9!\;10!\;2!} = 0.0009356$$

Secondly:

| Origin | Nest site: High | Nest site: Low | Totals |
|---|---|---|---|
| Captive-reared | 0 | 10 | 10 |
| Wild-caught | 11 | 1 | 12 |
| Totals | 11 | 11 | N = 22 |

The probability of observing this by chance is:

$$P = \frac{10!\;12!\;11!\;11!}{22!\;0!\;10!\;11!\;1!} = 0.0000170$$

Summing probabilities we get the one-tailed P-value:

P = 0.014034 + 0.0009356 + 0.0000170 = 0.0149866
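As a cross-check, the tail can be summed programmatically. The sketch below swaps the factorial formula for the algebraically equivalent binomial-coefficient form, using `math.comb` (Python 3.8 or later). The function and variable names are assumptions made for illustration.

```python
from math import comb

def table_probability(a, row1, row2, col1):
    """Probability that the top-left cell equals a, given fixed margin totals."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

# Margins from the worked example: 10 captive-reared, 12 wild-caught, 11 high sites.
row1, row2, col1 = 10, 12, 11
observed_a = 2

# Observed table (a = 2) plus the two more extreme tables (a = 1 and a = 0).
one_tailed = sum(table_probability(a, row1, row2, col1)
                 for a in range(observed_a + 1))
print(round(one_tailed, 6))  # 0.014987
```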

This probability is normally doubled to give the two-tailed P-value:

P = 0.02997

#### Hints and shortcuts

For any given set of margin totals the term [(a+b)! (c+d)! (a+c)! (b+d)!] / N! remains constant. This is a useful time-saver when combining a number of probabilities in a tail.
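A minimal sketch of that time-saver follows, assuming cells labelled `a`, `b` (top row) and `c`, `d` (bottom row) and a tail that runs in the direction of decreasing `a`. The generator name `tail_probabilities` is hypothetical. The margin term is computed once as an exact fraction and reused for every table in the tail.

```python
from fractions import Fraction
from math import factorial

def tail_probabilities(a, b, c, d):
    """Yield P for the starting table, then for each table with a reduced by one."""
    n = a + b + c + d
    # This term depends only on the margin totals, so it is computed once.
    margin_term = Fraction(
        factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d),
        factorial(n),
    )
    while a >= 0 and d >= 0:
        cells = factorial(a) * factorial(b) * factorial(c) * factorial(d)
        yield float(margin_term / cells)
        # Move one observation from a to b and one from d to c: margins are unchanged.
        a, b, c, d = a - 1, b + 1, c + 1, d - 1

probs = list(tail_probabilities(2, 8, 9, 3))
print(probs, sum(probs))
```

For the bird example this yields three probabilities, one per table in the tail, whose sum is the one-tailed P-value.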

Unfortunately, for anything other than a very small sample, these numbers tend to become unmanageably large. For example, 40! ≅ 8.159 × 10^47, or roughly

 815,915,283,000,000,000,000,000,000,000,000,000,000,000,000,000

There are two ways of coping with this problem -

1. Even for very large samples, provided the smallest cell frequency is less than 40, you can calculate your results in a conventional manner.

This is because dividing factorials cancel out like this -

$$\frac{5!}{3!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 5 \times 4 = 20$$

For example, suppose we have these results:

|  |  |  | Total |
|---|---|---|---|
|  | 3 | 997 | 1000 |
|  | 5 | 1995 | 2000 |
| Total | 8 | 2992 | 3000 |

Then there are only 9 possible tables, because the smallest margin total is 8 and the top-left cell can therefore only take values from 0 to 8. From the formula above, the probability (P) of finding the observed cell frequencies is

$$P = \frac{1000!\;2000!\;8!\;2992!}{3000!\;3!\;997!\;5!\;1995!}$$

Most of which cancels out, leaving us

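$$P = \frac{1000 \times 999 \times 998 \times 2000 \times 1999 \times 1998 \times 1997 \times 1996 \times 8 \times 7 \times 6}{3000 \times 2999 \times 2998 \times 2997 \times 2996 \times 2995 \times 2994 \times 2993 \times 3 \times 2 \times 1}$$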

Which is rather more straightforward, if rather tedious to work out.
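The tedium is easily handed to a computer. The sketch below multiplies out only the terms that survive cancellation; the helper name `falling_product` is an assumption made for illustration, and `math.prod` needs Python 3.8 or later.

```python
from math import prod

def falling_product(n, k):
    """n x (n-1) x ... over k terms, i.e. n!/(n-k)! without forming either factorial."""
    return prod(range(n, n - k, -1))

# Terms left in the numerator: 1000!/997!, 2000!/1995! and 8!/5!.
numerator = (falling_product(1000, 3)
             * falling_product(2000, 5)
             * falling_product(8, 3))
# Terms left in the denominator: 3000!/2992! and 3!.
denominator = falling_product(3000, 8) * falling_product(3, 3)

p = numerator / denominator
print(p)  # roughly 0.27
```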

2. Where the smallest cell frequency is greater than 40, the only way to handle these factorials is as logarithms. Because the terms within these equations are all multiplied and divided, you simply add and subtract their logarithms. So,

$$\log P = [\log (a+b)! + \log (c+d)! + \log (a+c)! + \log (b+d)!] - [\log N! + \log a! + \log b! + \log c! + \log d!]$$

But if these factorials are so colossal, how do you find their logarithms directly?

There are two methods of finding the logarithm of a factorial that avoid working out the factorial itself.

1. If you are writing a computer programme you can use this formula -

log(N!) = log(N) + log(N-1) + log(N-2) + log(N-3) + ... + log(2)

where N is the number whose factorial you wish to calculate.

For example, log(3!) = log(3) + log(2) = 0.4771 + 0.3010 = 0.7781

2. If you are using a calculator this is clearly impractical.

A less accurate, but much quicker formula is -

ln(N!) ≅ (N + 0.5) × ln(N) - N + 0.92

This is Stirling's approximation; the constant 0.92 is ½ ln(2π) ≅ 0.919, rounded.

If N > 20 the error is less than 0.01%, and if N > 100 it is < 0.002%
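Both ways of finding a log-factorial, and the log form of the probability formula, can be sketched as follows. Some assumptions: natural logarithms are used throughout (the formula works in any base, provided the same base is used for every term), the quick approximation is written in Stirling's form (N + 0.5) × ln(N) - N + 0.92, and all function names are hypothetical.

```python
import math

def log_factorial_sum(n):
    """ln(n!) as ln(n) + ln(n-1) + ... + ln(2); gives 0.0 for n = 0 or 1."""
    return sum(math.log(k) for k in range(2, n + 1))

def log_factorial_stirling(n):
    """Quick approximation (n + 0.5) ln(n) - n + 0.92, intended for larger n only."""
    return (n + 0.5) * math.log(n) - n + 0.92

def log_table_probability(a, b, c, d, log_factorial=log_factorial_sum):
    """ln(P) for one 2x2 table, adding and subtracting log-factorials."""
    n = a + b + c + d
    plus = sum(log_factorial(x) for x in (a + b, c + d, a + c, b + d))
    minus = sum(log_factorial(x) for x in (n, a, b, c, d))
    return plus - minus

# Percentage error of the quick formula relative to the summed logs.
for n in (20, 100):
    exact = log_factorial_sum(n)
    approx = log_factorial_stirling(n)
    print(n, exact, approx, 100 * abs(approx - exact) / exact)

# The first worked example, recovered from its logarithm.
print(math.exp(log_table_probability(2, 8, 9, 3)))
```

The quick formula is not suitable for the small cell frequencies in the bird example (it is undefined for 0), so the sketch uses the summed logs by default.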