InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Fisher's exact test

 

Worked example I

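| Origin         | Nest site: High | Nest site: Low | Totals |
|----------------|-----------------|----------------|--------|
| Captive-reared | 2               | 8              | 10     |
| Wild-caught    | 9               | 3              | 12     |
| Totals         | 11              | 11             | N = 22 |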

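Let us assume you have 10 captive-reared and 12 wild-caught birds which have to choose between 11 high and 11 low sites. The probability of the observed result is given by:

$$P = \frac{10!\,12!\,11!\,11!}{22!\,2!\,8!\,9!\,3!}$$

Most of these cancel out, leaving:

$$P = \frac{[12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3]\,[11 \times 10 \times 9]\,[11 \times 10]}{[22 \times 21 \times 20 \times 19 \times 18 \times 17 \times 16 \times 15 \times 14 \times 13 \times 12 \times 11]\,[3 \times 2]}$$

$$= \frac{2.608163712 \times 10^{13}}{1.858466812 \times 10^{15}} = 0.014034$$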
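As a cross-check on the arithmetic, the factorial formula can be evaluated directly. The following is a minimal Python sketch rather than part of the original worked example; the function name `table_probability` and the cell labels a, b, c, d (read row by row) are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial


def table_probability(a, b, c, d):
    """Exact probability of the 2x2 table [[a, b], [c, d]] with its margins fixed."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    # Fraction keeps the result exact; convert to float only for display.
    return Fraction(numerator, denominator)


# Observed table: captive-reared (2 high, 8 low), wild-caught (9 high, 3 low)
p_observed = table_probability(2, 8, 9, 3)
print(p_observed, round(float(p_observed), 6))
```

Python integers have arbitrary precision, so the factorials never overflow here; the exact fraction is kept until the final rounding step.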

Only two more extreme tables are possible:

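| Origin         | Nest site: High | Nest site: Low | Totals |
|----------------|-----------------|----------------|--------|
| Captive-reared | 1               | 9              | 10     |
| Wild-caught    | 10              | 2              | 12     |
| Totals         | 11              | 11             | N = 22 |

| Origin         | Nest site: High | Nest site: Low | Totals |
|----------------|-----------------|----------------|--------|
| Captive-reared | 0               | 10             | 10     |
| Wild-caught    | 11              | 1              | 12     |
| Totals         | 11              | 11             | N = 22 |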

For the first of these tables:

$$P = \frac{10!\,12!\,11!\,11!}{22!\,1!\,9!\,10!\,2!} = 0.0009356$$

For the second:

$$P = \frac{10!\,12!\,11!\,11!}{22!\,0!\,10!\,11!\,1!} = 0.0000170$$

Summing probabilities we get the one-tailed P-value:

P = 0.014034 + 0.0009356 + 0.0000170 = 0.0149866

This probability is normally doubled to give the two-tailed P-value:

P = 0.02997
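The tail calculation can also be scripted. The sketch below is illustrative only: it writes each table's probability with binomial coefficients, which is algebraically the same as the factorial formula, and the names `prob_given_margins`, `row1`, `col1` and `a_min` are hypothetical labels, not the author's.

```python
from math import comb


def prob_given_margins(a, row1, col1, n):
    """Probability that the top-left cell equals a, with all margins fixed."""
    return comb(row1, a) * comb(n - row1, col1 - a) / comb(n, col1)


row1, col1, n = 10, 11, 22   # captive-reared total, high-site total, N
observed_a = 2               # captive-reared birds at high sites

# Smallest value the top-left cell can take without making another cell negative.
a_min = max(0, col1 - (n - row1))

# Observed table plus every more extreme table in the same direction.
one_tailed = sum(prob_given_margins(a, row1, col1, n)
                 for a in range(a_min, observed_a + 1))
two_tailed = 2 * one_tailed  # doubling rule used in the worked example

print(f"one-tailed P = {one_tailed:.7f}")
print(f"two-tailed P = {two_tailed:.5f}")
```

Both values agree with the hand calculation to the number of digits quoted there.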

 

 

 

Hints and shortcuts

For any given set of margin totals the term [(a+b)! (c+d)! (a+c)! (b+d)!] / N! remains constant. This is a useful time-saver when combining a number of probabilities in a tail.
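A rough Python sketch of this shortcut, using the margins from Worked example I: the constant is computed once and then divided by a! b! c! d! for each table in the tail. The names `margin_constant` and `tail_tables` are assumptions introduced for illustration.

```python
from fractions import Fraction
from math import factorial


def margin_constant(row1, row2, col1, col2):
    """The term (a+b)! (c+d)! (a+c)! (b+d)! / N!, which depends only on the margins."""
    n = row1 + row2
    return Fraction(factorial(row1) * factorial(row2)
                    * factorial(col1) * factorial(col2), factorial(n))


constant = margin_constant(10, 12, 11, 11)   # computed once for all tables

# Cells (a, b, c, d) of the observed table and the two more extreme tables.
tail_tables = [(2, 8, 9, 3), (1, 9, 10, 2), (0, 10, 11, 1)]

tail_probabilities = [
    constant / (factorial(a) * factorial(b) * factorial(c) * factorial(d))
    for a, b, c, d in tail_tables
]

for cells, p in zip(tail_tables, tail_probabilities):
    print(cells, round(float(p), 7))
print("tail sum:", round(float(sum(tail_probabilities)), 7))
```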

Unfortunately, for anything other than a very small sample, these numbers tend to become unmanageably large. For example, 40! ≅ 8.159 × 10⁴⁷, a number with 48 digits.

There are two ways of coping with this problem:

  1. Even for very large samples, provided the smallest cell frequency is less than 40, you can calculate your results in a conventional manner.

      This is because dividing factorials cancel out like this:

      $$\frac{5!}{3!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 5 \times 4 = 20$$

    For example, suppose we have these results:

    |       |   |      | total |
    |-------|---|------|-------|
    |       | 3 | 997  | 1000  |
    |       | 5 | 1995 | 2000  |
    | total | 8 | 2992 | 3000  |

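    Then there are only 9 possible tables, because the top-left cell can only take values from 0 to 8. Five of them (top-left cell of 4 to 8) lie further from expectation in the same direction as this one. From the formula above, the probability (P) of finding the observed cell frequencies is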

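    $$P = \frac{1000!\,2000!\,8!\,2992!}{3000!\,3!\,997!\,5!\,1995!}$$

    Most of which cancels out, leaving us

    $$P = \frac{[1000 \times 999 \times 998]\,[2000 \times 1999 \times 1998 \times 1997 \times 1996]\,[8 \times 7 \times 6]}{[3000 \times 2999 \times 2998 \times 2997 \times 2996 \times 2995 \times 2994 \times 2993]\,[3 \times 2 \times 1]}$$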

    Which is rather more straightforward, if rather tedious to work out.

     

  2. Where the smallest cell frequency is greater than 40, the only way to handle these factorials is as logarithms. Because the terms within these equations are all multiplied and divided, you simply add and subtract their logarithms. So,
    $$\log P = [\log (a+b)! + \log (c+d)! + \log (a+c)! + \log (b+d)!] - [\log N! + \log a! + \log b! + \log c! + \log d!]$$

    But if these factorials are so colossal, how do you find their logarithms directly?

     

    There are two methods of finding logs of factorials, that avoid working out the factorials themselves.

    1. If you are writing a computer programme you can use this formula -

      log(N!) = log(N) + log(N-1) + log(N-2) + log(N-3)... + log(2)

        Where N is the number whose factorial you wish to calculate.

      For example, log(3!) = log(3) + log(2) = 0.4771 + 0.3010 = 0.7781

       

    2. If you are using a calculator this is clearly impractical.

      A less accurate, but much quicker formula is -

      ln(N!) ≅ (N + 0.5) ln(N) − N + 0.92


        Where ln(N!) is the natural log (log_e) of N!

      If N > 20 the error is less than 0.01%, and if N > 100 it is < 0.002%
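The logarithm approach can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's program: it uses natural logs throughout, the helper names are hypothetical, the approximation shown is the standard Stirling form (N + 0.5) ln(N) − N + 0.5 ln(2π), whose constant is about 0.919, and `math.lgamma(N + 1)` is used only as an independent reference value for ln(N!).

```python
from math import comb, exp, lgamma, log, pi


def log_factorial_sum(n):
    """ln(n!) as ln(2) + ln(3) + ... + ln(n); gives 0 for n = 0 or 1."""
    return sum(log(k) for k in range(2, n + 1))


def log_factorial_stirling(n):
    """Stirling-type approximation to ln(n!); requires n >= 1."""
    return (n + 0.5) * log(n) - n + 0.5 * log(2 * pi)


def log_table_probability(a, b, c, d):
    """ln(P) for the 2x2 table [[a, b], [c, d]], adding and subtracting log-factorials."""
    n = a + b + c + d
    lf = log_factorial_sum
    return (lf(a + b) + lf(c + d) + lf(a + c) + lf(b + d)
            - lf(n) - lf(a) - lf(b) - lf(c) - lf(d))


# The large-sample table from the text: 3, 997 / 5, 1995
p_from_logs = exp(log_table_probability(3, 997, 5, 1995))

# Cross-check with exact integer arithmetic (binomial-coefficient form).
p_exact = comb(1000, 3) * comb(2000, 5) / comb(3000, 8)
print(p_from_logs, p_exact)

# Relative error of the approximation against the reference ln(N!).
for n in (21, 100):
    reference = lgamma(n + 1)
    error = abs(log_factorial_stirling(n) - reference) / reference
    print(n, f"{100 * error:.5f}%")
```

The approximation is least accurate for small N, so in this sketch the table probability is built from the summed logs; the approximate form is only compared against the reference at larger N.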