Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)


  • Expected frequencies of binomial distribution

    • By expansion of $(P + Q)^n$

    Expected frequencies for the binomial can be obtained by expanding the expression $(P + Q)^n$. This is straightforward, but rather tedious for large values of n. Each term of the expansion gives the expected proportion in one class, that is, the probability of finding n, n − 1, n − 2, ..., 0 of the observations positive. For example, for 3 observations we expand $(P + Q)^3$ to give $P^3 + 3P^2Q + 3PQ^2 + Q^3$, where the terms predict the probability of finding three, two, one or zero positive observations. For any sample size (n), there are n + 1 possible classes, corresponding to between zero and n of the observations being positive.
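    The expansion can also be carried out numerically. The Python sketch below is our own illustration rather than part of the original method: it multiplies (P + Q) by itself n times and keeps the terms in the order used above, from all observations positive down to none. The function name `expand_binomial` is hypothetical.

```python
def expand_binomial(p: float, n: int) -> list[float]:
    """Expand (P + Q)^n numerically, with Q = 1 - P.

    Term i of the result is the probability that n - i of the
    n observations are positive (i = 0 means all positive).
    """
    q = 1.0 - p
    terms = [1.0]  # (P + Q)^0 = 1
    for _ in range(n):
        # Multiply the current polynomial by (P + Q): every existing term
        # contributes once with an extra factor P and once with an extra Q.
        new_terms = [0.0] * (len(terms) + 1)
        for i, t in enumerate(terms):
            new_terms[i] += t * p      # one more positive observation
            new_terms[i + 1] += t * q  # one more negative observation
        terms = new_terms
    return terms


# (P + Q)^3 with P = 0.3: terms for 3, 2, 1 and 0 positive observations
terms = expand_binomial(0.3, 3)
print([round(t, 4) for t in terms])  # [0.027, 0.189, 0.441, 0.343]
```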

    For larger values of n you can work out this series of terms in two stages.

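    1. The powers of P & Q -
      For any sample size (n) there will be (n + 1) terms, each of which corresponds to r positive observations (r = n, n − 1, ..., 0). Each term in the series is calculated as $P^{r}Q^{n-r}$, like this:
      $P^{n}Q^{0},\ P^{n-1}Q^{1},\ P^{n-2}Q^{2},\ P^{n-3}Q^{3},\ \ldots,\ P^{0}Q^{n}$
    2. Their coefficients -
      Each of these terms has to be multiplied by a coefficient to allow for the differing number of ways of achieving it; once this is done, the n + 1 terms sum to 1. These coefficients are most easily obtained from 'Pascal's triangle', in which each number is the sum of the two numbers above it. Part of the triangle is given here (row n gives the coefficients for a sample of size n):
        n = 1:  1  1
        n = 2:  1  2  1
        n = 3:  1  3  3  1
        n = 4:  1  4  6  4  1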
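    The two stages can be sketched in Python as follows. This is a minimal illustration, assuming terms are listed from n positive observations down to none; `pascal_row` and `binomial_terms` are hypothetical helper names. Each row of the triangle is built by adding adjacent entries of the row above, and the powers of P and Q are then multiplied by those coefficients.

```python
def pascal_row(n: int) -> list[int]:
    """Row n of Pascal's triangle, i.e. the coefficients of (P + Q)^n."""
    row = [1]
    for _ in range(n):
        # Each inner entry of the next row is the sum of the two entries above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row


def binomial_terms(p: float, n: int) -> list[float]:
    """Coefficient times powers of P and Q, ordered from n positives down to 0.

    The term for k positive observations is c * P^k * Q^(n - k).
    """
    q = 1.0 - p
    coefficients = pascal_row(n)
    return [c * p ** k * q ** (n - k)
            for c, k in zip(coefficients, range(n, -1, -1))]


for n in range(1, 5):
    print(n, pascal_row(n))
print(binomial_terms(0.3, 4))  # the five terms sum to 1
```

    Because each row of Pascal's triangle is symmetric, the coefficients are the same whichever end of the series you start from.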

    Worked example

    Let us assume we have taken 200 samples, each of 4 mosquitoes, from a population with a Plasmodium prevalence of 30% (P = 0.3, so Q = 0.7). The observed frequencies of samples with 4, 3, 2, 1 and 0 infected mosquitoes are given below. We then work out expected frequencies assuming a binomial distribution:
    [Table: No. infected (4, 3, 2, 1, 0) · term of expansion · observed frequency · expected frequency]
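    As an illustration, the expected frequencies for this example can be computed by the expansion method: take row 4 of Pascal's triangle, multiply each coefficient by its powers of P and Q, and scale by the 200 samples. The variable names are our own. Run as written, the sketch gives probabilities of 0.0081, 0.0756, 0.2646, 0.4116 and 0.2401 for 4, 3, 2, 1 and 0 infected mosquitoes, or expected frequencies of 1.62, 15.12, 52.92, 82.32 and 48.02 samples.

```python
N_SAMPLES = 200  # number of samples taken
N = 4            # mosquitoes per sample
P = 0.3          # Plasmodium prevalence
Q = 1 - P

# Row 4 of Pascal's triangle, built by adding adjacent entries of each row
coefficients = [1]
for _ in range(N):
    coefficients = [1] + [a + b for a, b in zip(coefficients, coefficients[1:])] + [1]

expected = {}
for uninfected, c in enumerate(coefficients):
    infected = N - uninfected
    probability = c * P ** infected * Q ** uninfected  # coefficient * P^r * Q^(n - r)
    expected[infected] = N_SAMPLES * probability
    print(f"{infected} infected: probability {probability:.4f}, "
          f"expected frequency {expected[infected]:.2f}")
```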

    Given that the observed and expected frequencies are fairly similar, we would probably accept that the data conform to a binomial distribution and that the samples were independently taken. We will consider how to assess statistically whether the binomial distribution provides an acceptable fit to the data in Unit 10.


    • By the general formula

    Alternatively, we could use the general formula to calculate the probability of r positive observations in a sample of n:

    $\Pr(r) = \dfrac{n!}{r!\,(n-r)!}\,P^{r}\,Q^{n-r}$

    Worked example

    We will take the same example again, namely 200 samples, each of 4 mosquitoes, from a population with a Plasmodium prevalence of 30% (P = 0.3). Note that you need to know a little about 'factorial' arithmetic for this: n! = n × (n − 1) × ... × 2 × 1, and by definition 0! = 1.

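    $$
    \begin{array}{c|c|c|c}
    \text{No. infected } (r) & \dfrac{n!}{r!\,(n-r)!}\,P^{r}Q^{n-r} & \text{Probability} & \text{Expected frequency } (200 \times \text{probability})\\
    \hline
    4 & \dfrac{4!}{4!\,0!}\,0.3^{4}\,0.7^{0} & 0.0081 & 1.62\\
    3 & \dfrac{4!}{3!\,1!}\,0.3^{3}\,0.7^{1} & 0.0756 & 15.12\\
    2 & \dfrac{4!}{2!\,2!}\,0.3^{2}\,0.7^{2} & 0.2646 & 52.92\\
    1 & \dfrac{4!}{1!\,3!}\,0.3^{1}\,0.7^{3} & 0.4116 & 82.32\\
    0 & \dfrac{4!}{0!\,4!}\,0.3^{0}\,0.7^{4} & 0.2401 & 48.02
    \end{array}
    $$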


    {Fig. 6}
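    The general formula translates directly into code. In this minimal Python sketch, `binomial_probability` is a hypothetical name; it uses integer factorials from the standard `math` module, so the coefficient is exact before it is multiplied by the powers of P and Q.

```python
from math import factorial


def binomial_probability(r: int, n: int, p: float) -> float:
    """Pr(r) = n! / (r! (n - r)!) * P^r * Q^(n - r), with Q = 1 - P."""
    coefficient = factorial(n) // (factorial(r) * factorial(n - r))  # exact integer
    return coefficient * p ** r * (1.0 - p) ** (n - r)


n, p, n_samples = 4, 0.3, 200
for r in range(n, -1, -1):  # 4, 3, 2, 1, 0 infected, in the order of the table
    prob = binomial_probability(r, n, p)
    print(f"{r} infected: {prob:.4f} x {n_samples} = {prob * n_samples:.2f}")
```

    In Python 3.8 and later, `math.comb(n, r)` returns the same coefficient directly.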

    Calculating probabilities this way is relatively straightforward for small samples, but for more than 20 or 30 observations the arithmetic becomes awkward. For example, $10! = 3{,}628{,}800 \approx 3.6 \times 10^{6}$, but $50! \approx 3.041 \times 10^{64}$. Numbers this large are difficult to handle, even for a computer, so factorials are often calculated as log factorials and the results handled as log probabilities.
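    A minimal Python sketch of working on the log scale, assuming 0 < P < 1. The standard library function `math.lgamma(x)` returns the natural log of the gamma function, and since Γ(n + 1) = n!, `lgamma(n + 1)` gives ln(n!) without ever forming n! itself. The helper names `log_factorial` and `binomial_log_probability` are hypothetical.

```python
from math import exp, lgamma, log


def log_factorial(n: int) -> float:
    """ln(n!), computed as lgamma(n + 1) because Gamma(n + 1) = n!."""
    return lgamma(n + 1)


def binomial_log_probability(r: int, n: int, p: float) -> float:
    """ln Pr(r) for r positives out of n; assumes 0 < p < 1."""
    return (log_factorial(n) - log_factorial(r) - log_factorial(n - r)
            + r * log(p) + (n - r) * log(1.0 - p))


# log10(50!) is about 64.48, matching 50! ~ 3.041 x 10^64
print(log_factorial(50) / log(10))

# 500! (about 10^1134) cannot be held in a floating-point number,
# yet the probability of exactly 150 positives out of 500 (P = 0.3)
# is easily obtained from its logarithm.
print(exp(binomial_log_probability(150, 500, 0.3)))
```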

    If the sample is large (n > 25) and P ≅ 0.5, the binomial distribution is approximately normal in shape. Proportions closer to 0 or 1 give skewed distributions, which can be normalized using an appropriate transformation. Distributions of extreme proportions (nPQ < 10) can be approximated using the Poisson distribution.
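    To see the first point in action, the sketch below (an illustration, not the author's method) compares binomial probabilities with the height of a normal curve having the same mean, nP, and standard deviation, √(nPQ). For n = 30 and P = 0.5 the largest discrepancy is only about 0.001, whereas for n = 30 and P = 0.05 the skewed binomial departs from the normal curve by more than 0.05 in places. The function names are hypothetical.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Exact binomial probability of r positives out of n."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def normal_density(x: float, mean: float, sd: float) -> float:
    """Height of the normal curve at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def worst_difference(n: int, p: float) -> float:
    """Largest gap between the binomial and a normal curve with mean nP, SD sqrt(nPQ)."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(r, n, p) - normal_density(r, mean, sd))
               for r in range(n + 1))


print(f"n = 30, P = 0.5:  {worst_difference(30, 0.5):.4f}")   # symmetric, close fit
print(f"n = 30, P = 0.05: {worst_difference(30, 0.05):.4f}")  # skewed, poor fit
```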



  • Expected frequencies of Poisson distribution

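    Here we use the general formula for the Poisson distribution to calculate the probability of observing r events when the mean number of events is μ:

    $\Pr(r) = \dfrac{\mu^{r}}{r!}\,e^{-\mu}$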

    Worked example

    We will take an example given by Glynn & Buring (1996) on hospital admissions among white Medicare patients aged 65-99 in 1989 in Jackman, Maine, USA.

    First we estimate the mean number of admissions as:
    $\text{Mean} = \dfrac{(0 \times 133) + (1 \times 20) + (2 \times 5) + (3 \times 2) + (4 \times 1) + (5 \times 1)}{162} = \dfrac{45}{162} = 0.278$
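    The same calculation in Python, storing the observed frequencies (number of admissions per individual, number of individuals) as a dictionary; the variable names are our own.

```python
# Observed frequencies: number of admissions per individual -> number of individuals
observed = {0: 133, 1: 20, 2: 5, 3: 2, 4: 1, 5: 1}

n_individuals = sum(observed.values())                      # 162 individuals
total_admissions = sum(r * f for r, f in observed.items())  # 45 admissions
mean = total_admissions / n_individuals

print(f"mean = {total_admissions} / {n_individuals} = {mean:.3f}")  # mean = 45 / 162 = 0.278
```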

    Then we substitute this value in the formula for the Poisson distribution to estimate the expected frequency for each value of r:

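    $$
    \begin{array}{c|c|c|c|c}
    \text{No. admissions per individual } (r) & \text{Observed frequency} & \dfrac{\mu^{r}}{r!}\,e^{-\mu} & \text{Probability} & \text{Expected frequency } (162 \times \text{probability})\\
    \hline
    0 & 133 & e^{-0.278} & 0.7573 & 122.7\\
    1 & 20 & 0.278\,e^{-0.278} & 0.2105 & 34.1\\
    2 & 5 & \dfrac{0.278^{2}}{2}\,e^{-0.278} & 0.0293 & 4.7\\
    3 & 2 & \dfrac{0.278^{3}}{6}\,e^{-0.278} & 0.0027 & 0.4\\
    4 & 1 & \dfrac{0.278^{4}}{24}\,e^{-0.278} & 0.0002 & 0.0\\
    5 & 1 & \dfrac{0.278^{5}}{120}\,e^{-0.278} & 0.0000 & 0.0
    \end{array}
    $$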
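    A Python sketch that reproduces this table from the observed frequencies. It uses the unrounded mean (45/162) rather than 0.278, so some probabilities differ from the table in the fourth decimal place, although the expected frequencies agree to one decimal place. `poisson_probability` is a hypothetical helper name.

```python
from math import exp, factorial

# Observed frequencies: number of admissions per individual -> number of individuals
observed = {0: 133, 1: 20, 2: 5, 3: 2, 4: 1, 5: 1}
n = sum(observed.values())
mean = sum(r * f for r, f in observed.items()) / n  # 45 / 162


def poisson_probability(r: int, mu: float) -> float:
    """Pr(r) = mu^r / r! * e^(-mu)."""
    return mu ** r / factorial(r) * exp(-mu)


expected = {r: n * poisson_probability(r, mean) for r in observed}
for r in observed:
    print(f"{r} admissions: observed {observed[r]:3d}, expected {expected[r]:5.1f}")
```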

    This is one case where we would not be surprised if the data did not follow a Poisson distribution, since events are unlikely to be independent. A person who has been admitted to hospital once might be expected to be more likely to return than other individuals (for example, those suffering from chronic diseases). And indeed the distribution does appear to differ from a Poisson, with an excess of zero and multiple admissions, and a deficit of single admissions.

    {Fig. 7}
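    One quick descriptive check of this departure, offered as an illustration rather than as the author's method: for a Poisson distribution the variance equals the mean, so a variance-to-mean ratio well above 1 suggests clumping, that is, more zeros and more multiple events than a Poisson predicts. For these data the sketch gives a ratio of about 1.93. It is not a formal test of goodness of fit.

```python
# Observed frequencies: number of admissions per individual -> number of individuals
observed = {0: 133, 1: 20, 2: 5, 3: 2, 4: 1, 5: 1}

n = sum(observed.values())
mean = sum(r * f for r, f in observed.items()) / n
# Sample variance, with n - 1 in the denominator
variance = sum(f * (r - mean) ** 2 for r, f in observed.items()) / (n - 1)
ratio = variance / mean  # close to 1 for a Poisson; well above 1 suggests clumping

print(f"mean = {mean:.3f}, variance = {variance:.3f}, variance/mean = {ratio:.2f}")
```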