 InfluentialPoints.com
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

### Expected frequencies of binomial distribution

• By expansion of $(P + Q)^n$

Expected frequencies for the binomial can be obtained by expanding the expression $(P + Q)^n$, where $P$ is the probability that an observation is positive and $Q = 1 - P$. This is straightforward, but rather tedious for large values of $n$. Each term of the expansion gives the probability of one class, namely of finding $n$, $n - 1$, $n - 2$ ... 0 of the observations positive. For example, for 3 observations we expand $(P + Q)^3$ to give $P^3 + 3P^2Q + 3PQ^2 + Q^3$, where the terms give the probability of finding three, two, one or zero positive observations. For any sample size ($n$), there are $n + 1$ possible classes, corresponding to between zero and $n$ of the observations being positive.

For larger values of n you can work out this series of terms in two stages.

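1. The powers of P & Q -
For any sample size ($n$) there will be $n + 1$ terms, each of which corresponds to $n - r$ positive and $r$ negative observations, for $r = 0, 1, \ldots, n$. Before its coefficient is applied, each term is calculated as $P^{n-r}Q^r$, like this:
$P^{n-0}Q^0,\; P^{n-1}Q^1,\; P^{n-2}Q^2,\; P^{n-3}Q^3,\; \ldots,\; P^{n-n}Q^n$
2. Their coefficients -
Each of these terms has to be multiplied by a coefficient to allow for the differing number of ways of achieving it. These coefficients can be derived pragmatically using 'Pascal's triangle', in which each entry is the sum of the two entries above it. For $n = 4$, for example, the coefficients are 1, 4, 6, 4, 1. Once each term has been multiplied by its coefficient, the terms sum to $(P + Q)^n = 1$.

#### Worked example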

Let us assume we have taken 200 samples, each of 4 mosquitoes, from a population with a Plasmodium prevalence of 30% ($P = 0.3$, so $Q = 1 - P = 0.7$). The observed frequencies of samples with 4, 3, 2, 1 and 0 infected mosquitoes are given below. We then work out expected frequencies assuming a binomial distribution:

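| No. infected mosquitoes | Observed frequency | Term of expansion | Probability | Expected frequency |
|---|---|---|---|---|
| 4 | 3 | $P^4$ | 0.0081 | 1.6 |
| 3 | 19 | $4P^3Q$ | 0.0756 | 15.1 |
| 2 | 48 | $6P^2Q^2$ | 0.2646 | 52.9 |
| 1 | 87 | $4PQ^3$ | 0.4116 | 82.3 |
| 0 | 43 | $Q^4$ | 0.2401 | 48.0 |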
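The two-stage expansion can be sketched in a few lines of Python. This is only an illustration: the function names `pascal_row` and `expected_by_expansion` are made up for this sketch and do not come from the original text. It builds the coefficients by the additive rule of Pascal's triangle, multiplies each by the matching powers of P and Q, and should reproduce the Probability and Expected frequency columns of the table above.

```python
def pascal_row(n):
    """Return row n of Pascal's triangle, e.g. n = 4 gives [1, 4, 6, 4, 1]."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


def expected_by_expansion(n, p, n_samples):
    """Expand (P + Q)^n term by term.

    Returns (positives, probability, expected frequency) tuples,
    ordered from n positives down to 0.
    """
    q = 1.0 - p
    rows = []
    for r, coeff in enumerate(pascal_row(n)):  # r = number of negatives
        prob = coeff * p ** (n - r) * q ** r
        rows.append((n - r, prob, prob * n_samples))
    return rows


# Mosquito example: 200 samples of 4 mosquitoes, P = 0.3.
for positives, prob, expected in expected_by_expansion(4, 0.3, 200):
    print(f"{positives}  {prob:.4f}  {expected:.1f}")
```

Building the row by repeated addition mirrors how the triangle is written out by hand; it avoids factorials altogether, which is why it is convenient for small n.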

Given that the observed and expected frequencies are fairly similar, we would probably accept that the data conform to a binomial distribution and that the samples were independently taken. We will consider how to assess statistically whether the binomial distribution provides an acceptable fit to the data in Unit 10.

• By the general formula

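Alternatively, we could use the general formula to calculate the probability of finding $r$ positive observations in a sample of $n$:

$$\Pr(r) = \frac{n!}{r!\,(n-r)!}\,P^r Q^{n-r}$$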

#### Worked example

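We will take the same example again, namely 200 samples, each of 4 mosquitoes, from a population with a Plasmodium prevalence of 30% ($P = 0.3$). Note that you need to know a little about 'factorial' arithmetic for this ($n! = n \times (n-1) \times \ldots \times 1$, and $0! = 1$).

| No. infected mosquitoes | Calculation | Probability |
|---|---|---|
| 4 | $\dfrac{4! \times 0.3^4 \times 0.7^0}{4! \times 0!}$ | 0.0081 |
| 3 | $\dfrac{4! \times 0.3^3 \times 0.7^1}{3! \times 1!}$ | 0.0756 |
| 2 | $\dfrac{4! \times 0.3^2 \times 0.7^2}{2! \times 2!}$ | 0.2646 |
| 1 | $\dfrac{4! \times 0.3^1 \times 0.7^3}{1! \times 3!}$ | 0.4116 |
| 0 | $\dfrac{4! \times 0.3^0 \times 0.7^4}{0! \times 4!}$ | 0.2401 |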
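The factorial calculations above translate directly into code. The following is a minimal sketch, assuming Python; the function name `binomial_probability` is chosen for illustration only. It computes the coefficient from factorials exactly as in the Calculation column, then scales each probability by the 200 samples to give an expected frequency.

```python
from math import factorial


def binomial_probability(r, n, p):
    """Probability of exactly r positives among n observations."""
    q = 1.0 - p
    # Number of ways of choosing which r of the n observations are positive.
    coefficient = factorial(n) // (factorial(r) * factorial(n - r))
    return coefficient * p ** r * q ** (n - r)


n, p, n_samples = 4, 0.3, 200
for r in range(n, -1, -1):
    prob = binomial_probability(r, n, p)
    print(f"{r}  {prob:.4f}  {prob * n_samples:.1f}")
```

In Python 3.8 and later, `math.comb(n, r)` returns the same coefficient without writing out the factorials.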

Calculating probabilities is relatively straightforward for small samples, but for more than 20 or 30 observations the arithmetic becomes awkward. For example, 10! = 3,628,800 ≅ 3.6 × 10^6, but 50! ≅ 3.041 × 10^64. Numbers this large are awkward to handle, even for a computer, so factorials are often calculated as log factorials and the results handled as log probabilities.
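One common way to work with log factorials, sketched here under the assumption that Python's standard `math.lgamma` is available, relies on the identity ln(n!) = lgamma(n + 1). The helper names below are hypothetical, and the sample of 50 with 15 positives is an arbitrary illustration rather than data from the text. The sketch requires 0 < P < 1, since it takes the log of both P and Q.

```python
from math import exp, lgamma, log


def log_factorial(n):
    """Natural log of n!, using lgamma(n + 1) = ln(n!)."""
    return lgamma(n + 1)


def log_binomial_probability(r, n, p):
    """Natural log of the binomial probability of r positives in n."""
    log_coeff = log_factorial(n) - log_factorial(r) - log_factorial(n - r)
    return log_coeff + r * log(p) + (n - r) * log(1.0 - p)


# 50! is enormous, but its logarithm is a modest number.
print(log_factorial(50))

# Illustrative case: 15 positives in a sample of 50 when P = 0.3.
print(exp(log_binomial_probability(15, 50, 0.3)))
```

Sums and differences of logs replace the products and quotients of very large and very small numbers, and the result is exponentiated only at the end.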

If the sample is large ($n > 25$) and $P \approx 0.5$, the binomial distribution is approximately normal in shape. Proportions closer to 0 or 1 yield skewed distributions, which can be normalized using an appropriate transformation. Distributions of extreme proportions ($nPQ < 10$) can be approximated using the Poisson distribution.

### Expected frequencies of Poisson distribution

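Here we use the general formula for the Poisson distribution to calculate the probability of observing $r$ events, where $\mu$ is the mean number of events:

$$\Pr(r) = \frac{\mu^r e^{-\mu}}{r!}$$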

#### Worked example

We will take an example given by Glynn & Buring (1996) on hospital admissions among white Medicare patients aged 65-99 in 1989 in Jackman, Maine, USA.

First we estimate the mean number of admissions as:
Mean = [0×133 + 1×20 + 2×5 + 3×2 + 4×1 + 5×1] / 162 = 0.278

Then we substitute this value in the formula for the Poisson distribution to estimate the expected frequency for each value of r:

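| No. admissions per individual | Observed frequency | Calculation | Probability | Expected frequency |
|---|---|---|---|---|
| 0 | 133 | $e^{-0.278}$ | 0.7573 | 122.7 |
| 1 | 20 | $0.278 \times e^{-0.278}$ | 0.2105 | 34.1 |
| 2 | 5 | $0.278^2/2 \times e^{-0.278}$ | 0.0293 | 4.7 |
| 3 | 2 | $0.278^3/6 \times e^{-0.278}$ | 0.0027 | 0.4 |
| 4 | 1 | $0.278^4/24 \times e^{-0.278}$ | 0.0002 | 0 |
| 5 | 1 | $0.278^5/120 \times e^{-0.278}$ | 0.0000 | 0 |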
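The same steps can be sketched in Python: estimate the mean from the observed frequencies, then apply the Poisson formula for each number of admissions. The observed counts are taken from the table above; the variable and function names are illustrative assumptions, not part of the original example.

```python
from math import exp, factorial

# Observed number of individuals with 0, 1, ..., 5 admissions.
observed = [133, 20, 5, 2, 1, 1]

n_individuals = sum(observed)
# Mean admissions per individual, weighting each count by its frequency.
mean = sum(r * f for r, f in enumerate(observed)) / n_individuals


def poisson_probability(r, mean):
    """Probability of exactly r events when the expected count is `mean`."""
    return mean ** r * exp(-mean) / factorial(r)


print(f"mean = {mean:.3f}")
for r, obs in enumerate(observed):
    prob = poisson_probability(r, mean)
    print(f"{r}  {obs}  {prob:.4f}  {prob * n_individuals:.1f}")
```

Note that this sketch uses the unrounded mean (45/162) rather than 0.278, so the last decimal place of some probabilities may differ slightly from the table.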

This is one case where we would not be surprised if the data did not follow a Poisson distribution, since events are unlikely to be independent. A person who has entered hospital once might be expected to be more likely to return to hospital than other individuals (for example, those suffering from chronic diseases). And indeed the distribution does appear to differ from a Poisson, with an excess of zero and multiple admissions, and a deficit of single admissions.

 Except where otherwise specified, all text and images on this page are copyright InfluentialPoints, all rights reserved. Images not copyright InfluentialPoints credit their source on web-pages attached via hypertext links from those images.