 
Expected frequencies of binomial distribution
 By expansion of (P + Q)^{n}
Expected frequencies for the binomial can be obtained by expanding the expression (P + Q)^{n}. This is straightforward, but rather tedious for large values of n. Each term of the expansion gives the expected relative frequency of one class, that is, the probability of finding n, n − 1, n − 2 ... 0 of the observations positive. For example, for 3 observations we expand (P + Q)^{3} to give P^{3} + 3P^{2}Q + 3PQ^{2} + Q^{3}, where the terms predict the probability of finding three, two, one or zero positive observations. For any sample size (n), there are n + 1 possible classes, corresponding to between zero and n of the observations being positive.
For larger values of n you can work out this series of terms in two stages.
 The powers of P & Q 
For any sample size (n) there will be (n + 1) terms, each of which corresponds to n − r positive observations, where r runs from 0 to n. The powers in each term are P^{n − r}Q^{r}, like this:
P^{n − 0}Q^{0}, P^{n − 1}Q^{1}, P^{n − 2}Q^{2}, P^{n − 3}Q^{3} . . . P^{n − n}Q^{n}
These terms sum to 1 only once each has been multiplied by its coefficient.
Their coefficients 
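Each of these terms has to be multiplied by a coefficient to allow for the differing number of ways of achieving it. These coefficients can be derived pragmatically using 'Pascal's triangle', in which each row begins and ends with 1 and every other entry is the sum of the two entries above it. The row for n = 3 is 1, 3, 3, 1 and the row for n = 4 is 1, 4, 6, 4, 1.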
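The coefficients can also be generated rather than looked up. The following is a minimal sketch in Python, not part of the original text; the function name pascal_row is made up for illustration.

```python
def pascal_row(n):
    """Return row n of Pascal's triangle, i.e. the coefficients of (P + Q)^n."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it;
        # every row starts and ends with 1.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


# Print the rows for n = 1 to 5.
for n in range(1, 6):
    print(n, pascal_row(n))
```

The row for n = 4 gives the coefficients used for samples of four observations.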
Worked example
Let us assume we have taken 200 samples, each of 4 mosquitoes, from a population with a Plasmodium prevalence of 30% (P = 0.3, so Q = 0.7). The observed frequencies of samples with 4, 3, 2, 1 and 0 infected mosquitoes are given below. We then work out expected frequencies assuming a binomial distribution, multiplying each probability by the number of samples (200):
No. infected mosquitoes  Observed frequency  Term of expansion  Probability  Expected frequency 
4  3  P^{4}  0.0081  1.6 
3  19  4P^{3}Q  0.0756  15.1 
2  48  6P^{2}Q^{2}  0.2646  52.9 
1  87  4PQ^{3}  0.4116  82.3 
0  43  Q^{4}  0.2401  48.0 
Given that the observed and expected frequencies are fairly similar, we would probably accept that the data conform to a binomial distribution and that the samples were independently taken. We will consider how to assess statistically whether the binomial distribution provides an acceptable fit to the data in Unit 10.
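The arithmetic of this worked example can be checked with a short script. This is only a sketch in Python: the function name binomial_expected is an assumption made for illustration, and math.comb (Python 3.8 or later) supplies the coefficients.

```python
from math import comb


def binomial_expected(n, p, n_samples):
    """Return (positives, probability, expected frequency) for each class."""
    q = 1 - p
    rows = []
    for k in range(n, -1, -1):  # n positives down to 0 positives
        prob = comb(n, k) * p**k * q**(n - k)
        rows.append((k, prob, n_samples * prob))
    return rows


# Values from the worked example: 200 samples of 4 mosquitoes, P = 0.3.
observed = {4: 3, 3: 19, 2: 48, 1: 87, 0: 43}
for k, prob, expected in binomial_expected(n=4, p=0.3, n_samples=200):
    print(f"{k}  {observed[k]:3d}  {prob:.4f}  {expected:5.1f}")
```

The probabilities sum to 1, so the expected frequencies sum to the number of samples.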


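Alternatively, we could use the general formula to calculate probabilities. Here r counts the positive observations among the n observations:
p(r) = [n! / (r!(n − r)!)] × P^{r}Q^{n − r}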
Calculating probabilities is relatively straightforward for small samples, but for more than 20 or 30 observations the arithmetic becomes awkward. For example, 10! = 3,628,800 ≅ 3.6 × 10^{6}, but 50! ≅ 3.041 × 10^{64}. Handling numbers this large presents serious difficulties, even for a computer, so they are often calculated as log factorials and handled as log probabilities.
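The log-factorial approach can be sketched as follows. This is an illustrative Python example rather than a prescribed method: it relies on the fact that math.lgamma(n + 1) equals ln(n!), and the function names are made up for this sketch.

```python
from math import lgamma, log, exp


def log_factorial(n):
    """Natural log of n!, computed without forming n! itself."""
    return lgamma(n + 1)


def binomial_log_prob(r, n, p):
    """Log probability of r positives in n observations (requires 0 < p < 1)."""
    log_coef = log_factorial(n) - log_factorial(r) - log_factorial(n - r)
    return log_coef + r * log(p) + (n - r) * log(1 - p)


# Probability of 25 positives in 50 observations when P = 0.5,
# converted back from the log scale only at the very end.
print(exp(binomial_log_prob(25, 50, 0.5)))
```

Working on the log scale keeps every intermediate value small, whatever the size of n.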
If the sample is large (n > 25) and P ≅ 0.5, the binomial distribution is approximately normal in shape. Proportions closer to 0 or 1 yield skewed distributions, which can be normalized using an appropriate transformation. Distributions of extreme proportions (nPQ < 10) can be approximated using the Poisson distribution.
Expected frequencies of Poisson distribution
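Here we use the general formula for the Poisson distribution to calculate probabilities, where r is the number of events and the mean is the average number of events per unit:
p(r) = (mean^{r} / r!) × e^{ − mean}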
Worked example
We will take an example given by Glynn & Buring (1996) on hospital admissions among white Medicare patients aged 65–99 in 1989 in Jackman, Maine, USA.
First we estimate the mean number of admissions as:
Mean = [0×133 + 1×20 + 2×5 + 3×2 + 4×1 + 5×1] / 162 = 0.278
Then we substitute this value in the formula for the Poisson distribution to estimate the expected frequency for each value of r:
No. admissions per individual  Observed frequency  Calculation  Probability  Expected frequency 
0  133  e^{ − 0.278}  0.7573  122.7 
1  20  0.278 × e^{ − 0.278}  0.2105  34.1 
2  5  0.278^{2}/2 × e^{ − 0.278}  0.0293  4.7 
3  2  0.278^{3}/6 × e^{ − 0.278}  0.0027  0.4 
4  1  0.278^{4}/24 × e^{ − 0.278}  0.0002  0.0 
5  1  0.278^{5}/120 × e^{ − 0.278}  0.0000  0.0 
This is one case where we would not be surprised if the data did not follow a Poisson distribution, since events are unlikely to be independent. A person who has entered hospital once (for example, someone suffering from a chronic disease) might be expected to be more likely to return to hospital than other individuals. And indeed the distribution does appear to differ from a Poisson, with an excess of zero and multiple admissions, and a deficit of single admissions.
{Fig. 7}
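The Poisson calculation above can be reproduced with a few lines of Python. This is a sketch only, and the function name poisson_expected is an assumption. It uses the unrounded mean (45/162) rather than 0.278, so the fourth decimal place of a probability may differ slightly from the table.

```python
from math import exp, factorial


def poisson_expected(observed):
    """Return the sample mean and (r, observed, probability, expected) rows."""
    total = sum(observed.values())
    mean = sum(r * freq for r, freq in observed.items()) / total
    rows = []
    for r in sorted(observed):
        prob = mean**r / factorial(r) * exp(-mean)
        rows.append((r, observed[r], prob, total * prob))
    return mean, rows


# Observed admissions per individual, taken from the worked example.
observed = {0: 133, 1: 20, 2: 5, 3: 2, 4: 1, 5: 1}
mean, rows = poisson_expected(observed)
print(f"mean = {mean:.3f}")
for r, obs, prob, expected in rows:
    print(f"{r}  {obs:3d}  {prob:.4f}  {expected:6.1f}")
```

Note that the expected frequencies do not quite sum to 162, because the Poisson distribution also assigns a small probability to more than 5 admissions.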


