



Estimating sample size

for a given confidence interval for a proportion


The principle here is exactly the same as for estimating sample size for a given confidence limit for a mean.

Again it is assumed that you are using random sampling.

In this case you only require a preliminary estimate of the proportion with the character of interest (that is, the prevalence in the case of a disease). Again, the formula for confidence limits is rearranged to estimate the number of samples (n) required for the desired confidence limit.


  1. Decide on how large an error you can tolerate in your estimate, and express this as a proportion.

  2. Rearrange the formula for confidence limits to give the number of samples required for that allowable error (L) at the 95% confidence level. We have shown this below for a simple random sample ignoring the finite population correction:

Hence:

$$L = 1.96 \sqrt{\frac{pq}{n}}$$

So:

$$n = \frac{(1.96)^2 \, pq}{L^2}$$

where:
  • L is the allowable error in the variable being measured,
  • p is the proportion with the character of interest,
  • q is the proportion without the character of interest,
  • n is the required sample size (number of observations).
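
As a minimal sketch, the rearranged formula can be written as a short Python function. The function name, argument names and the default of 1.96 (the 95% confidence level) are illustrative choices, not part of the original text. The sketch assumes simple random sampling and ignores the finite population correction. It returns the unrounded value, so how to round is left to the user.

```python
def sample_size_for_proportion(p, allowable_error, z=1.96):
    """Return n = z^2 * p * q / L^2 for a simple random sample.

    p               -- estimated proportion with the character of interest
    allowable_error -- L, the allowable error, expressed as a proportion
    z               -- normal deviate (1.96 assumes the 95% confidence level)
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if allowable_error <= 0:
        raise ValueError("allowable_error must be positive")
    q = 1 - p  # proportion without the character of interest
    return z ** 2 * p * q / allowable_error ** 2


if __name__ == "__main__":
    # Hypothetical inputs: a proportion of 0.5 estimated to within 0.05
    print(sample_size_for_proportion(0.5, 0.05))
```

Because pq is largest when p = 0.5, that value gives the most cautious sample size when no preliminary estimate is available.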

The same principles apply for estimating sample size for more complex sampling plans using stratification or clustering. However, since the mean and variance of your population are likely to vary (sometimes quite considerably), your estimate of the required sample size will only be an approximation. Hence it is often recommended that the simple formula given here be used to give a lower limit to the required sample size, and that this figure be increased by 10-20% if more complex designs are used.
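
The 10-20% adjustment can be sketched as follows. The function name and the use of a whole-number percentage are assumptions made for illustration; the text only gives the 10-20% rule of thumb, and rounding up to the next whole animal is our own choice. Integer arithmetic is used so that rounding up is not disturbed by floating-point error.

```python
def inflate_sample_size(n_simple, percent_increase):
    """Increase a simple-random-sample size by a whole-number percentage,
    rounding up to the next whole observation.

    n_simple         -- sample size from the simple formula (an integer)
    percent_increase -- e.g. 10 or 20 for the 10-20% rule of thumb
    """
    if n_simple <= 0 or percent_increase < 0:
        raise ValueError("n_simple must be positive and percent_increase non-negative")
    # Ceiling division in integers: ceil(n * (100 + pct) / 100)
    return (n_simple * (100 + percent_increase) + 99) // 100


if __name__ == "__main__":
    # Hypothetical lower limit of 334 observations from the simple formula
    for pct in (10, 20):
        print(pct, inflate_sample_size(334, pct))
```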

Worked example

A survey is being carried out to estimate the prevalence of trypanosomiasis in cattle in a district of Tanzania. It is planned to use simple random sampling for the survey.

A preliminary sample of 100 animals gives the proportion infected as 0.32, so p = 0.32 and q = 1 - 0.32 = 0.68. If we wish to estimate the prevalence to within ± 5 percentage points, the value of L expressed as a proportion is 0.05.

$$n = \frac{(1.96)^2 \times 0.32 \times 0.68}{(0.05)^2} \approx 334$$

Hence we would need to take a sample of about 334 animals to provide our required level of precision.
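
As a check on the worked example, the sample size can be put back into the confidence limit formula to confirm that the allowable error comes out close to 0.05. This is only a sketch: the function name is illustrative, and it again assumes simple random sampling, the 95% level (1.96) and no finite population correction.

```python
import math


def confidence_half_width(p, n, z=1.96):
    """Return L = z * sqrt(p * q / n), the allowable error for a sample of size n."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Values from the worked example: p = 0.32 and n = 334
half_width = confidence_half_width(0.32, 334)

if __name__ == "__main__":
    print(f"Allowable error with n = 334: {half_width:.4f}")
```

The result is very close to the target of 0.05, but because 334 is the rounded-down value of the formula, the achieved error is marginally above 0.05. Taking 335 animals would bring it just below.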