 InfluentialPoints.com
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

### Definition and Properties

The confidence interval provides a measure of the reliability of our estimate of a statistic, whether the mean or any other statistic that we calculate from our data. It can be defined as that range which when attached to a sample statistic would enclose the true parametric value on a given proportion (1−α) of occasions when it is calculated from randomly selected samples. The confidence interval is bounded by the lower confidence limit and the upper confidence limit. A more general definition of a confidence interval is that range of possible parameters within which you would class your statistic as typical.

The confidence interval of the mean of a measurement variable is commonly estimated on the assumption that the statistic follows a normal distribution, and that the variance is therefore independent of the mean. This is known as a normal approximation confidence interval. Provided the distribution is not too skewed, the central limit theorem means this assumption should be valid if your sample size is large. If the distribution is only moderately skewed, sample sizes greater than 30 should be sufficient. The assumption will not be valid for small samples from a skewed distribution.

For a large sample (n > 30) your estimate of the standard error will be relatively unbiased. Hence you can say that 1.96 standard errors either side of the mean will enclose 95% of the means of samples of that size drawn from that population. A common approximation is to give the arithmetic mean ± twice the standard error of the mean.

#### Algebraically speaking -

$$100(1-\alpha)\%\ \text{CI}(\bar{x}) = \bar{x} \pm z_{1-\alpha/2} \times \text{SE}(\bar{x})$$

$$95\%\ \text{CI}(\bar{x}) = \bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$$

where:
• α is the significance level; for a 95% confidence interval α = 0.05,
• $z_{1-\alpha/2}$ is the (1 − α/2) quantile of the standard normal distribution,
• $\text{SE}(\bar{x})$ is the standard error of the sample mean $\bar{x}$,
• s is the standard deviation of the observations,
• n is the number of observations.
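The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function name `normal_ci` and the sample values are hypothetical, invented here for illustration.

```python
import math
import statistics


def normal_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `data`."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)       # sample standard deviation (n - 1 divisor)
    se = s / math.sqrt(n)            # standard error of the mean
    alpha = 1 - confidence
    # (1 - alpha/2) quantile of the standard normal; about 1.96 when alpha = 0.05
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * se, mean + z * se


# Hypothetical large sample (n = 40), invented purely for illustration.
sample = [float(v) for v in range(1, 41)]
lower, upper = normal_ci(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Passing a different `confidence` (for example 0.99) changes only the z multiplier; the standard error is unaffected.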

#### Small sample size

For means of smaller samples, your estimate of the standard error will be biased. This results in the confidence interval being too narrow. You correct for this error by using t rather than z as the multiplier.

#### Algebraically speaking -

$$95\%\ \text{CI}(\bar{x}) = \bar{x} \pm t_{1-\alpha/2} \times \frac{s}{\sqrt{n}}$$

where:
• $t_{1-\alpha/2}$ is the (1 − α/2) quantile of the t distribution with (n − 1) degrees of freedom,
• α is the significance level; for a 95% confidence interval α = 0.05.
• s is the standard deviation of the observations,
• n is the number of observations.
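As a rough sketch of the small-sample interval, the code below computes the t multiplier and the resulting limits. In practice you would probably take the quantile from a statistics library (for example `scipy.stats.t.ppf`); to stay self-contained, this sketch instead approximates it numerically with the standard library. The helper names (`t_pdf`, `t_cdf`, `t_quantile`, `t_ci`) and the ten observations are hypothetical, invented for illustration.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density from 0 with Simpson's rule."""
    h = x / steps                      # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of the t distribution for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:           # widen the bracket until it holds the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_ci(data, confidence=0.95):
    """Confidence interval for the mean using t with n - 1 degrees of freedom."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    alpha = 1 - confidence
    t = t_quantile(1 - alpha / 2, n - 1)
    return mean - t * se, mean + t * se


# Hypothetical small sample (n = 10), invented purely for illustration.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9, 5.2, 4.7]
lower, upper = t_ci(sample)
print(f"t multiplier = {t_quantile(0.975, len(sample) - 1):.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

For n = 10 the multiplier is noticeably larger than 1.96, so the interval is wider than the normal-approximation interval computed from the same data.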

#### Finite population correction

If your sample comprises more than 10% of the total population, you need to include the finite population correction. For a large sample the confidence interval is given by:

#### Algebraically speaking -

$$95\%\ \text{CI}(\bar{x}) = \bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}} \times \sqrt{1 - \frac{n}{N}}$$
where:
• s is the standard deviation of the observations,
• n is the number of observations,
• N is the total population size.
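The effect of the correction can be seen in a short sketch. The function name `fpc_ci`, the sample values and the population sizes below are assumptions made for illustration only; the calculation follows the large-sample formula given above.

```python
import math
import statistics


def fpc_ci(data, population_size, confidence=0.95):
    """Large-sample confidence interval for the mean with finite population correction."""
    n = len(data)
    if not 1 < n <= population_size:
        raise ValueError("need 2 <= n <= population_size")
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)     # uncorrected standard error
    fpc = math.sqrt(1 - n / population_size)       # finite population correction
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * se * fpc
    return mean - half_width, mean + half_width


# Hypothetical sample of 40 individuals, invented purely for illustration.
sample = [float(v) for v in range(1, 41)]
for total in (100, 400, 1_000_000):
    lower, upper = fpc_ci(sample, total)
    print(f"N = {total}: 95% CI = ({lower:.2f}, {upper:.2f})")
```

As N grows relative to n the correction factor approaches 1 and the interval approaches the uncorrected one; when the sample is the whole population (n = N) the interval collapses onto the mean.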

#### Transformations

If the data are drawn from a skewed population (especially if your sample size is less than 30), or if the variance is dependent on the mean (as with proportions), alternative methods should be used. The 'traditional' solution to this is to use an appropriate transformation. The commonest are the logarithmic, square root and arcsine square root transformations. Full details can be found in the More Information Page on Transformations. Another approach is bootstrapping.

### Assumptions and Requirements

1. The simple formulae we have given above for the confidence interval of a mean only apply if you have used simple random sampling, or if individuals have been assigned to treatments at random. We consider what to do about clustered sampling and stratification in Unit 7.

2. You have only estimated your population mean using your sample mean - you have not measured it directly.
Therefore, your confidence interval is attached to (centred on) the sample mean, not the unknown population mean.

3. Ideally your data should be drawn from a normally distributed population. However, sample means of large numbers of observations tend to be distributed normally, whatever the underlying distribution. Hence the confidence interval may still be valid. But for very skewed distributions whatever the sample size, and for small samples from non-normal populations, you should first carry out an appropriate transformation of the data.

4. You have only estimated the variability of your mean - you have not measured it directly.
Although your sample means may be normally distributed, your estimate of the standard error is a sample statistic, and therefore subject to error. This error is worst for small sample sizes, and results in your estimated confidence interval being too narrow. Hence you must use the t-correction for small samples (always for n < 30; preferably for n < 100).
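
The last point can be checked by simulation. The sketch below, which assumes samples of n = 5 drawn from a standard normal population (true mean 0) and uses the tabulated 95% t multiplier for 4 degrees of freedom (2.776), counts how often each kind of interval encloses the true mean. The function name `coverage`, the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import math
import random


def coverage(multiplier, n, reps, rng):
    """Fraction of simulated intervals (mean ± multiplier × SE) that contain the true mean, 0."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
        half_width = multiplier * s / math.sqrt(n)
        if abs(mean) <= half_width:
            hits += 1
    return hits / reps


Z_95 = 1.96        # normal multiplier
T_95_DF4 = 2.776   # tabulated t multiplier for n = 5 (4 degrees of freedom)

# The same seed gives both methods identical samples, so only the multiplier differs.
z_cov = coverage(Z_95, n=5, reps=20000, rng=random.Random(1))
t_cov = coverage(T_95_DF4, n=5, reps=20000, rng=random.Random(1))
print(f"z coverage = {z_cov:.3f}, t coverage = {t_cov:.3f}")
```

Under these assumptions the z-based intervals should enclose the true mean clearly less often than the nominal 95%, while the t-based intervals should come close to it.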