Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Confidence interval of mean: Use & misuse

(inferential statistic, assumptions, random sampling, normal distribution)

Statistics courses, especially for biologists, assume that knowing the formulae equals understanding. They teach how to do statistics, but largely ignore what those procedures assume and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The normal approximation estimate of the confidence interval of the mean is used widely in the biological literature, although much less by medical researchers than by other groups. Like the standard error, the confidence interval is an inferential statistic - not a descriptive statistic. As such it should only be used if certain assumptions (random sampling and normal distribution) are met. If those assumptions are met, the confidence interval of a mean is a range computed so that, were the sampling and estimation repeated many times, the resulting intervals would enclose the true parametric mean on a given proportion of occasions (for example 95%). If those assumptions are not met, the interval provides only an ill-defined index of reliability.
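The repeated-sampling definition can be illustrated with a small simulation. This is a minimal sketch rather than a recommended procedure: it assumes the usual normal approximation (mean plus or minus a normal quantile times the standard error), and the function names, population values and sample size are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def normal_ci(sample, level=0.95):
    """Normal approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.96 for 95%
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sd=2.0, n=50, trials=2000, seed=1):
    """Proportion of intervals that enclose the true mean when random samples
    are drawn repeatedly from a normal population (illustrative values)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lower, upper = normal_ci(sample)
        hits += lower <= true_mean <= upper
    return hits / trials


print(f"Proportion of 95% intervals enclosing the true mean: {coverage():.3f}")
```

Because the samples here really are random draws from a normal population, the proportion should come out close to the nominal level. The simulation says nothing about what happens when the sample is not random, which is the situation discussed next.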

Not surprisingly, therefore, the commonest misuse of the confidence interval for the mean is to attach it to means derived from non-random samples. Under such circumstances it would be much more informative if researchers presented their data as means with standard deviations (if symmetrically distributed) or otherwise as medians with interquartile ranges and outliers. We give examples of where this would be especially desirable, such as when following the course of haemoglobin and white cell counts after treatment for malaria.

Another frequent misuse of normal approximation confidence intervals for the mean is to use them for ordinal variables, where medians with interquartile ranges would be much more appropriate. We give examples of such intervals being calculated for standard of living scores and pain scores - neither of which is a measurement variable. The same point applies when the means are neither normally distributed nor homoscedastic. This is especially a problem for small samples from skewed distributions, such as trap catches of tsetse flies, net catches of trout and time periods when birds were disturbed. Use of such intervals in this situation can lead to nonsensical values, such as negative values for species richness or negative numbers of trout. Bootstrap confidence intervals may be appropriate here. Another misuse is to attach a confidence interval to an estimator that is known to be biased - for example the estimated numbers of horses and horse owners in Britain.
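To show how a nonsensical lower limit can arise, the sketch below applies the normal approximation and a simple percentile bootstrap to a small, strongly skewed set of counts. The data are made up for illustration and are not taken from any of the studies mentioned. The percentile method is only one of several bootstrap variants, and with samples this small it can itself be unreliable.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical trap catches (made-up data): mostly zeros with one large catch.
catches = [0, 0, 0, 1, 0, 2, 0, 0, 15, 0]


def normal_ci(sample, level=0.95):
    """Normal approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return mean(sample) - half_width, mean(sample) + half_width


def bootstrap_percentile_ci(sample, level=0.95, n_boot=5000, seed=1):
    """Percentile bootstrap interval: resample with replacement, recompute the
    mean each time, then take the outer percentiles of those means."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


normal_lo, normal_hi = normal_ci(catches)
boot_lo, boot_hi = bootstrap_percentile_ci(catches)
print(f"Normal approximation: {normal_lo:.2f} to {normal_hi:.2f}")
print(f"Percentile bootstrap: {boot_lo:.2f} to {boot_hi:.2f}")
```

For these counts the normal approximation gives a lower limit below zero, which is impossible for a catch. The bootstrap limits are means of resampled observations, so they cannot fall below the smallest value observed.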

Confidence intervals should not be used to compare treatment means by simply observing whether the intervals overlap. It is true that if the intervals do not overlap, you can infer the means are significantly different. But if the intervals do overlap you cannot assume the reverse. To obtain a precise test you need to attach a confidence interval to the difference between the means and see if it includes zero. However, it is not recommended to use confidence intervals to perform a null hypothesis significance test - it is better to use them as a measure of the range of values that the treatment effect may take.
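The point about overlap can be checked with a little arithmetic. The sketch below uses hypothetical means and standard errors for two treatments, and assumes independent groups and the normal approximation, so that the standard error of the difference is the square root of the sum of the squared standard errors. The function names are illustrative only.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # about 1.96


def ci(mean, se, z=Z95):
    """Normal approximation interval for a single mean."""
    return mean - z * se, mean + z * se


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


def diff_ci(mean1, se1, mean2, se2, z=Z95):
    """Interval for the difference (mean2 - mean1) of two independent means."""
    diff = mean2 - mean1
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff - z * se_diff, diff + z * se_diff


# Hypothetical treatment means and standard errors (made-up values).
ci_a = ci(10.0, 1.0)
ci_b = ci(13.0, 1.0)
d = diff_ci(10.0, 1.0, 13.0, 1.0)

print(f"Treatment A: {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"Treatment B: {ci_b[0]:.2f} to {ci_b[1]:.2f}")
print(f"Intervals overlap: {intervals_overlap(ci_a, ci_b)}")
print(f"Difference B - A: {d[0]:.2f} to {d[1]:.2f}")
```

With these values the two separate intervals overlap, yet the interval for the difference lies wholly above zero. The interval for the difference is also the more useful quantity to report, since it shows the range of values the treatment effect may plausibly take.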


What the statisticians say

Armitage & Berry (2002) give a good introduction to the two different definitions of a confidence interval in Chapter 4 on inference. Woodward (2004) gives only a very brief introduction to confidence intervals. Sokal & Rohlf (1995) provide a conventional treatment of confidence intervals - again with little emphasis on their assumptions. Bart (1998) provides a good discussion of the relative merits of tests and confidence intervals - although perhaps with too great a readiness to assume that the central limit theorem will always normalize distributions and homogenize variances.

Curran-Everett (2009) explores the underlying concept of a confidence interval. Alf & Lohr (2007) stress the importance of the 'random sample' assumption for the simple normal confidence interval. Cumming & Finch (2005), Cumming et al. (2004) and Belia et al. (2005) look at researchers' understanding of confidence intervals and standard error bars. Wood (2004, 2005) provides a readily accessible introduction to bootstrap confidence intervals. Sim & Reid (1999) and Poole (2001) discuss what should be the role of P-values and confidence intervals in the interpretation of scientific results.

Wikipedia provides sections on the confidence interval and robust confidence intervals. The NIST/SEMATECH e-Handbook of Statistics explains how to estimate the confidence limits for a mean. A further online tutorial shows how to calculate normal approximation confidence intervals using R (although note that its terminology about intervals is misleading).