The importance of normality
"When the only tool you have is a hammer, every problem begins to resemble a nail."
Abraham Harold Maslow (1966)
The idea of 'normally' distributed data (or normal 'errors') is central to statistical analysis, partly because normal distributions have properties which permit all manner of short-cuts: conceptual, mathematical and arithmetical.
For instance:
- Any normal distribution can be defined by just two numbers: its mean and standard deviation. When calculated from random samples of a normal 'population' of numbers, these statistics provide excellent estimates of that population's mean and standard deviation (known as the population parameters).
- All normal distributions are symmetrical, unbounded (infinite in extent), and continuous, so they contain no tied values. This enables you to express dispersion as the mean +/- just one number, be it the standard deviation or the standard error. Conventional confidence limits extend this reasoning to the sampling distribution of a summary statistic, placing symmetrical limits around that statistic to bracket its population parameter.
- If you repeatedly sample a non-normal population, the (sampling) distribution of the sample means 'converges' to normal: the larger the samples, the more nearly normal their means. Of course, for these means to be exactly normal, the samples would have to be infinitely large, so the means of relatively modest samples are only approximately normal.
- Owing to the normal distribution's intimate mathematical relationship with (for example) the F-distribution, sophisticated analyses could be reduced to relatively simple formulae, provided you could compare their end-result with the correct pre-prepared table of numbers. This was hard work for mathematical statisticians, but the end-user merely had to know how to do the arithmetic, not why it worked, nor what it assumed, or even what the end-result meant!
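The convergence of means towards normality can be seen in a small simulation. This is a minimal sketch, assuming an exponential population (whose skewness is 2) chosen purely for illustration; the helper names `skewness` and `skew_of_means` are hypothetical. Theory predicts the skewness of means of n such values shrinks roughly as 2/sqrt(n), so the printed values should fall as n grows, without ever reaching exactly zero.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def skew_of_means(sample_size, reps=20_000, seed=1):
    """Skewness of `reps` sample means, each from `sample_size` exponential draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(reps)
    ]
    return skewness(means)


# Theory predicts a skewness of about 2 / sqrt(n) for means of n exponential values.
skews = {n: skew_of_means(n) for n in (1, 5, 30)}
for n, s in skews.items():
    print(f"n = {n:2d}: skewness of the sample means = {s:.2f}")
```

Even at n = 30 the means remain measurably skewed, which is the sense in which means of modest samples are only 'approximately' normal.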
"Theory is often just practice with the hard bits left out."
J. M. Robson, in 'The Library' (1985) VI.7
Before personal computers became cheap and commonplace, even simple calculations often required laborious use of mathematical tables. So those simplifying properties were seen as a huge blessing, not least because they enabled workers to reduce large amounts of data to a few summary statistics: the mean, standard deviation, standard error, significance / nonsignificance, or a confidence interval. Since printed paper is relatively expensive, scientific journal editors came to believe entire studies could, and indeed should, be summarised using a few P-values or confidence intervals.
Thanks to those mathematical properties, mathematical statisticians were able to devise a variety of useful summary statistics, and an even greater set of 'parametric' tests and confidence intervals for those statistics, all of which assume you have random samples of 'normal' data. In the absence of more usable tools, these were seized upon by 'quantitative' scientists, and their reasoning is now the foundation of many (if not all) introductory statistics courses and textbooks. All of which has shaped our thinking in very fundamental ways.
The normal aftermath
Whilst much beloved by many, the normal model of a parametric universe has left us with some very unpleasant and intractable legacies.
For example:
- Despite initial hopes, it turns out that virtually no real data represent a normal population, and many are not remotely so. Moreover, when calculated from long-tailed (e.g. skewed or leptokurtic), discrete (i.e. stepped), or otherwise non-normal data, those standard statistics lose their desirable properties, becoming imprecise if not horribly biased, and their tests and confidence intervals are similarly misleading.
- Surprisingly few users of these statistical methods know what their analyses assume, and there is virtually no awareness of what to expect when those assumptions are not met. Consequently, few people check their data for normality, or are prepared to explore it in any way other than by deriving the standard set of summary statistics mentioned above.
- Because dispersion is often assumed only to indicate how reliable a result is, many biologists assume their treatments affect only differences between means, and analyse their results accordingly. Treatments which primarily affect variation, skew, or kurtosis are routinely classed as 'nonsignificant' by the standard analyses and hence, by convention, are not discussed further. Conversely, when 'divergent' values are noted, many commonly-used 'outlier-rejection' procedures simply and automatically exclude them, which is often cited as the reason satellite monitoring initially failed to notice a large but unexpected hole in the Earth's ozone layer.
- Among the less mathematical sciences (such as biology and medicine) there is a growing tendency towards prescriptive study designs and analysis - where any opportunity for judgment or exploration by the researcher is carefully excluded. At the same time a truly frightening proportion of studies ignore even the most basic of assumptions - then subject the results to sophisticated (albeit wholly invalid) analyses.
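To make the point about treatments that change spread rather than location concrete, here is a minimal sketch using hypothetical, made-up data: both groups have a mean of 10, but the 'treated' group is five times as spread out. A comparison of means (here Welch's t-statistic, computed by hand with a hypothetical helper `welch_t`) sees no effect at all, while the ratio of variances is 25.

```python
import math
import statistics

# Hypothetical illustrative data: both groups share a mean of 10,
# but the "treated" group has five times the standard deviation.
control = [8, 9, 10, 11, 12]
treated = [0, 5, 10, 15, 20]


def welch_t(a, b):
    """Welch's t-statistic for the difference between two group means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / se


t = welch_t(control, treated)
variance_ratio = statistics.variance(treated) / statistics.variance(control)
print(f"Welch t = {t:.2f}; variance ratio = {variance_ratio:.0f}")
```

An analysis confined to differences between means would report this 'treatment' as having no effect, although it has clearly altered the data.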
"The conventional view serves to protect us from the painful job of thinking."
John Kenneth Galbraith
Some results thereof
In some ways, abundant, inexpensive but powerful computers and statistical software have exacerbated, rather than reduced, our problems. On the other hand, this computational power enables statisticians to explore how their analyses perform when some of their assumptions are violated, and to develop more 'robust' approaches. Furthermore, where the standard assumptions cannot be met, computationally intensive simulations (such as 'bootstrap' and 'Monte Carlo' methods) are increasingly used to analyse real data, albeit largely by statisticians.
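As an example of such a simulation, the sketch below computes a percentile bootstrap interval for a median. It is a minimal illustration, not a recommended procedure: the data are hypothetical, the function name `bootstrap_median_ci` is ours, and the percentile method is only one of several bootstrap intervals. It assumes the observations are independent and representative of the population; no normality is assumed.

```python
import random
import statistics


def bootstrap_median_ci(data, reps=10_000, level=0.95, seed=42):
    """Percentile bootstrap interval for the median.

    Resamples `data` with replacement `reps` times, takes the median of each
    resample, and returns the (1 - level)/2 and (1 + level)/2 quantiles of
    those bootstrap medians.
    """
    rng = random.Random(seed)
    n = len(data)
    medians = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(reps))
    lower = medians[round((1 - level) / 2 * reps)]
    upper = medians[round((1 + level) / 2 * reps) - 1]
    return lower, upper


# Hypothetical, right-skewed measurements (illustrative only).
sample = [0.4, 0.6, 0.8, 0.9, 1.1, 1.2, 1.9, 2.2, 2.7, 3.5, 7.9, 15.3]
low, high = bootstrap_median_ci(sample)
print(f"median = {statistics.median(sample):.2f}, 95% interval ~ ({low:.2f}, {high:.2f})")
```

Note that the resulting interval need not be symmetrical about the sample median, unlike a conventional mean +/- t x SE interval.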
All of which ignores the role of so-called 'nonparametric' or 'distribution-free' analyses.
Many researchers seem to assume nonparametric methods make no assumptions whatsoever. In fact they allow you to escape some, but not all, of the assumptions required by parametric procedures. Specifically, you do not have to assume your data (actually their 'errors') are a random sample of a normal distribution. In other respects, conventional nonparametric analyses often employ the same reasoning as parametric statistics. For instance, aside from 'exact', 'small-sample' tests, nonparametric statistics are tested, and have their intervals calculated, using criteria developed for statistics with smooth sampling distributions: in other words, parametric statistics whose assumptions are met.
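The gap between an exact test and its smooth large-sample approximation can be sketched with the Mann-Whitney (Wilcoxon rank-sum) U statistic. This is a minimal illustration under stated assumptions: the data are hypothetical, ties are counted as one half, and the normal approximation uses the usual no-ties mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12 without a continuity correction. For two completely separated groups of three, the normal approximation gives a two-sided P of roughly 0.05, whereas enumerating all 20 possible regroupings gives an exact P of 0.10.

```python
import math
from itertools import combinations


def mann_whitney_u(x, y):
    """U for sample x: pairs (x_i, y_j) with x_i > y_j, ties counted as one half."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def normal_approx_p(x, y):
    """z and two-sided P from the large-sample normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (mann_whitney_u(x, y) - mean_u) / sd_u
    return z, math.erfc(abs(z) / math.sqrt(2))


def exact_p(x, y):
    """Two-sided exact P: share of all regroupings of the pooled data whose U is
    at least as far from its null mean as the observed U."""
    pooled = list(x) + list(y)
    n1 = len(x)
    centre = n1 * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(mann_whitney_u(xs, ys) - centre) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical data: complete separation of two groups of three.
x, y = [1, 2, 3], [4, 5, 6]
z, p_approx = normal_approx_p(x, y)
p_exact = exact_p(x, y)
print(f"z = {z:.2f}, normal-approximation P = {p_approx:.3f}, exact P = {p_exact:.2f}")
```

With samples this small the exact U distribution is coarsely stepped, so the smooth approximation can disagree with it markedly.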
The full implications of this are only slowly being accepted by scientists and statisticians.
For instance:
- Nonparametric statistics, such as the median, have rather different properties from the simple mean. In a normal universe both statistics provide estimates of the same parameter because any normal population is symmetrical, and its mean and median are identical, albeit the 'sample' median provides a more variable estimate. For strongly non-normal populations neither of these statements may be true. Simply substituting a nonparametric test (or interval) for a parametric one therefore tends to produce results which are not interpretable as a simple 'difference between means'. Then again, nonparametric analyses generally have appreciably less power than their parametric equivalents, assuming, that is, the parametric assumptions are met (which is all too seldom).
- Many nonparametric statistics have strongly discrete distributions, even when calculated from 'large' samples. Applying conventional inference rules to such statistics can produce highly conservative results; in other words, it becomes virtually impossible to obtain a statistically 'significant' outcome. Alternative measures, such as so-called 'mid-P-values' (which count only half the probability of the observed outcome), perform identically to conventional P-values when used in parametric procedures whose assumptions are met, but do not bias the outcome for discrete statistics in the same way: they can be conservative or liberal, but 'on average' approach their nominal properties much better than the conventional measure. That mid-P-values are seldom used is only partly because of conservatism among the scientific establishment. Bizarrely, the requirement for 'strict conservatism' in medical studies has become enshrined in law.
- Standard 'parametric' statistics (including the mean and t-statistic) have stepped sampling distributions when calculated from discrete data. Proportions, for instance, are means of binary data (zeroes and ones) and have a strongly stepped distribution, even when calculated from largish samples (unless the zeroes and ones are equally common). Assuming those proportions are 'approximately normal' produces symmetrical, but very ill-behaved and misleading, confidence limits. Similar problems apply to statistics such as odds ratios when calculated from strongly unequal frequencies.
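Both discreteness problems above can be illustrated with a binomial proportion. This is a minimal sketch with hypothetical numbers and helper names of our own: a one-sided test of 8 successes in 10 trials against p = 0.5, and the symmetrical 'normal approximation' (Wald) interval for a proportion. The conventional P-value, P(X >= 8), is 56/1024 (about 0.055), whereas the mid-P-value, P(X > 8) + 0.5 P(X = 8), is 33.5/1024 (about 0.033). The Wald interval for 1 success in 20 has a negative lower limit, and for 0 successes it collapses to zero width.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail_p(x, n, p0):
    """Conventional one-sided P-value, P(X >= x), under H0: p = p0."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))


def upper_tail_mid_p(x, n, p0):
    """One-sided mid-P-value: P(X > x) + 0.5 * P(X = x) under H0: p = p0."""
    return upper_tail_p(x + 1, n, p0) + 0.5 * binom_pmf(x, n, p0)


def wald_interval(x, n, z=1.96):
    """Symmetrical 'approximately normal' (Wald) interval for a proportion x/n."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


print(upper_tail_p(8, 10, 0.5))      # conventional P = 56/1024
print(upper_tail_mid_p(8, 10, 0.5))  # mid-P = 33.5/1024
print(wald_interval(1, 20))          # lower limit falls below zero
print(wald_interval(0, 20))          # zero-width interval
```

For a continuous statistic P(X = x) is zero, so the mid-P-value reduces to the conventional P-value, as noted above.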
In conclusion
- It could be argued that, had we not decided that most data were normal, but had opted for rank-based reasoning, and used simulation models instead of closed-expression formulae, modern statistics would be very different.
- As things stand, parametric models are so dominant that many people are simply unable to interpret the universe in any other way.
Nevertheless, at the risk of being labelled blasphemous, incompetent, or delusional, we think there are considerable advantages in having statistical tools other than the normal hammer: some spanners, or a screwdriver, perhaps?