 
The importance of normality
"When the only tool you have is a hammer, every problem begins to resemble a nail."
Abraham Harold Maslow (1966) 
 
The idea of 'normally' distributed data (or normal 'errors') is central to statistical analysis, partly because normal distributions have properties which permit all manner of shortcuts - conceptual, mathematical and arithmetical.

For instance:
 Any normal distribution can be defined by just two numbers, its mean and standard deviation. When calculated from random samples of a normal 'population' of numbers these statistics provide excellent estimates of that population's mean and standard deviation (known as the population parameters).
 All normal distributions are symmetrical, infinite, unbounded, and untied. This enables you to express dispersion as the mean +/- just one number - be it the standard deviation or standard error. Conventional confidence limits extend this reasoning to the summary statistic's distribution, and its population parameter.
 If you sample a non-normal population (with a finite variance), the ('sampling') distribution of the sample means 'converges' to normal - the larger the samples the more normal are their means. Of course, for these means to be completely normal, the samples must be infinitely large - so means of relatively modest samples are only approximately normal.
 Owing to the normal distribution's intimate mathematical relationship with (for example) the F distribution, sophisticated analyses could be reduced to relatively simple formulae, provided you could compare their end-result to the correct pre-prepared table of numbers. This was hard work for mathematical statisticians, but the end-user merely had to know how to do the arithmetic, not why it worked, nor what it assumed - or even what the end-result meant!
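The convergence of sample means towards normality can be checked by simulation. The following is a minimal sketch in Python, not part of the original text: it assumes an exponential 'population' (an arbitrary choice of a strongly skewed distribution), and the function names, sample sizes and seed are hypothetical. It measures how skewed the means are for samples of increasing size.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: zero for a symmetrical distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def skew_of_sample_means(sample_size, n_samples=20000, seed=1):
    """Skewness of the means of many random samples from an exponential population."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return skewness(means)


if __name__ == "__main__":
    for size in (1, 5, 30):
        print(size, round(skew_of_sample_means(size), 2))
```

Under these assumptions the skewness of the means shrinks as the samples get larger, but it is still clearly above zero for samples of 30, which is the sense in which means of modest samples are only approximately normal.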
"Theory is often just practice with the hard bits left out."
J. M. Robson, in 'The Library' (1985) VI.7 
Before personal computers became cheap and commonplace, even the most trivial calculation was apt to require laborious mathematical tables. So those simplifying properties were seen as a huge blessing - not least because they enabled workers to reduce large amounts of data to a few summary statistics: the mean, standard deviation, standard error, significance / non-significance, or a confidence interval. Since printed paper is relatively expensive, scientific journal editors came to believe entire studies could, and indeed should, be summarised using a few P values or confidence intervals.
As a result of those useful mathematical properties, mathematical statisticians were able to devise a variety of summary statistics, and an even greater set of 'parametric' tests and confidence intervals for those statistics - all of which assume you have random samples of 'normal' data. In the absence of more usable tools, these were seized upon by 'quantitative' scientists, and their reasoning is now the foundation of many (if not all) 'basic' stats courses and textbooks. All of this has influenced our thinking in very fundamental ways.
The normal aftermath
Whilst much beloved by many, the normal model of a parametric universe has left us some very unpleasant and intractable legacies.
For example:
 Despite initial hopes, it turns out that virtually no real data ever represent a normal population - indeed many are not remotely so. Also, when calculated from long-tailed (e.g. skewed or leptokurtic), discrete (i.e. stepped), or otherwise non-normal data, those standard statistics lose their desirable properties, becoming imprecise if not horribly biased - and their tests or confidence intervals are similarly misleading.
 Surprisingly few users of these statistical methods know what their analyses are assuming, and there is virtually no awareness of what to expect when those assumptions are not met. Therefore surprisingly few people check their data for normality, or are prepared to explore it in any way - other than by deriving the standard set of summary statistics, mentioned above.
 Because dispersion is often assumed to only dictate how reliable a result is, many biologists assume their treatments only affect differences between means - and analyse their results accordingly. Treatments which primarily affect variation, or change skew, or kurtosis, are routinely classed as 'non-significant' using the standard analyses - hence, by convention, are not discussed further. Conversely, when 'divergent' values are noted, many commonly-used 'outlier-rejection' procedures simply and automatically exclude them - which was why satellite monitoring initially failed to notice a large but unexpected hole in the Earth's ozone layer.
 Among the less mathematical sciences (such as biology and medicine) there is a growing tendency towards prescriptive study designs and analysis - where any opportunity for judgment or exploration by the researcher is carefully excluded. At the same time a truly frightening proportion of studies ignore even the most basic of assumptions - then subject the results to sophisticated (albeit wholly invalid) analyses.
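How misleading a conventional confidence interval can become with skewed data may be illustrated by simulation. The sketch below is an assumption-laden illustration rather than anything taken from the original text: it uses samples of 10 observations, a lognormal 'population' as an arbitrary example of long-tailed data, and hypothetical function names. It estimates how often the standard 95% t interval for the mean actually contains the true population mean.

```python
import math
import random

SAMPLE_SIZE = 10
T_CRIT_DF9 = 2.262  # two-sided 95% critical value of Student's t, 9 degrees of freedom


def t_interval_coverage(draw, true_mean, n_trials=20000, seed=1):
    """Proportion of simulated t intervals that contain the true mean.

    `draw` is a function taking a random.Random and returning one observation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = [draw(rng) for _ in range(SAMPLE_SIZE)]
        mean = sum(sample) / SAMPLE_SIZE
        variance = sum((x - mean) ** 2 for x in sample) / (SAMPLE_SIZE - 1)
        half_width = T_CRIT_DF9 * math.sqrt(variance / SAMPLE_SIZE)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # Normal population: the interval's assumptions are met.
    print(t_interval_coverage(lambda rng: rng.gauss(0.0, 1.0), 0.0))
    # Lognormal population (mean exp(0.5)): strongly right-skewed.
    print(t_interval_coverage(lambda rng: rng.lognormvariate(0.0, 1.0), math.exp(0.5)))
```

With normal data the estimated coverage should sit close to the nominal 95%; with the skewed population it falls noticeably short, even though the arithmetic is identical.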
"The conventional view serves to protect us from the painful job of thinking."
John Kenneth Galbraith 
Some results thereof
In some ways abundant, inexpensive but powerful computers and statistical software have exacerbated, rather than reduced, our problems. On the other hand, this computational power enables statisticians to explore how their analyses perform when some of their assumptions are violated - and to develop more 'robust' approaches. Furthermore, where the standard assumptions cannot be met, computationally-intensive simulations (such as 'bootstrap' and 'Monte-Carlo') are increasingly used to analyse real data - albeit largely by statisticians.
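As an illustration of the computationally-intensive approach, here is a minimal sketch of one common variant, the percentile bootstrap, applied to a median. It is not taken from the original text: the data are hypothetical, and the function name, number of resamples and seed are assumptions made for the example.

```python
import random
import statistics


def bootstrap_percentile_interval(data, statistic, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for `statistic`, resampling `data` with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    # Cut off (roughly) the lowest and highest `tail` fraction of the resampled estimates.
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical, right-skewed measurements used purely for illustration.
data = [0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.4, 1.9, 2.7, 4.2, 7.5, 13.0]

if __name__ == "__main__":
    print(bootstrap_percentile_interval(data, statistics.median))
```

No normal-theory formula is involved: the interval comes entirely from re-using the observed values, which is why such methods need cheap computing rather than tables. They still assume the observations are a random sample of the population of interest.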
All of which ignores the role of so-called 'non-parametric' or 'distribution-free' analyses.
Many researchers seem to assume non-parametric methods make no assumptions whatsoever. In fact they allow you to escape some, but not all, of the assumptions required by parametric procedures. Specifically, you do not have to assume your data (actually their 'errors') are a random sample from a normal distribution. In other respects conventional non-parametric analyses often employ the same reasoning as parametric statistics. For instance, aside from 'exact', 'small-sample' tests, non-parametric statistics are tested and have intervals calculated using criteria developed for statistics with smooth sampling distributions - in other words parametric statistics, assuming their assumptions are met.
The full implications of this are only slowly being accepted by scientists or statisticians.
For instance:
 Non-parametric statistics, such as the median, have rather different properties from the simple mean. In a normal universe both statistics provide estimates of the same parameter because any normal population is symmetrical, and its mean and median are identical - albeit the 'sample' median provides a more variable estimate. For strongly non-normal populations neither of these statements may be true. Simply substituting a non-parametric test (or interval) for a parametric one therefore tends to produce results which are not interpretable as a simple 'difference between means'. Then again, non-parametric analyses generally have appreciably less power than their parametric equivalent - assuming, that is, the parametric assumptions are met (which is all too seldom).
 Many non-parametric statistics have strongly discrete distributions, even when calculated from 'large' samples. Applying (conventional) inference rules to such statistics can produce highly conservative results - in other words, it becomes virtually impossible to obtain a statistically 'significant' outcome. Rank-based approaches, such as so-called 'mid-P values', perform identically to conventional P values when used in parametric procedures whose assumptions are met, but do not bias the outcome for discrete statistics in the same way - they can be conservative or liberal, but 'on average' approach their nominal properties much better than the conventional measure. That mid-P values are seldom used is only partly because of conservatism among the scientific establishment. Bizarrely, the requirement for 'strict conservatism' in medical studies has become enshrined in law.
 Standard 'parametric' statistics (including the mean and t statistic) have stepped sampling distributions when calculated from discrete data. Proportions, for instance, are means of binary data (zeroes and ones) and have a strongly stepped distribution - even when calculated from largish samples (unless the zeroes and ones are equally common). Assuming those proportions are 'approximately normal' produces symmetrical, but very ill-behaved and misleading confidence limits. Similar problems apply to statistics such as odds-ratios, when calculated from strongly unequal frequencies.
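Two of the points above lend themselves to exact arithmetic rather than simulation. The following sketch, which is an illustration under stated assumptions and not the authors' own method, computes (a) the true coverage of the 'approximately normal' 95% interval for a proportion (often called the Wald interval) and (b) a conventional and a mid-P one-sided P value for a binomial count. The function names and the example values of n, p and k are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def wald_coverage(n, p, z=1.959964):
    """Exact probability that the normal-approximation 95% interval contains p."""
    covered = 0.0
    for k in range(n + 1):
        p_hat = k / n
        half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            covered += binomial_pmf(k, n, p)
    return covered


def upper_tail_p_values(k, n, p0):
    """Conventional and mid-P one-sided P values for observing k or more successes."""
    at_k = binomial_pmf(k, n, p0)
    beyond_k = sum(binomial_pmf(j, n, p0) for j in range(k + 1, n + 1))
    # The mid-P value counts only half the probability of the observed outcome.
    return beyond_k + at_k, beyond_k + 0.5 * at_k


if __name__ == "__main__":
    print(wald_coverage(50, 0.5), wald_coverage(50, 0.05))
    print(upper_tail_p_values(8, 10, 0.5))
```

For 50 observations the interval falls short of its nominal 95% even when zeroes and ones are equally common, and does worse still when one outcome is rare. The mid-P value is always smaller than the conventional one by half the probability of the observed count, which is how it offsets the conservatism of discrete statistics.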
In conclusion
 It could be argued that, had we not decided that most data were normal, but had opted for rank-based reasoning, and used simulation models instead of closed-form formulae, modern statistics would be very different.
 As things stand, parametric models are so dominant that many people are simply unable to interpret the universe in any other way.
Nevertheless, at the risk of being labelled blasphemous, incompetent, or delusional, we think there are considerable advantages in having statistical tools other than the normal hammer: some spanners, or a screwdriver, perhaps?
