Normal distribution
The normal distribution, also known as the Gaussian distribution, is a theoretical continuous distribution of a random variable - and is mathematically defined by several
The normal distribution was so named because it was thought to be the natural or normal distribution for any continuous variable to follow. We now know that, in biology at least, that is not necessarily the case. But in statistics the distribution remains extremely important because it more-or-less describes the random variation of sample means - and many statistics that behave as
Whilst many people visualise a normal distribution as a 'bell-shaped' curve, for a critical appraisal you need to define its properties much more clearly and quantitatively. Let us begin by stating the properties of the distribution.
Let us consider how varying these population parameters affects the appearance of the distribution.
The population's location
Because the normal distribution is smooth and symmetrical, the mean, median, and mode of any normal population are identical. The graph below shows the distribution of 3 normal populations, whose only difference is their location.
The population mean is usually defined as the mean of all the values in that population - or μ. More explicitly, if we call our population X, μx would be the population mean. In contrast, the mean of a sample from that population is only an estimate of the 'true' population mean, and the two are only the same on average.
Another way of defining the population mean is in terms of the average result of randomly sampling that population. For example, if x is an observation of population X then, if we took an infinitely large sample, its mean would be Σx/∞ - which causes a few headaches! To avoid this dilemma, we say that, if we repeatedly sample a population, we would expect the average value of x, E(x), to be identical to the population mean, μx. In other words, μx = E(x).
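The idea that sample means equal the population mean 'on average' can be checked with a small simulation. The sketch below is only an illustration: the population parameters (mean 10, standard deviation 2), the sample size, and the function name mean_of_sample_means are all assumptions chosen for the example, not values from this page.

```python
import random
import statistics


def mean_of_sample_means(mu, sigma, sample_size, n_samples, seed=0):
    """Draw many random samples from a normal population and
    return the average of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        sample_means.append(statistics.fmean(sample))
    return statistics.fmean(sample_means)


# Hypothetical population: mean 10, standard deviation 2.
estimate = mean_of_sample_means(mu=10.0, sigma=2.0, sample_size=5, n_samples=20000)
print(estimate)  # close to, but not exactly, the population mean of 10
```

Any single sample mean may be some way from 10, but the average over many samples settles very close to it.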
If the probability of observing a value were unrelated to the value being observed, the distribution would be uniform. However, if you sample a normal population at random, the most commonly observed values are those closest to the population mean.
The population's dispersion
With a normal population, for mathematical reasons, the dispersion is usually defined in terms of the root mean squared deviation of its observations about their population mean - in other words the population standard deviation, σ. The graph below shows the distribution of 3 normal populations, whose only difference is their dispersion.
By convention, the standard deviation of a population called Y is generally represented by the Greek letter σ (sigma) - in other words σy - or just σ. The standard deviation of a sample of that population may be written as sy, or just s.
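To make 'root mean squared deviation about the population mean' concrete, here is a minimal sketch of the arithmetic. The eight values are invented, and they are treated as if they were an entire (tiny) population purely so the sums are easy to follow; the function name rms_deviation is likewise just for illustration.

```python
import math
import statistics


def rms_deviation(values, centre):
    """Root of the mean of the squared deviations of values about centre."""
    squared = [(v - centre) ** 2 for v in values]
    return math.sqrt(sum(squared) / len(values))


# Hypothetical values, treated here as a complete population.
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.fmean(population)      # population mean: 5.0
sigma = rms_deviation(population, mu)  # population standard deviation: 2.0

# The standard library's pstdev uses the same definition.
print(mu, sigma, statistics.pstdev(population))
```

Note that statistics.stdev, which is intended for samples, divides by n - 1 rather than n, so it gives a slightly larger value than the population formula used here.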
Aside from their mean and standard deviation, all normal populations are identical. Therefore, if you rescale normal populations to allow for these two parameters, they become indistinguishable. The commonest way to rescale a normal population is to subtract the mean from each observation, and divide by the standard deviation. This will produce a standard normal population, which has a mean of zero and a standard deviation of one. This is mathematically the simplest of all - and, because it is so useful, the standard normal distribution has its own special symbols and terminology.
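The rescaling described above (subtract the mean, divide by the standard deviation) can be sketched as follows. The values are invented and are not themselves normally distributed; they only show that the arithmetic yields a mean of zero and a standard deviation of one. The function name standardise is an assumption for this example.

```python
import statistics


def standardise(values, mu, sigma):
    """Subtract the mean from each value and divide by the standard deviation."""
    return [(v - mu) / sigma for v in values]


# Hypothetical values, treated as a complete population.
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

rescaled = standardise(population, mu, sigma)
print(rescaled)
print(statistics.fmean(rescaled), statistics.pstdev(rescaled))  # mean 0, sd 1
```

Applied to a genuinely normal population, the same two steps give the standard normal population.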
As we said above, the main reason why the normal distribution is so important in statistics is that many sample statistics, including the mean, tend towards a normal distribution, irrespective of the population distribution. The way in which a statistic's normal tendency depends upon sample size is described by what is known as the central limit theorem. Non-mathematically, there are three factors which determine how large a sample you need in order to assume a statistic is approximately normal.
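The tendency of sample means towards normality can be illustrated by simulation. The sketch below assumes a strongly skewed (exponential) population, which is our choice for the example, and uses skewness as a rough indicator of departure from the symmetry of a normal distribution. The function names and the numbers of samples are likewise assumptions.

```python
import random
import statistics


def skewness(values):
    """Mean cubed standardised deviation; zero for a symmetrical distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=m)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def sample_mean_skewness(sample_size, n_samples=5000, seed=1):
    """Skewness of the means of many samples from an exponential population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return skewness(means)


# Compare small and moderate sample sizes.
skew_by_size = {n: sample_mean_skewness(n) for n in (2, 30)}
for n, skew in skew_by_size.items():
    print(n, round(skew, 2))
```

The means of the larger samples are noticeably less skewed than those of the smaller samples, even though the population itself is far from normal.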
Another reason the normal distribution is so popular is that its properties are well known - at least to mathematicians. In particular, there are various formulae which estimate the proportion of a normal population in a defined interval. These are known as probability functions.
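As an illustration of such a probability function, the sketch below uses the cumulative distribution function provided by Python's statistics.NormalDist to find the proportion of a normal population lying between two values. The helper name proportion_between and the chosen intervals are assumptions for the example.

```python
from statistics import NormalDist


def proportion_between(mu, sigma, lower, upper):
    """Proportion of a normal population with mean mu and standard
    deviation sigma that lies between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Standard normal population: mean 0, standard deviation 1.
within_one_sd = proportion_between(0.0, 1.0, -1.0, 1.0)
within_1_96_sd = proportion_between(0.0, 1.0, -1.96, 1.96)
print(within_one_sd)   # roughly 0.68
print(within_1_96_sd)  # roughly 0.95
```

Because every normal population is the same once rescaled, these proportions apply to any normal population when the interval is expressed in standard deviations from the mean.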
Related topics:
Skewness and kurtosis
Chauvenet's criterion for identifying outliers
The log normal distribution