"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Normal probability functions

A function is simply a formula which calculates a number from one or more other numbers. In theory at least, there are three ways of relating the value of a normal observation to the probability of obtaining it. The last of these is the simplest to calculate and the most familiar, but is the least used. Since, in the case of a normal population, it is also the most awkward to explain, let us deal with the simpler - and more useful - methods first.

  1. The cumulative probability function

    This is defined as the proportion, P, of the population X which is less than or equal to a given value, x - where x is any possible value of X between plus and minus infinity. It may also be referred to as the normal distribution function.

    Given that the highest proportion is 1, and each observation of a normal population is unique, the proportion of that population greater than x is of course 1 − P.

    {Fig. 4}

    Notice that this function does not describe the probability of observing value x, but the probability of observing any value less than or equal to x. As a result, the cumulative normal distribution function is sometimes described as a normal integral function.

    Today, most software packages use a cumulative (or integrated) normal function formula, which returns (more or less) the exact probability for any standard normal deviate. Unfortunately, there is no exact formula for converting a standard normal deviate to a cumulative probability. But there are approximations, and a good one is given below.

    Algebraically speaking -

    The cumulative standard normal distribution is the probability that a number, arising from a standard normal distribution, lies between − ∞ and x.

    This probability may be approximated by:

    P = Φ(x) = 1 − Z(x)(b₁t + b₂t² + b₃t³ + b₄t⁴ + b₅t⁵) + ε

    • Φ(x) is the cumulative probability for x
    • t = 1/(1 + px), where p = 0.2316419
    • Z(x) is the probability density function of x, for a standard normal distribution, as given below.
    • π ≈ 3.141592654 ; e ≈ 2.718281828
    • b₁ = 0.319381530, b₂ = −0.356563782, b₃ = 1.781477937, b₄ = −1.821255978, b₅ = 1.330274429
    • ε is the error in estimating P. Its modulus, |ε|, is always less than 7.5×10⁻⁸ (0.000000075).

    This rather horrific formula takes quite a while to do on a calculator. If you decide to make it into a computer program, use 'double precision' variables to avoid rounding errors.
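As a sketch of that advice in Python (whose floats are already double precision), the approximation above might be coded as follows. The symmetry step for negative deviates is our addition, since the polynomial in t is designed for x ≥ 0:

```python
import math

# Coefficients of the approximation given above
P_COEF = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def standard_normal_density(x):
    """Z(x): the standard normal probability density function."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cumulative_normal(x):
    """P = Phi(x): cumulative standard normal probability, |error| < 7.5e-8."""
    if x < 0:
        # the polynomial is designed for x >= 0, so use symmetry
        return 1 - cumulative_normal(-x)
    t = 1 / (1 + P_COEF * x)
    poly = sum(b * t ** (i + 1) for i, b in enumerate(B))
    return 1 - standard_normal_density(x) * poly

print(round(cumulative_normal(1.96), 4))  # 0.975
```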


  2. The inverse probability function

    This does the opposite to the cumulative function, and estimates the value of x at or below which a certain proportion P of the population lies.

    {Fig. 5}


    As with the cumulative normal function, there is no exact formula for converting a cumulative probability to a standard normal deviate. Below is an approximation from the same source as the one above.

    Algebraically speaking -

    The value, x, is the deviation from the population mean at or below which a proportion, P, of a standard normal population lies. By definition, if P = 0.5 then x = 0; if P = 0 then x = −∞; and if P = 1 then x = ∞. For other values of P at or above 0.5, x may be approximated by the formula below; for P below 0.5, use the symmetry x(P) = −x(1 − P).

    x = Φ⁻¹(P) = t − (c₀ + c₁t + c₂t²)/(1 + b₁t + b₂t² + b₃t³) + ε

    • Φ⁻¹(P) is the inverse normal function of P
    • t = √[ln(1/(1 − P)²)]
    • c₀ = 2.515517, c₁ = 0.802853, c₂ = 0.010328
    • b₁ = 1.432788, b₂ = 0.189269, b₃ = 0.001308
    • ε is the error in estimating x. Its modulus, |ε|, is less than 4.5×10⁻⁴
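A sketch of this approximation in Python; the symmetry step for P below 0.5 is our addition, since the polynomial itself is designed for the upper half of the distribution:

```python
import math

# Coefficients of the approximation given above
C = (2.515517, 0.802853, 0.010328)
B = (1.432788, 0.189269, 0.001308)

def inverse_normal(p):
    """x = Phi^-1(P): the standard normal deviate at or below which a
    proportion p of the population lies, |error| < 4.5e-4."""
    if not 0 < p < 1:
        raise ValueError("P must lie strictly between 0 and 1")
    if p < 0.5:
        # the polynomial is designed for P >= 0.5, so use symmetry
        return -inverse_normal(1 - p)
    t = math.sqrt(math.log(1 / (1 - p) ** 2))
    numerator = C[0] + C[1] * t + C[2] * t ** 2
    denominator = 1 + B[0] * t + B[1] * t ** 2 + B[2] * t ** 3
    return t - numerator / denominator

print(round(inverse_normal(0.975), 2))  # 1.96
```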


  3. The probability density function

    This is the function whose formula you will find in most statistics textbooks.

    Where X is normal, because the population is infinite and every value is unique, the probability of obtaining an observation precisely equal to x is vanishingly small. Nevertheless it is possible to calculate the relative probability of obtaining x. Mathematically this probability density, Z, is equivalent to the slope of the cumulative distribution - in other words the rate of increase of P for a given x.

    {Fig. 6}

    The normal probability density function is often confused with the normal distribution function, or is assumed to provide the probability of observing some value, x. In fact this function gives the probability density at x - the probability of observing a value within a vanishingly small range about x, divided by the width of that range. As a result its value can be very much greater than 1 - which a probability, by definition, cannot exceed.

    The equation for the normal probability density function is given below:

    Algebraically speaking -

    Z = [1/(σ√(2π))] e^[−(x − μ)²/(2σ²)]


    • Z describes the relative heights of different parts of the normal distribution curve and is known as a probability density function.
    • x is each value for which you are trying to estimate the curve's height.
    • μ (pronounced 'mew') is the population mean,
    • σ (pronounced 'sigma') is the population standard deviation
    • π (pronounced 'pie', ≈ 3.142) is a constant,
    • e (≈ 2.718) is Euler's number - the base of 'natural' or 'Napierian' logarithms,


    This density function is somewhat simplified for the standard normal probability density:

    Algebraically speaking -

    Z = φ(x) = [1/√(2π)] e^[−x²/2]


    • Z, π, e and x were defined above.
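Both density formulae - and the 'density equals slope of the cumulative distribution' relationship described earlier - can be checked numerically. A sketch in Python; the exact cumulative function, via `math.erf`, is used purely for comparison:

```python
import math

def normal_density(x, mu, sigma):
    """Z: the height of the normal curve at x, for mean mu and s.d. sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def standard_normal_density(x):
    """phi(x): the standard normal case, mu = 0 and sigma = 1."""
    return normal_density(x, 0, 1)

# The density is the slope of the cumulative distribution, so a central
# difference across the (exact) cumulative function should agree with it
def cumulative_normal(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

x, h = 1.0, 1e-5
slope = (cumulative_normal(x + h) - cumulative_normal(x - h)) / (2 * h)
print(round(slope, 6), round(standard_normal_density(1.0), 6))  # both 0.241971

# A density is not a probability: when sigma is small its value exceeds 1
print(normal_density(0, 0, 0.1) > 1)  # True
```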




Graphical methods of testing for normality

Fitting a normal distribution

Many packages 'fit' frequency curves to graphs of observations, based upon their mean and standard deviation, and using a mathematical 'normal' function. However, the exact details of how this works depend upon how you express their distribution. Let us deal with the simplest first.

  • Ranked, cumulative distributions

    We take the example of the 1881 PCV values that we have used previously. The relative rank of each observation is plotted against its observed value. The smallest relative rank is therefore 1/1881, and the largest is 1881/1881, or 1. Each blue point indicates the proportion of our observations equal to, or less than, X. The green line is the fitted cumulative normal distribution of X, for a population with the same mean and standard deviation as our sample, namely 26.12 and 4.381. The green line is continuous because the computer works out the result of this formula at all points along the X-axis (or a sufficient number to produce a realistic-looking result).

    {Fig. 7}
    pcvcumf.gif from pcvcumf.stg using abcde.sta

    In this particular case, because we assume PCV is a continuous variable that has been rounded to the nearest 1%, using the sequential rank distorts your comparison of these data with the fitted line. The mean rank therefore provides a better measure of the cumulative distribution.

    Note that if you plot your observations as rank, rather than relative rank, the computer would have to multiply the expected proportion by the total number of observations (in this case, p × 1881). If you wished to plot the rank as percentiles, it should calculate the probability as a percentage (p × 100).


  • Cumulative distributions by class

    Fitting a normal distribution to data presented as cumulative class-intervals is done the same way as for a scatterplot (above). In other words, the cumulative normal probability ( P ), of your observed mean and standard deviation, is multiplied by the number of observations in your sample ( N ), or by 100 (for percentages).

    {Fig. 8}
    pcvcum.gif using pcvcum1 & 2.stg from abcde.sta

    If you want to predict the proportion of observations in each class-interval (shown in green on the second graph of the set above), you use the cumulative normal distribution to work out the proportion at the class-interval's upper and lower boundaries - and find the difference between them. For example, if its upper boundary was the mean, and the lower was minus infinity, you would expect to find 0.5 − 0 = 0.5 of the observations in that interval.
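A minimal sketch of that calculation, using Python's standard-library `statistics.NormalDist` with the PCV mean and standard deviation quoted earlier:

```python
from statistics import NormalDist

# Fitted population: same mean and standard deviation as the PCV sample
pop = NormalDist(mu=26.12, sigma=4.381)

def expected_proportion(lower, upper):
    """Proportion of a normal population falling within a class interval:
    the difference between the cumulative probabilities at its boundaries."""
    return pop.cdf(upper) - pop.cdf(lower)

# From minus infinity up to the mean: half the observations, as in the text
print(expected_proportion(float("-inf"), 26.12))  # 0.5
```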

    Of course, you could also re-plot the predicted proportion of observations in each class as an ordinary histogram. However, fitting a normal distribution to a histogram is a little more complicated.


  • Histograms

    To fit a normal distribution curve to a histogram of n observations, you need to convert the probability density function to a frequency. To do this you multiply it by n, in which case the area under the curve is equal to n, rather than 1. The peak value should be n/(σ√(2π)) - because at the peak x − μ = 0, so the exponent −(x − μ)²/(2σ²) = 0, and e⁰ = 1.

    Aside from the information required above, the computer also has to re-scale its predictions to allow for the width of class intervals used. This is because, the narrower the class-interval, the fewer observations it is likely to end up with. For example, see the graph set below.

    {Fig. 9}
    pcvhist.gif using pcvhist & pcvci.stg from abcde.sta
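The re-scaling just described can be sketched as follows; the PCV sample statistics are those quoted earlier, and a class width of one PCV unit is assumed:

```python
import math
from statistics import NormalDist

n, mu, sigma = 1881, 26.12, 4.381  # sample size, mean, standard deviation

def fitted_frequency(x, width):
    """Expected frequency in a class of the given width centred on x:
    the probability density multiplied by n and by the class width."""
    z = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return n * width * z

# Peak of the fitted curve (x = mu, unit-width classes) is n / (sigma * sqrt(2 * pi))
peak = fitted_frequency(mu, 1)
print(round(peak, 1))  # 171.3

# For wide classes, the cumulative distribution (the difference across the
# class boundaries) gives a slightly better estimate than the density does
pop = NormalDist(mu, sigma)
exact = n * (pop.cdf(mu + 0.5) - pop.cdf(mu - 0.5))
print(exact < peak)  # True
```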

    The only awkward bit in all of this is that, as we explained above, the normal probability 'density' function does not itself yield a probability. Instead, it yields the rate of change of the cumulative probability at each value of X. This does not cause any major problems provided your class intervals are smaller than the standard deviation. But if your class intervals are more than about 2.5 times the standard deviation, your relative frequency can exceed 100% - as shown in the second figure above. This causes some confusion because, by definition, the relative frequency of observations cannot exceed 100%.

    You may also have problems if you try to fit a normal distribution to a histogram of a discrete variable. Because discrete variables cannot have fractional values, many packages tend to produce very odd results.

Fitting a normal distribution with



Probability-probability (PP) and Quantile-quantile (QQ) plots

Unlike the methods above, which 'fit' a normal distribution to an existing (curved) graph, PP and QQ plots are used to assess normality by plotting 'like against like' - and seeing if the result is a straight line. Although these plots can be calculated for any mathematically 'known' population distribution, they are most commonly used to assess departures from normality.

  • A PP plot is a scatterplot of observed P-values - as estimated from their relative rank within your sample - against their theoretical P-values, calculated by applying the cumulative population distribution function to the value of each observation in your sample.

  • A QQ plot is a scatterplot of your observed quantiles - in other words the actual values of the observations in your sample - against their theoretical quantiles. Each theoretical quantile is calculated, using the inverse population function, from the P-value of an observation - as estimated from its relative rank. For the normal distribution a QQ plot is also known as a normal probability plot.

The relationship between relative rank, quantiles and P-values - plus the problems of ties, and the need to correct the relative rank when estimating population quantiles from a sample - are considered in the Quantiles More Info Page.

Note that we have always plotted observed against expected values. But sometimes expected values are plotted against observed values. Check carefully which way round they are plotted on any plot you encounter.

We will first look at how PP and QQ plots are done for a small sample. Correct to 4 significant figures, the mean (x̄) of this sample was 28.33, and its standard deviation (sx) was 3.448.

The table below shows the value of each of these (n=) 10 observations arranged in order of rank (x(r)), their corrected relative rank (crr), their theoretical quantile, and their expected P-value. The theoretical quantile was estimated from the inverse normal probability function using R. The expected P-value was estimated from the cumulative normal distribution function, again using R.

{Table: each observation x(r) in rank order, its corrected relative rank crr = (r − ½)/n, the expected quantile N⁻¹[crr, x̄, sx], and the expected P-value N[x(r), x̄, sx]}

For the PP plot, the observed probability (crr) is plotted against the expected probability, N[x(r), x̄, sx]. For the QQ plot, the observed quantile (x(r)) is plotted against the expected quantile, N⁻¹[crr, x̄, sx]. These two plots are shown below:

{Fig. 10}
ppqq1.gif from norm.sta
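The columns of such a table can be computed as in the sketch below, using Python's standard-library `statistics.NormalDist`. The ten sample values here are hypothetical - they are not the observations plotted above:

```python
from statistics import NormalDist, mean, stdev

# A hypothetical sample of n = 10 observations, sorted into rank order
x = sorted([24.1, 25.6, 26.9, 27.4, 28.2, 28.8, 29.5, 30.6, 31.8, 33.4])
n = len(x)
pop = NormalDist(mean(x), stdev(x))  # fitted normal population

for r, obs in enumerate(x, start=1):
    crr = (r - 0.5) / n            # corrected relative rank
    expected_q = pop.inv_cdf(crr)  # expected quantile (for the QQ plot)
    expected_p = pop.cdf(obs)      # expected P-value (for the PP plot)
    print(f"{obs:5.1f}  {crr:5.2f}  {expected_q:6.2f}  {expected_p:5.3f}")
```

The PP plot is then crr against expected_p, and the QQ plot is obs against expected_q.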

We now look at PP and QQ plots of a large sample, albeit with many tied observations. Below are PP and QQ plots of all 1881 PCV observations. Because these data are extensively tied (see above), we have used their mean rank.

{Fig. 11}

Some software packages offer a range of methods for correcting the relative rank of observations within your sample to provide better estimates of their population equivalents. If you are uncomfortable with these corrections, and do not have access to the appropriate statistical tables, remember that it is possible to obtain rankits by simulation for any distribution for which you have the inverse probability function.
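As a sketch of that suggestion, the rankits of a standard normal sample (the expected values of its order statistics) can be estimated by repeatedly drawing, sorting, and averaging simulated samples. The trial count and seed below are arbitrary choices; `NormalDist().inv_cdf` supplies the inverse probability function:

```python
import random
from statistics import NormalDist

def simulated_rankits(n, trials=20000, seed=1):
    """Estimate the rankits of a sample of size n by simulation."""
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf  # the inverse probability function
    totals = [0.0] * n
    for _ in range(trials):
        # one simulated sample, sorted into rank order
        sample = sorted(inv_cdf(rng.random()) for _ in range(n))
        for i, value in enumerate(sample):
            totals[i] += value
    return [t / trials for t in totals]  # mean value at each rank

rankits = simulated_rankits(5)
print([round(r, 2) for r in rankits])  # close to [-1.16, -0.50, 0, 0.50, 1.16]
```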

Doing PP, QQ and rankit plots with


Interpreting PP & QQ plots

Even when your sample is randomly selected from a normal population, random error can be expected to ensure some deviation from the 'perfect fit' line. Indeed, various tests compare the observed deviation from that line with the amount you would expect if your sample represented a normal distribution. To enable you to see what to expect from PP and QQ plots of non-normal data, we have excluded that source of variation in the diagrams below by using rankits.

For the sake of comparison, we used 4 variables, y1, y2, y3, and y4 - these being normal, skewed, platykurtic (flattened), and leptokurtic (peaked). All of them comprised the most typical locations of 90 values from populations having the same mean and standard deviation (μ=10, σ=2). The four histograms, below, show how these values were distributed.


{Fig. 12}


{Fig. 13}