 
Normal probability functions
A function is simply a formula which calculates a number from one or more other numbers. In theory at least, there are three ways of relating the value of a normal observation to the probability of obtaining it. Of these, the last is the simplest to calculate and the most familiar, but is the least used. Since, in the case of a normal population, it is also the most awkward to explain, let us deal with the simpler, more useful, methods first.
 The cumulative probability function
This is defined as the proportion, P, of the population X which is less than or equal to a given value, x, where x is any possible value of X between plus and minus infinity. It may also be referred to as the normal distribution function.
Given that the highest proportion is 1, and each observation of a normal population is unique, the proportion of that population greater than x is of course 1 − P.
{Fig. 4}
Notice that this function does not describe the probability of observing value x, but the probability of observing any value less than or equal to x. As a result, the cumulative normal distribution function is sometimes described as a normal integral function.
Today, most software packages use a cumulative (or integrated) normal function formula, which returns (more or less) the exact probability for any standard normal deviate. Unfortunately, there is no exact formula for converting a standard normal deviate to a cumulative probability. But there are approximations, and a good one is given below.
Algebraically speaking 
The cumulative standard normal distribution is the probability that a number, arising from a standard normal distribution, lies between − ∞ and x.
This probability may be approximated by:
P = Φ(x) = 1 − Z(x)(b_{1}t + b_{2}t^{2} + b_{3}t^{3} + b_{4}t^{4} + b_{5}t^{5}) + ε   (for x ≥ 0; for negative x, use Φ(x) = 1 − Φ(−x))
Where:
 Φ(x) is the cumulative probability for x
 t = 1/(1 + px) ; and p = 0.2316419
 Z(x) is the probability density function of x, for a standard normal distribution, as given below.
 π ≈ 3.141592654 ; e ≈ 2.718281828
 b_{1} = 0.319381530, b_{2} = − 0.356563782, b_{3} = 1.781477937, b_{4} = − 1.821255978, b_{5} = 1.330274429
 ε is the error in estimating P. Its modulus, |ε|, is always less than 7.5×10^{−8} (0.000000075).

This rather horrific formula takes quite a while to work through on a calculator. If you decide to turn it into a computer program, use 'double precision' variables to avoid rounding errors.
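As a sketch of how this might be programmed (Python rather than a calculator; the function names, and the use of symmetry for negative x, are our own additions):

```python
import math

P_COEFF = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def density(x):
    """Standard normal probability density, Z(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def cumulative(x):
    """Approximate cumulative standard normal probability, Phi(x); |error| < 7.5e-8."""
    if x < 0.0:
        # the approximation holds for x >= 0; use the symmetry Phi(x) = 1 - Phi(-x)
        return 1.0 - cumulative(-x)
    t = 1.0 / (1.0 + P_COEFF * x)
    poly = sum(b * t ** (i + 1) for i, b in enumerate(B))
    return 1.0 - density(x) * poly
```

For example, cumulative(1.96) should be close to 0.975, within the quoted error bound.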
 The inverse probability function
This does the opposite to the cumulative function, and estimates the value of x at or below which a certain proportion P of the population lies.
{Fig. 5}
Like the cumulative normal function, there is no exact formula for converting cumulative probability to standard normal deviates. Below is an approximation from the same source as the one above.
Algebraically speaking 
The value x is the deviation from the population mean, at or below which a proportion, P, of a standard normal population lies. By definition, if P = 0.5 then x = 0; if P = 0 then x = − ∞; and if P = 1 then x = ∞. For all other values, x for a given P may be approximated by:
x = Φ^{−1}(P) = t − (c_{0} + c_{1}t + c_{2}t^{2}) / (1 + b_{1}t + b_{2}t^{2} + b_{3}t^{3}) + ε
where Φ^{−1}(P) is the inverse normal function of P
 t = √[ln(1/(1 − P)^{2})]   (for P > 0.5; for P < 0.5, use the symmetry Φ^{−1}(P) = −Φ^{−1}(1 − P))
 c_{0} = 2.515517, c_{1} = 0.802853, c_{2} = 0.010328
 b_{1} = 1.432788, b_{2} = 0.189269, b_{3} = 0.001308
 |ε| is less than 4.5×10^{−4}
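A matching sketch of this approximation in Python (again, the function name and the handling of P < 0.5 by symmetry are our own; the denominator coefficients are the b's listed above, renamed D here to avoid clashing with the b's of the cumulative formula):

```python
import math

C = (2.515517, 0.802853, 0.010328)
D = (1.432788, 0.189269, 0.001308)  # the b1..b3 of the inverse formula

def inverse_cumulative(p):
    """Approximate standard normal deviate x for cumulative probability p,
    where 0 < p < 1; |error| < 4.5e-4."""
    if p == 0.5:
        return 0.0
    if p < 0.5:
        # the approximation holds for p > 0.5; use symmetry for the lower half
        return -inverse_cumulative(1.0 - p)
    t = math.sqrt(math.log(1.0 / (1.0 - p) ** 2))
    numerator = C[0] + C[1] * t + C[2] * t * t
    denominator = 1.0 + D[0] * t + D[1] * t * t + D[2] * t ** 3
    return t - numerator / denominator
```

For example, inverse_cumulative(0.975) should be close to the familiar deviate 1.96.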

 The probability density function
This is the function whose formula you will find in most statistics textbooks.
Where X is normal, because the population is infinite and every value is unique, the probability of obtaining an observation precisely equal to x is vanishingly small. Nevertheless, it is possible to calculate the relative probability of obtaining x. Mathematically this probability density, Z, is equivalent to the slope of the cumulative distribution, in other words the rate of increase of P for a given x.
{Fig. 6}
The normal probability density function is often confused with the normal distribution function, or is assumed to give the probability of observing some value, x. In fact it describes the probability of observing a value within a vanishingly small interval about x, relative to that interval's width. As a result its value can be very much greater than 1, which a probability, by definition, cannot exceed.
The equation for the normal probability density function is given below:
Algebraically speaking 
Z = [1 / (σ√(2π))] e^{−(x − μ)^{2} / (2σ^{2})}

where:
 Z describes the relative heights of different parts of the normal distribution curve, and is known as a probability density function.
 x is each value for which you are trying to estimate the curve's height.
 μ (pronounced 'mew') is the population mean,
 σ (pronounced 'sigma') is the population standard deviation,
 π (pronounced 'pie', ≈ 3.142) is a constant,
 e is the 'Euler' constant (≈ 2.718), the base of 'natural' or 'Naperian' logarithms.

For the standard normal distribution (μ = 0 and σ = 1), this density function simplifies to:
Algebraically speaking 
Z = φ(x) = [1 / √(2π)] e^{−x^{2}/2}

where:
 Z, π, e and x are as defined above.
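The density formula translates directly into code; here is a minimal Python version (the function name and default arguments are ours). Note that, as warned above, the result is a density, not a probability, and can exceed 1 when σ is small:

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Normal probability density Z at x, for mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
```

With mu = 0 and sigma = 1 this is the standard normal density φ(x); its peak, normal_density(0), is 1/√(2π) ≈ 0.3989. With sigma = 0.1 the peak is ten times higher, comfortably exceeding 1.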

Graphical methods of testing for normality
Fitting a normal distribution
Many packages 'fit' frequency curves to graphs of observations, based upon their mean and standard deviation, using a mathematical 'normal' function. However, the exact details of how this works depend upon how you express their distribution. Let us deal with the simplest first.
 Ranked, cumulative distributions
We take the example of the 1881 PCV values that we have used previously. The relative rank of each observation is plotted against its observed value. The smallest relative rank is therefore 1/1881, and the largest is 1881/1881, or 1. Each blue point indicates the proportion of our observations equal to, or less than, X. The green line is the fitted cumulative normal distribution of X, for a population with the same mean and standard deviation as our sample, namely 26.12 and 4.381. The green line is continuous because the computer works out the result of this formula at all points along the X-axis (or a sufficient number to produce a realistic-looking result).
{Fig. 7}
Because we assume PCV is a continuous variable that has been rounded to the nearest 1%, using the sequential rank distorts your comparison of these data with the fitted line. In this particular case, therefore, the mean rank provides a better measure of the cumulative distribution.
Note that if you plot your observations as rank, rather than relative rank, the computer would have to multiply the expected proportion by the total number of observations (in this case, that would be p × 1881). If you wished to plot the rank as percentiles, it should calculate the probability as a percentage (p × 100).
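The recipe above can be sketched as follows, using Python and the exact cumulative normal via the error function (the sample values here are hypothetical; the mean and standard deviation, 26.12 and 4.381, are those quoted for the PCV data):

```python
import math

def cumulative_normal(x, mu, sigma):
    """Cumulative normal probability of x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def relative_ranks(n):
    """Sequential relative ranks r/n, for r = 1..n."""
    return [r / n for r in range(1, n + 1)]

# hypothetical sorted observations, plotted against their relative rank
obs = [19.0, 22.0, 24.0, 26.0, 27.0, 29.0, 31.0, 34.0]
observed = list(zip(obs, relative_ranks(len(obs))))
# the fitted cumulative curve, evaluated at the same points
fitted = [(x, cumulative_normal(x, 26.12, 4.381)) for x in obs]
```

To plot counts rather than proportions, each fitted value would be multiplied by n, as noted below.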
 Cumulative distributions by class
Fitting a normal distribution to data presented as cumulative class-intervals is done the same way as for a scatterplot (above). In other words, the cumulative normal probability ( P ), given your observed mean and standard deviation, is multiplied by the number of observations in your sample ( N ), or by 100 (for percentages).
{Fig. 8}
If you want to predict the proportion of observations in each class-interval (shown in green on the second graph of the set above), you use the cumulative normal distribution to work out the proportion at the class-interval's upper and lower boundaries, and find the difference between them. For example, if its upper boundary was the mean, and the lower was minus infinity, you would expect to find 0.5 − 0 = 0.5 of the observations in that interval.
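This difference-of-cumulatives calculation can be sketched in Python (the function names are ours):

```python
import math

def cumulative_normal(x, mu, sigma):
    """Cumulative normal probability of x (x may be plus or minus infinity)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_proportion(lower, upper, mu, sigma):
    """Expected proportion of a normal population falling between lower and upper."""
    return cumulative_normal(upper, mu, sigma) - cumulative_normal(lower, mu, sigma)
```

Multiplying the result by N (or by 100) converts the proportion to a frequency (or percentage); for the interval from minus infinity up to the mean it returns 0.5, as in the example above.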
Of course, you could also replot the predicted proportion of observations in each class as an ordinary histogram. However, fitting a normal distribution to a histogram is a little more complicated.
 Histograms
To fit a normal distribution curve to a histogram of n observations, you need to convert the probability density function to a frequency. To do this you multiply it by n, in which case the area under the curve is equal to n, rather than 1. The peak value should be n/(σ√(2π)), because at the peak x − μ = 0, so −(x − μ)^{2}/(2σ^{2}) = 0 and e^{0} = 1.
Aside from the information required above, the computer also has to rescale its predictions to allow for the width of the class intervals used. This is because the narrower the class-interval, the fewer observations it is likely to contain. For example, see the graph set below.
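Combining the two rescalings, the height of the fitted curve over a histogram is density × n × class width; a minimal Python sketch (names are ours):

```python
import math

def fitted_curve_height(x, mu, sigma, n, width):
    """Height of a normal curve fitted to a histogram of n observations,
    grouped into class intervals of the given width."""
    density = math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    return n * width * density
```

At the peak (x = mu), and with width = 1, this reduces to n/(σ√(2π)), as stated above.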
{Fig. 9}
The only awkward bit in all of this is that, for a histogram, the normal probability 'density' function does not yield a probability, as we explained above. Instead it yields the rate of change of probability at each value of X. This does not cause any major problems provided your class intervals are smaller than the standard deviation. But if your class intervals are more than about 2.5 times the standard deviation, your relative frequency can exceed 100%, as shown in the second figure above. This causes some confusion because, by definition, the relative frequency of observations cannot exceed 100%.
You may also have problems if you try to fit a normal distribution to a histogram of a discrete variable. Because discrete variables cannot have fractional values, many packages tend to produce very odd results.
Probability-probability (PP) and quantile-quantile (QQ) plots
Unlike the methods above, which 'fit' a normal distribution to an existing (curved) graph, PP and QQ plots are used to assess normality by plotting 'like against like', and seeing if the result is a straight line. Although these plots can be calculated for any mathematically 'known' population distribution, they are most commonly used to assess departures from normality.
A PP plot is a scatterplot of observed P-values, as estimated from their relative rank within your sample, against their theoretical P-values, calculated by applying the cumulative population distribution function to the value of each observation in your sample.
A QQ plot is a scatterplot of your observed quantiles, in other words the actual values of the observations in your sample, against their theoretical quantiles. Each theoretical quantile is calculated, using the inverse population function, from the P-value of an observation, as estimated from its relative rank. For the normal distribution a QQ plot is also known as a normal probability plot.
The relationship between relative rank, quantiles and P-values, plus the problems of ties, and the need to correct the relative rank when estimating population quantiles from a sample, are considered in the Quantiles More Info Page.
Note that we have always plotted observed against expected values. But sometimes expected values are plotted against observed values. Check carefully which way round they are plotted on any plot you encounter.
We will first look at how PP and QQ plots are done for a small sample. Correct to 4 significant figures, the mean (x̄) of this sample was 28.33, and their standard deviation (s_{x}) was 3.448.
The table below shows the value of each of these (n =) 10 observations arranged in order of rank (x_{(r)}), their corrected relative rank (crr), their theoretical quantile, and their expected P-value. The theoretical quantile was estimated from the inverse normal probability function using R. The expected P-value was estimated from the cumulative normal distribution function, again using R.
rank r   x_{(r)}   crr = (r − 1/2)/n   N^{−1}[crr, x̄, s_{x}]   N[x, x̄, s_{x}]
1        22.899    0.050               22.663                   0.0575
2        24.314    0.150               24.761                   0.1218
3        25.497    0.250               26.009                   0.2053
4        27.383    0.350               27.006                   0.3913
5        28.382    0.450               27.901                   0.5055
6        28.383    0.550               28.768                   0.5056
7        30.084    0.650               29.663                   0.6940
8        30.766    0.750               30.661                   0.7596
9        31.569    0.850               31.909                   0.8258
10       34.071    0.950               34.007                   0.9519
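The table's columns can be reproduced in Python with the standard library's NormalDist (rather than R; small discrepancies from the tabulated values arise because the mean and standard deviation are rounded to 4 significant figures):

```python
from statistics import NormalDist  # Python 3.8+

obs = [22.899, 24.314, 25.497, 27.383, 28.382,
       28.383, 30.084, 30.766, 31.569, 34.071]
n = len(obs)
dist = NormalDist(mu=28.33, sigma=3.448)  # sample mean and sd, 4 sig. figs

rows = []
for r, x in enumerate(sorted(obs), start=1):
    crr = (r - 0.5) / n           # corrected relative rank
    quantile = dist.inv_cdf(crr)  # theoretical quantile, for the QQ plot
    p = dist.cdf(x)               # expected P-value, for the PP plot
    rows.append((r, x, crr, quantile, p))
```

Plotting crr against p gives the PP plot; plotting x against quantile gives the QQ plot.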
For the PP plot, the observed probability (crr) is plotted against the probability expected for a normal distribution (N[x, x̄, s_{x}]). For the QQ plot, the observed quantile (x_{(r)}) is plotted against the quantile expected for a normal distribution (N^{−1}[crr, x̄, s_{x}]). These two plots are shown below:
{Fig. 10}


We now look at PP and QQ plots of a large sample, albeit with many tied observations. Below are PP and QQ plots of all 1881 PCV observations. Because these data are extensively tied (see above), we have used their mean rank.
{Fig. 11}


Some software packages offer a range of methods for correcting the relative rank of observations within your sample, to provide better estimates of their population equivalents. If you are uncomfortable with these corrections, and do not have access to the appropriate statistical tables, remember that it is possible to obtain rankits by simulation for any distribution for which you have the inverse probability function.
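For instance, rankits (the expected values of the order statistics) can be estimated by repeatedly generating samples through the inverse probability function and averaging each rank position; a rough Python sketch (the names and number of repeats are ours):

```python
import random
from statistics import NormalDist

def simulated_rankits(n, repeats=20000, seed=42):
    """Estimate the n rankits of a standard normal population by simulation."""
    rng = random.Random(seed)
    inv_cdf = NormalDist().inv_cdf  # any inverse probability function will do

    def draw():
        u = rng.random()
        while u == 0.0:  # inv_cdf requires 0 < u < 1
            u = rng.random()
        return inv_cdf(u)

    totals = [0.0] * n
    for _ in range(repeats):
        for i, value in enumerate(sorted(draw() for _ in range(n))):
            totals[i] += value
    return [t / repeats for t in totals]
```

Substituting a different inverse probability function yields rankits for that distribution instead.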
Interpreting PP & QQ plots
Even when your sample is randomly selected from a normal population, random error can be expected to ensure there is some deviation from the 'perfect fit' line. Indeed, various tests compare the observed deviation from that line with the amount you would expect if your sample represented a normal distribution. To enable you to see what to expect from PP and QQ plots of non-normal data, we have excluded that source of variation in the diagrams below by using rankits.
For the sake of comparison, we used 4 variables, y1, y2, y3, and y4, these being normal, skewed, platykurtic (flattened), and leptokurtic (peaked). All of them comprised the most typical locations (rankits) of 90 values from populations having the same mean and standard deviation (μ = 10, σ = 2). The four histograms, below, show how these values were distributed.
{Fig. 12}
{Fig. 13}
