"It has long been an axiom of mine that the little things are infinitely the most important."
Skewness and Kurtosis
Skewness and kurtosis provide quantitative measures of deviation from a theoretical distribution. Here we will be concerned with deviation from a normal distribution.
In everyday English, skewness describes the lack of symmetry in a frequency distribution. A distribution is right (or positively) skewed if the tail extends out to the right - towards the higher numbers. A distribution is left (or negatively) skewed if the tail extends out to the left.
In statistics, skew is usually measured and defined using the coefficient of skew, γ1 - the average cubed deviation from the mean, in units of the standard deviation:

γ1 = E[(Y − μ)³] / σ³
As you might expect, because the coefficient of skew uses the cubed deviation from the mean, skew can be either positive or negative. A symmetrical distribution has zero skew - paradoxically however, a zero skew does not prove distribution is symmetrical!
For large samples of some variable, Y, the coefficient of skew (γ1) can be estimated using this formula:

g1 = m3 / m2^(3/2)

where m2 = Σ(Y − Ȳ)² / n and m3 = Σ(Y − Ȳ)³ / n are the second and third sample moments about the mean.
Unfortunately, the formula above provides biased estimates of γ1 when calculated from small samples of skewed populations. The formula below provides a less biased estimate:

G1 = g1 × √[n(n − 1)] / (n − 2)
For a normal population and large samples (n > 150), g1 is approximately normally distributed with a mean of 0 and a standard error of √(6/n).
For very small samples of highly skewed populations even this formula is expected to underestimate its true value - in other words, |E(g1)| < |γ1|. In that case simulation modelling is the only way to get an unbiased estimate - or to estimate how it might vary.
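As a minimal sketch of these two estimates (plain Python; the function names and the example sample are ours, not from any particular package):

```python
import math

def skew_g1(y):
    """Large-sample coefficient of skew: g1 = m3 / m2^(3/2),
    where m_r is the r-th sample moment about the mean (divisor n)."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    m3 = sum((v - ybar) ** 3 for v in y) / n
    return m3 / m2 ** 1.5

def skew_G1(y):
    """Small-sample adjusted estimate: G1 = g1 * sqrt(n(n-1)) / (n-2)."""
    n = len(y)
    return skew_g1(y) * math.sqrt(n * (n - 1)) / (n - 2)

y = [1, 1, 2, 2, 3, 5, 9]   # tail extends to the right
print(skew_g1(y))           # positive, as expected for right skew
print(skew_G1(y))           # adjustment moves the estimate further from zero
```

Note that whenever g1 is non-zero, the adjustment factor √[n(n − 1)] / (n − 2) exceeds 1, so G1 is always further from zero than g1 - consistent with g1 underestimating the skew of small samples.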
Kurtosis is often described as the extent to which the peak of a probability distribution deviates from the shape of a normal distribution (if it is more pointed the distribution is leptokurtic, if it is flatter it is platykurtic). Since 'outlying values' are the most influential, a more useful way to regard kurtosis is in terms of tail length (if the tails are longer or heavier than expected the distribution is leptokurtic, if shorter or lighter it is platykurtic). The coefficient of kurtosis (γ2) is the average of the fourth power of the standardized deviations from the mean:

γ2 = E[(Y − μ)⁴] / σ⁴
For a normal population, the coefficient of kurtosis is expected to equal 3. A value greater than 3 indicates a leptokurtic distribution; a value less than 3 indicates a platykurtic distribution. For the sample estimate (g2), 3 is subtracted so that a positive value indicates leptokurtosis and a negative value indicates platykurtosis.
For large samples of some variable, Y, the coefficient of kurtosis (γ2) can be estimated using this formula:

g2 = m4 / m2² − 3

where m2 = Σ(Y − Ȳ)² / n and m4 = Σ(Y − Ȳ)⁴ / n.
This formula provides biased estimates when calculated from small samples of kurtotic populations. The formula below provides a less biased estimate of γ2:

G2 = [(n − 1) / ((n − 2)(n − 3))] × [(n + 1)g2 + 6]
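The same sketch style works for the kurtosis estimates (plain Python; function names and example data are ours):

```python
def kurt_g2(y):
    """Large-sample excess kurtosis: g2 = m4 / m2^2 - 3,
    where m_r is the r-th sample moment about the mean (divisor n)."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    m4 = sum((v - ybar) ** 4 for v in y) / n
    return m4 / m2 ** 2 - 3

def kurt_G2(y):
    """Small-sample adjusted estimate:
    G2 = (n-1)/((n-2)(n-3)) * ((n+1)*g2 + 6)."""
    n = len(y)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * kurt_g2(y) + 6)

flat = [1, 2, 3, 4, 5]      # short tails: platykurtic, g2 < 0
spiky = [0] * 8 + [-5, 5]   # long tails: leptokurtic, g2 > 0
print(kurt_g2(flat))        # ≈ -1.3 (platykurtic)
print(kurt_g2(spiky))       # 2.0 (leptokurtic)
```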
For a normal population and large samples, g2 is approximately normally distributed with a mean of 0 and a standard error of √(24/n).
The terminology of the coefficients of skew and kurtosis, along with the mean and variance, is complicated somewhat because they involve what are known as 'moment statistics'. A few words of explanation may help to reduce this confusion.
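To make the connection explicit (our summary of the standard definitions, using the notation above), the r-th sample moment about the mean is:

```latex
m_r = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^r
```

The mean is the first moment about the origin; the variance is essentially the second moment about the mean (m2, apart from the n − 1 correction); the coefficient of skew standardizes the third moment (g1 = m3 / m2^(3/2)); and the coefficient of kurtosis standardizes the fourth (g2 = m4 / m2² − 3). Hence 'moment statistics'.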
As you might expect, statisticians have developed quite a few 'tests' of normality, most of which we describe later, once you have enough background information to understand their reasoning. But let us give one 'plug-in' formula here and now.
A test of normality recommended by some authors is the Jarque-Bera test. This is based on the distribution of a combined measure of skewness and kurtosis:

J = (n/6) × [g1² + (g2²)/4]
The statistic J has an asymptotic chi-square distribution with two degrees of freedom. However, convergence to this distribution is slow, and for small to moderate samples the test's true significance level can differ appreciably from its nominal one.
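As a sketch (plain Python; for two degrees of freedom the chi-square upper-tail probability has the closed form e^(−J/2), so no distribution tables are needed):

```python
import math

def jarque_bera(y):
    """J = (n/6) * (g1^2 + g2^2 / 4), referred to a chi-square
    distribution with 2 degrees of freedom."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n
    g1 = (sum((v - ybar) ** 3 for v in y) / n) / m2 ** 1.5
    g2 = (sum((v - ybar) ** 4 for v in y) / n) / m2 ** 2 - 3
    J = n / 6 * (g1 ** 2 + g2 ** 2 / 4)
    p = math.exp(-J / 2)   # chi-square(2 df) upper-tail probability
    return J, p

# A sample that is far from normal: 90 zeros and 10 tens
J, p = jarque_bera([0.0] * 90 + [10.0] * 10)
print(J > 5.99, p < 0.05)   # True True: normality rejected at the 5% level
```

The 5.99 above is the upper 5% point of a chi-square distribution with two degrees of freedom, the usual large-sample critical value for this test.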