 
Skewness and kurtosis provide quantitative measures of deviation from a theoretical distribution. Here we will be concerned with deviation from a normal distribution.
Skewness
In everyday English, skewness describes the lack of symmetry of a frequency distribution. A distribution is right (or positively) skewed if the tail extends out to the right, towards the higher numbers. A distribution is left (or negatively) skewed if the tail extends out to the left.
In statistics, skew is usually measured and defined using the coefficient of skew, γ_{1}, which is the average standardized cubed deviation from the mean.
As you might expect, because the coefficient of skew uses the cubed deviation from the mean, skew can be either positive or negative. A symmetrical distribution has zero skew; paradoxically however, zero skew does not prove that a distribution is symmetrical!
For large samples of some variable, Y, the coefficient of skew (γ_{1}) can be estimated using this formula:
Algebraically speaking 

g_{1} = Σy'^{3}/n = [Σ(Y − Ȳ)^{3}/n] / [√(Σ(Y − Ȳ)^{2}/n)]^{3}
Where:
 y' = (Y − Ȳ)/s_{Y}, the standardized deviation of each value from its mean.
 s_{Y} = √[Σ(Y − Ȳ)^{2}/n], the standard deviation of Y.
 Y − Ȳ is the difference between a value of variable Y and its mean, Ȳ.
 And n is the number of observations.
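As an illustration, the large-sample formula can be computed directly. The short Python sketch below (the function name is our own, for illustration only) implements g_{1} as the third central moment divided by the cube of the standard deviation:

```python
def skew_g1(y):
    # Large-sample estimate of the coefficient of skew: the third
    # central moment divided by the cube of the standard deviation
    # calculated with divisor n.
    n = len(y)
    mean = sum(y) / n
    m3 = sum((v - mean) ** 3 for v in y) / n  # third central moment
    m2 = sum((v - mean) ** 2 for v in y) / n  # variance (divisor n)
    return m3 / m2 ** 1.5

# A symmetrical sample has zero skew; a long right tail gives positive skew.
print(skew_g1([1, 2, 3, 4, 5]))           # effectively zero
print(skew_g1([1, 2, 2, 3, 3, 3, 4, 9]))  # positive (right-skewed)
```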

Unfortunately, the formula above provides biased estimates of γ_{1} when calculated from small samples of skewed populations. The formula below provides a less biased estimate.
Algebraically speaking 

g_{1} = Σy'^{3} × n / [(n − 1)(n − 2)]
Where:
 y' = (Y − Ȳ)/s_{Y}, the standardized deviation of each value from its mean.
 But s_{Y} = √[Σ(Y − Ȳ)^{2}/(n − 1)], the sample standard deviation of Y.
 And n is the number of observations.
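The less biased formula translates to code just as directly. This sketch (again, the function name is ours) standardizes with the sample standard deviation and applies the n/((n − 1)(n − 2)) correction:

```python
def skew_g1_small(y):
    # Less biased estimate of the coefficient of skew for small samples:
    # g1 = (sum of y'**3) * n / ((n - 1) * (n - 2)),
    # where y' is standardized with the sample (divisor n - 1)
    # standard deviation.
    n = len(y)
    mean = sum(y) / n
    s = (sum((v - mean) ** 2 for v in y) / (n - 1)) ** 0.5
    return sum(((v - mean) / s) ** 3 for v in y) * n / ((n - 1) * (n - 2))

# For a small right-skewed sample the corrected estimate is larger in
# magnitude than the simple large-sample one.
print(skew_g1_small([1, 2, 2, 3, 3, 3, 4, 9]))
```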

For a normal population and large samples (n > 150), g_{1} is approximately normally distributed with a mean of 0 and a standard error of √(6/n).
For very small samples of highly skewed populations even this formula is expected to underestimate the true value; in other words, E(g_{1}) < γ_{1}. In that case simulation modelling is the only way to obtain an unbiased estimate, or to estimate how it might vary.
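Such a simulation is easy to sketch. The example below (a minimal illustration, with our own function name) repeatedly draws small samples from an exponential population, whose true coefficient of skew is γ_{1} = 2, and shows that the average estimate falls well short of it:

```python
import math
import random
import statistics

def skew_g1_small(y):
    # The less biased small-sample estimate given above.
    n = len(y)
    mean = sum(y) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    return sum(((v - mean) / s) ** 3 for v in y) * n / ((n - 1) * (n - 2))

# An exponential population has a true coefficient of skew of 2.
# Draw many samples of n = 10 and average the estimates.
random.seed(1)
estimates = [skew_g1_small([random.expovariate(1.0) for _ in range(10)])
             for _ in range(5000)]
print(statistics.mean(estimates))  # well below the true value of 2
```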
Kurtosis
Kurtosis is often described as the extent to which the peak of a probability distribution deviates from the shape of a normal distribution (if it is more pointed the distribution is leptokurtic, if it is flatter it is platykurtic). Since outlying values are the most influential, a more useful way to regard kurtosis is in terms of tail length (if the tails are longer than those of a normal distribution it is leptokurtic, if shorter it is platykurtic). The coefficient of kurtosis (γ_{2}) is the average of the fourth power of the standardized deviations from the mean.
For a normal population, the coefficient of kurtosis is expected to equal 3. A value greater than 3 indicates a leptokurtic distribution; a value less than 3 indicates a platykurtic distribution. For the sample estimate (g_{2}), 3 is subtracted so that a positive value indicates leptokurtosis and a negative value indicates platykurtosis.
For large samples of some variable, Y, the coefficient of kurtosis (γ_{2}) can be estimated using this formula:
Algebraically speaking 

g_{2} = Σy'^{4}/n − 3 = Σ(Y − Ȳ)^{4} / [n(Σ(Y − Ȳ)^{2}/n)^{2}] − 3
Where:
 y' is the difference between each value and its mean, Ȳ, divided by the standard deviation, s.
 Σ(Y − Ȳ)^{2}/n is the variance of Y, in other words s^{2}.
 n is the number of observations.
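The large-sample formula can again be coded directly. In this sketch (our own function name) a flat, short-tailed sample gives a negative g_{2}, while a long-tailed one gives a positive g_{2}:

```python
def kurt_g2(y):
    # Large-sample estimate of the coefficient of kurtosis, with 3
    # subtracted so a normal population gives an expected value near 0.
    n = len(y)
    mean = sum(y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n  # fourth central moment
    m2 = sum((v - mean) ** 2 for v in y) / n  # variance (divisor n)
    return m4 / m2 ** 2 - 3

print(kurt_g2(list(range(1, 101))))   # negative: flat sample, platykurtic
print(kurt_g2([0] * 20 + [-10, 10]))  # positive: long tails, leptokurtic
```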

This formula provides biased estimates when calculated from small samples of kurtotic populations. The formula below provides a less biased estimate of γ_{2}.
Algebraically speaking 

g_{2} = [Σy'^{4} × n(n + 1)/(n − 1) − 3(n − 1)^{2}] / [(n − 2)(n − 3)]
Where:
 y' = (Y − Ȳ)/s_{Y}, the standardized deviation of each value from its mean.
 But s_{Y} = √[Σ(Y − Ȳ)^{2}/(n − 1)], the sample standard deviation of Y.
 And n is the number of observations.
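The corrected kurtosis formula is a straightforward extension of the sketch above (function name again ours; note the formula needs n > 3):

```python
def kurt_g2_small(y):
    # Less biased small-sample estimate of the coefficient of kurtosis:
    # g2 = [sum(y'**4) * n(n + 1)/(n - 1) - 3(n - 1)**2] / [(n - 2)(n - 3)],
    # with y' standardized by the sample (divisor n - 1) standard deviation.
    # Requires n > 3.
    n = len(y)
    mean = sum(y) / n
    s = (sum((v - mean) ** 2 for v in y) / (n - 1)) ** 0.5
    sum_y4 = sum(((v - mean) / s) ** 4 for v in y)
    return ((sum_y4 * n * (n + 1) / (n - 1) - 3 * (n - 1) ** 2)
            / ((n - 2) * (n - 3)))

print(kurt_g2_small(list(range(1, 101))))  # negative: flat, platykurtic
```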

For large samples (n > 150) from a normal population, g_{2} has a mean of 0 and a standard error of √(24/n). However, its distribution does not become approximately normal unless the sample size exceeds 1000. We look at one way to assess whether skew and/or kurtosis can be regarded as statistically 'significant' below.
The terminology of the coefficients of skew and kurtosis, along with the mean and variance, is complicated somewhat because these all involve what are known as 'moment statistics'. A few words of explanation may help to reduce this confusion.
 Mathematically, a moment is the mean difference between each of a set of values and a defined value, such as zero. The ith moment is the mean of the ith power of each of those differences, and may be written as μ_{i}.
 So the mean (μ_{1}) is the first moment about zero.
 The variance is the second moment about the mean. As a result, the variance is also known as the second central moment, and may be written as μ'_{2}; but, horribly confusingly, this is commonly shortened to μ_{2}.
 Similarly, the coefficient of skew, γ_{1}, may be written as μ_{3}/μ_{2}^{3/2}, and the coefficient of kurtosis, γ_{2}, is often written as μ_{4}/μ_{2}^{2}.
 Covariance and Pearson's correlation coefficient are also regarded as moment statistics.
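The moment notation above can be made concrete in a few lines. This sketch (our own helper, with an arbitrary illustrative sample) computes the central moments and recovers the two coefficients from them:

```python
def central_moment(y, i):
    # The ith central moment: the mean of the ith power of the
    # deviations of each value from the mean.
    mean = sum(y) / len(y)
    return sum((v - mean) ** i for v in y) / len(y)

y = [2, 4, 4, 5, 7, 9, 13]
mu2 = central_moment(y, 2)
mu3 = central_moment(y, 3)
mu4 = central_moment(y, 4)

print(central_moment(y, 1))  # first central moment: zero, up to rounding
print(mu3 / mu2 ** 1.5)      # coefficient of skew, mu3 / mu2**(3/2)
print(mu4 / mu2 ** 2 - 3)    # coefficient of kurtosis, mu4 / mu2**2 - 3
```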
The Jarque-Bera test of normality
As you might expect, statisticians have developed quite a few 'tests' of normality, most of which we describe once you have enough background information to understand their reasoning. But let us give one 'plug-in' formula here and now.
A test of normality recommended by some authors is the Jarque-Bera test. This is based on the distribution of a combined measure of skewness and kurtosis.
Algebraically speaking 

J = n(g_{1}^{2}/6 + g_{2}^{2}/4)
where
 J is the Jarque-Bera statistic,
 n is the number of observations,
 g_{1} is the sample skewness,
 g_{2} is the sample kurtosis. Note that we have given the formulation for g_{2} above with 3 already subtracted from it, so we do not need to subtract it again here, as is done by some authorities (for example Wikipedia).

The statistic J has an asymptotic chi-square distribution with two degrees of freedom. However, convergence to this distribution is slow and irregular, and Monte Carlo methods should be used for small samples (n < 100).
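Combining the pieces, the test statistic and its asymptotic P-value take only a few lines; for two degrees of freedom the chi-square upper-tail probability has the closed form exp(−J/2). The sample values of n, g_{1} and g_{2} below are hypothetical, purely for illustration:

```python
import math

def jarque_bera(n, g1, g2):
    # Jarque-Bera statistic; g2 here already has 3 subtracted,
    # as in the formulation above.
    return n * (g1 ** 2 / 6 + g2 ** 2 / 4)

def chi2_sf_2df(j):
    # Upper-tail probability of a chi-square with 2 degrees of freedom,
    # which has the closed form exp(-j / 2).
    return math.exp(-j / 2)

# E.g. a (hypothetical) sample of n = 200 observations with g1 = 0.4
# and g2 = 0.8 is decisively non-normal:
J = jarque_bera(200, 0.4, 0.8)
print(J, chi2_sf_2df(J))
```

Remember, though, that this asymptotic P-value should not be trusted for small samples (n < 100), where Monte Carlo methods are preferable.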
