Chauvenet's criterion for identifying outliers
We previously discussed outlier detection and rejection in
Nevertheless, we give some details here of Chauvenet's criterion, a long-established method based on probability theory that is widely used in government, universities and industry for outlier detection and deletion. Unlike the methods previously considered, it has the disadvantage of assuming that the data come from a normal distribution, which is always a very questionable assumption! If normality is assumed, outliers are identified using the mean and standard deviation of the data.
The mean and standard deviation of the observed data (including any suspected outliers) are calculated. The (two-sided) probability of drawing a value as extreme as, or more extreme than, the suspected outlier from a normal population with the observed mean and standard deviation is then estimated from the normal distribution function. This probability is multiplied by the number of observations; if the product is less than 0.5, then by Chauvenet's criterion the outlier can be discarded.
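The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `chauvenet_expected_count` and `chauvenet_rejects` are hypothetical, the standard deviation is taken to be the sample standard deviation (n - 1 denominator), and the two-sided normal tail probability is obtained from the complementary error function.

```python
import math
from statistics import mean, stdev


def chauvenet_expected_count(values, suspect):
    """Return n * P(|Z| >= z) for the suspect value (hypothetical helper).

    The mean and standard deviation are computed from all observations,
    including the suspected outlier.
    """
    n = len(values)
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    z = abs(suspect - m) / s
    # Two-sided tail probability of a standard normal: P(|Z| >= z) = erfc(z / sqrt(2))
    p_two_sided = math.erfc(z / math.sqrt(2))
    return n * p_two_sided


def chauvenet_rejects(values, suspect, threshold=0.5):
    """True if the criterion would allow the suspect value to be discarded."""
    return chauvenet_expected_count(values, suspect) < threshold
```

A call such as `chauvenet_rejects(data, x)` takes the full list of observations as `data` and the suspected value as `x`. Keeping `threshold` as a parameter makes the arbitrary 0.5 cut-off explicit.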
Say we have the following observations: 13, 26, 8, 25, 36, 92, 14, 17. The mean is 28.875 and the (sample) standard deviation is 27.016. The value of 92 is suspected to be an outlier. The probability of getting a value that deviates from the mean by that much is
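The arithmetic for this example can be worked through as follows. This is a sketch that assumes the two-sided normal tail probability described in the procedure; the symbol $\Phi$ for the standard normal distribution function is introduced here for convenience, and the figures are rounded.

```latex
\begin{align*}
z  &= \frac{|92 - 28.875|}{27.016} \approx 2.337, \\
P  &= 2\,\bigl(1 - \Phi(z)\bigr) \approx 0.0195, \\
nP &= 8 \times 0.0195 \approx 0.156 < 0.5 .
\end{align*}
```

On these figures the product falls below 0.5, so Chauvenet's criterion would permit the value of 92 to be discarded.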
Chauvenet's criterion has been criticised because it rests on an arbitrary assumption, namely that a measurement may be rejected if the probability of obtaining its deviation from the mean is less than the inverse of twice the number of measurements. Ross