#### Principles

We previously discussed outlier detection and rejection in Unit 2. We stress again that the important thing is to find out why a particular point is an outlier. It should then be removed only if it is a clear error. It should not be removed simply because it does not fit the researcher's preconceptions of what the data should look like.

Nevertheless, we give some details here of **Chauvenet's criterion**, a long-established method based on probability theory that is widely used in government, universities and industry for outlier detection - and deletion. It has the disadvantage (unlike the methods previously considered) that it assumes the data are from a normal distribution - always a very questionable assumption! If this is assumed, then outliers are identified based on the mean and standard deviation of the data.

#### Practice

The mean and standard deviation of the observed data (including any suspected outliers) are calculated. The two-sided probability of drawing a value at least as extreme as the suspected outlier from a normal population with the observed mean and standard deviation is then estimated from the normal distribution function. This probability is multiplied by the number of observations - if the result is less than 0.5, then by Chauvenet's criterion the outlier can be discarded.
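The rule above can be written compactly as follows. The symbols are our own notation rather than part of the original description: $x$ is the suspected outlier, $\bar{x}$ and $s$ are the observed mean and standard deviation, $n$ is the number of observations, and $\Phi$ is the standard normal distribution function.

$$
z = \frac{|x - \bar{x}|}{s}, \qquad
p = 2\,\bigl(1 - \Phi(z)\bigr), \qquad
\text{discard } x \text{ if } \; n\,p < 0.5 \;\Longleftrightarrow\; p < \frac{1}{2n}.
$$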

Say we have the following observations: 13, 26, 8, 25, 36, 92, 14, 17. The mean is 28.875 and the (sample) standard deviation is 27.016. The value of 92 is suspected to be an outlier. It deviates from the mean by 63.125, or about 2.34 standard deviations, and the two-sided probability of getting a value that deviates from the mean by at least that much is 0.0195. Since we have eight observations, we multiply the probability by 8 to give 0.156. Since this is less than 0.5, by Chauvenet's criterion the outlier can be discarded.
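The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the function name `chauvenet_score` is our own, and we assume the sample standard deviation (n - 1 denominator), which is what reproduces the value of 27.016 quoted here.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def chauvenet_score(data, suspect):
    """Return n * p, where p is the two-sided normal tail probability
    of a deviation from the mean at least as large as that of `suspect`."""
    n = len(data)
    centre = mean(data)
    spread = stdev(data)  # sample standard deviation (n - 1 denominator)
    z = abs(suspect - centre) / spread
    p = erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z)) for a standard normal
    return n * p


observations = [13, 26, 8, 25, 36, 92, 14, 17]
score = chauvenet_score(observations, 92)

# Chauvenet's criterion: the point may be discarded if n * p < 0.5
print(f"n * p = {score:.3f}")
print("may be discarded" if score < 0.5 else "retain")
```

For these data the score comes out at roughly 0.156, in agreement with the hand calculation. Bear in mind that the calculation rests entirely on the assumption of normality.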

#### Other approaches

Chauvenet's criterion has been criticised because it uses an arbitrary assumption - namely that a measurement may be rejected if the probability of obtaining a deviation from the mean as large as that value's is less than the inverse of twice the number of measurements. Ross (2003) advocates the use of Peirce's criterion instead, and gives full details of how to use it. This method does not rely on an arbitrary assumption to identify outliers, although it still assumes the data follow a normal distribution.

Another technique, known as Grubbs' test for outliers (Grubbs, 1969), uses a significance test to assess whether there are any outliers in the data set.