Example, with R
Sometimes the average, or most typical value, is very obvious and straightforward:
-2 -1 0 0 0 0 0 1 2
For that rather unusual set of values the average is obviously 0, and for several different reasons:
- 0 is the most common value (the mode), and would be the value most likely to be selected by chance,
- 0 is the least deviant (middle-ranking, median) value,
- 0 is the mean of the most common values, and of the least deviant values,
- 0 is midway between the most extreme values (the mid-range),
- 0 is their mean value (their sum, divided by the number of values),
- 0 is the value from which all the others deviate least.
You can check this last point yourself with R.
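One way to run that check, as a minimal sketch. Here 'deviate least' is taken to mean the smallest total deviation, measured both as a sum of squared deviations and as a sum of absolute deviations. The grid of candidate values and the variable names are assumptions made for illustration.

```r
# The nine example values
y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)

# Candidate "averages" to compare (an arbitrary grid, chosen for illustration)
candidates <- seq(-2, 2, by = 0.5)

# Total deviation of the values from each candidate, measured two ways
sum_sq  <- sapply(candidates, function(a) sum((y - a)^2))
sum_abs <- sapply(candidates, function(a) sum(abs(y - a)))

# The candidate with the smallest total deviation under each measure
best_sq  <- candidates[which.min(sum_sq)]
best_abs <- candidates[which.min(sum_abs)]

print(data.frame(candidates, sum_sq, sum_abs))
print(c(best_sq = best_sq, best_abs = best_abs))
```

For these values both totals are smallest at 0. In general the sum of squared deviations is smallest at the mean, and the sum of absolute deviations is smallest at the median, so the two answers need not coincide.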
- Note: for most sets of values one or more of the measures shown above will disagree.
- For example, the values -2.1, -1, 0, 0, 0, 0, 0, 1, 2 will give several different averages, depending upon how you calculate them.
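The disagreement can be seen by calculating each measure for those values. This is a sketch only, and the variable names are assumptions. Base R has no function for the statistical mode, so it is obtained here from a frequency table.

```r
y <- c(-2.1, -1, 0, 0, 0, 0, 0, 1, 2)

mean_y     <- mean(y)         # arithmetic mean
median_y   <- median(y)       # middle-ranking value
midrange_y <- mean(range(y))  # midway between the minimum and maximum

# mode() in R reports the storage type, not the most common value,
# so tabulate the values and take the most frequent one instead
counts <- table(y)
mode_y <- as.numeric(names(counts)[which.max(counts)])

print(c(mean = mean_y, median = median_y, mode = mode_y, midrange = midrange_y))
```

The median and mode are still 0, but the mean is now roughly -0.011 and the mid-range is -0.05.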
Definition and Use
- An average is assumed to be the most typical, usual, normal, expected, representative value of a set.
- The average is assumed to be a simple-to-interpret robust measure of 'location', or expected outcome.
- In statistics the average most often refers to the (simple arithmetic) mean.
- But it is sometimes assumed to be the median or mode, or even the minimum or maximum.
- Many other measures of location, such as 'trimmed' means and geometric means, are also known as averages.
- In real life these measures seldom agree.
- In other words, 'average' is a loose, ill-defined term. So beware.
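To see how far these measures can differ, the sketch below applies several of them to one small set of values. The values are hypothetical, invented for illustration, and are all positive so that the geometric mean is defined. The 20% trimming fraction is likewise an arbitrary choice.

```r
# Hypothetical values (invented for illustration)
x <- c(1, 2, 2, 3, 4, 10, 100)

averages <- c(
  mean      = mean(x),              # arithmetic mean
  median    = median(x),            # middle-ranking value
  trimmed   = mean(x, trim = 0.2),  # mean after trimming 20% from each end
  geometric = exp(mean(log(x))),    # geometric mean
  minimum   = min(x),
  maximum   = max(x)
)

print(round(averages, 2))
```

With seven values, trimming 20% removes one value from each end, so the trimmed mean ignores both 1 and 100.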
Simple formula
Assuming y is a list of items, perhaps the simplest way to assess the average is to select an item at random.
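A random selection could be sketched in R as follows. The example values, the seed and the variable names are assumptions. The seed is there only to make the run repeatable.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)

# Pick one item at random. Indexing by a random position avoids a quirk of
# sample(): given a single number k >= 1, sample(k, 1) draws from 1:k instead.
one_pick <- y[sample.int(length(y), 1)]
print(one_pick)

# Repeat the selection many times to see which values turn up most often
picks <- y[sample.int(length(y), 1000, replace = TRUE)]
print(table(picks))
```

Because 0 makes up 5 of the 9 values, it should turn up in roughly 5 out of every 9 selections.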
But, where y is a list of n numbers, the arithmetic mean (the sum of the values divided by n) is more popular.
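As a minimal sketch, the arithmetic mean can be calculated by hand and compared with R's built-in mean() function. The example values and variable names are assumptions.

```r
y <- c(-2.1, -1, 0, 0, 0, 0, 0, 1, 2)
n <- length(y)  # the number of values

mean_by_hand <- sum(y) / n  # sum of the values divided by n
mean_builtin <- mean(y)     # R's built-in arithmetic mean

print(c(by_hand = mean_by_hand, builtin = mean_builtin))

# Compare with a tolerance rather than ==, to allow for rounding error
print(isTRUE(all.equal(mean_by_hand, mean_builtin)))
```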
Tips and Notes
- Even if you assume an average is the arithmetic mean, there are many different ways of calculating that value.
- If the number of values (n) is infinitely large, as is assumed by many statistical models, you cannot calculate their average by adding them up and dividing by infinity!
- The best measure of location depends upon what is being averaged, and to what use you wish to put that average.
- Even when an average is a simple and reasonable measure of location, it can be highly misleading.
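The first two tips can be illustrated with a sketch that reaches the same arithmetic mean by three routes, then obtains a mean from a model rather than from a finite list. The fair six-sided die is a hypothetical model chosen for illustration, and all variable names are assumptions.

```r
y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)

# 1. Directly: the sum divided by the number of values
mean_direct <- sum(y) / length(y)

# 2. From a frequency table: each distinct value weighted by its count
freq   <- table(y)
values <- as.numeric(names(freq))
mean_from_freq <- sum(values * as.vector(freq)) / sum(freq)

# 3. As a running update, taking one value at a time
mean_running <- 0
for (i in seq_along(y)) {
  mean_running <- mean_running + (y[i] - mean_running) / i
}

print(c(direct = mean_direct, from_freq = mean_from_freq, running = mean_running))

# 4. From a model: an endless series of rolls of a fair die cannot be added
#    up, but its mean is each face weighted by its probability
faces <- 1:6
probs <- rep(1 / 6, 6)
mean_model <- sum(faces * probs)
print(mean_model)
```

The first three routes agree, apart from possible rounding error, and the fourth shows how a mean can be defined when there is no finite n to divide by.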
Test yourself
We can illustrate some of the statistical issues raised above, at a beginner's level, with R.
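One possible exercise, offered as a sketch rather than a definitive treatment, uses simulated skewed data to show how the mean and median can disagree, and how the mean of a sample settles down as the sample grows. The log-normal distribution, the seed, the sample sizes and the variable names are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# Hypothetical skewed data: 10000 draws from a log-normal distribution
x <- rlnorm(10000, meanlog = 0, sdlog = 1)

mean_x   <- mean(x)
median_x <- median(x)

# Proportion of the values that lie below the mean
share_below_mean <- mean(x < mean_x)

print(c(mean = mean_x, median = median_x, share_below_mean = share_below_mean))

# Means of ever larger samples, taken from the start of x
sizes <- c(10, 100, 1000, 10000)
means_by_size <- sapply(sizes, function(n) mean(x[1:n]))
print(data.frame(sizes, means_by_size))
```

For this distribution the population median is 1 and the population mean is exp(0.5), about 1.65, so most values lie below the mean. Try changing sdlog, or the seed, and see how the results alter.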
Useful references
- Huff, D. (1954). How to Lie with Statistics. Victor Gollancz, London.
- This gives the topic a lighter treatment, emphasizing the problems caused by misusing the arithmetic mean for heavily skewed data. Although very old, this book is still worth looking at. Recommended!
- Wikipedia: Average.
- Notes that the most appropriate statistic to use for a measure of central tendency depends on the nature of the data.