InfluentialPoints.com

Beginners statistics: averages

R is free, very powerful, and does the boring calculations and graphs for scientists.

Example, with R

Sometimes the average, or most typical value, is very obvious and straightforward:

-2   -1   0   0   0   0   0   1   2

In this case the average, 0:

  • is the most common value (the mode), and would be the value most likely to be selected by chance,
  • is the least deviant (middle-ranking, median) value,
  • is midway between the most extreme values,
  • is their mean value (their sum, divided by the number of values),
  • is the value from which all the others deviate least.

    You can check this last point yourself with R.
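The five properties listed above can be checked with a few lines of base R. This is only a sketch: the names `sum_abs_dev`, `sum_sq_dev` and `candidates` are made up for illustration, and "deviate least" is read in two common ways (smallest sum of absolute deviations, and smallest sum of squared deviations), which is an assumption rather than something the text specifies.

```r
y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)

# The first four senses of 'average' listed above
mode_y     <- as.numeric(names(which.max(table(y))))  # most common value
median_y   <- median(y)                               # middle-ranking value
midrange_y <- (min(y) + max(y)) / 2                   # midway between the extremes
mean_y     <- mean(y)                                 # sum divided by number of values

# Total deviation of all the values from a candidate 'centre'
sum_abs_dev <- function(centre) sum(abs(y - centre))
sum_sq_dev  <- function(centre) sum((y - centre)^2)

# Try a grid of candidate centres and see which gives the least deviation
candidates <- seq(-2, 2, by = 0.5)
abs_dev <- sapply(candidates, sum_abs_dev)
sq_dev  <- sapply(candidates, sum_sq_dev)

best_abs <- candidates[which.min(abs_dev)]
best_sq  <- candidates[which.min(sq_dev)]

c(mode = mode_y, median = median_y, midrange = midrange_y,
  mean = mean_y, least_abs_dev = best_abs, least_sq_dev = best_sq)
```

For this symmetric set every entry comes out as 0. In general the median minimises the sum of absolute deviations and the mean minimises the sum of squared deviations, so the two readings only agree when the data are as well behaved as these.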



Definition and Use

  • An average is assumed to be the most typical, usual, normal, expected, representative value of a set.
  • The average is assumed to be a simple-to-interpret robust measure of 'location', or expected outcome.
  • In statistics the average most often refers to the (simple arithmetic) mean.
  • But it is sometimes assumed to be the median or mode, or even the minimum or maximum.
  • Many other measures of location, such as 'trimmed' means and geometric means, are also known as averages.
  • In real life these measures seldom agree.
  • In other words, 'average' is a loose, ill-defined term. So beware.
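To see how little these measures need agree, here is a small sketch in base R. The data are made-up, right-skewed values chosen purely for illustration, and the 20% trimming fraction is an arbitrary choice.

```r
# Hypothetical right-skewed sample (invented values, for illustration only)
y <- c(1, 2, 2, 3, 4, 5, 8, 15, 50)

mean_y      <- mean(y)                                  # arithmetic mean: 10
median_y    <- median(y)                                # middle-ranking value: 4
mode_y      <- as.numeric(names(which.max(table(y))))   # most common value: 2
trimmed_y   <- mean(y, trim = 0.2)                      # mean after dropping 20% from each end
geometric_y <- exp(mean(log(y)))                        # geometric mean (positive values only)
midrange_y  <- (min(y) + max(y)) / 2                    # midway between the extremes: 25.5

averages <- c(mean = mean_y, median = median_y, mode = mode_y,
              trimmed = trimmed_y, geometric = geometric_y,
              midrange = midrange_y)
round(averages, 2)
```

All six 'averages' differ for this sample, which is the practical reason to say which one you mean.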


Simple formula

Assuming y is a list of items, perhaps the simplest way to assess the average is to select an item at random:

y(i=random)

But, where y is a list of n numbers, the arithmetic mean is more popular:

the sum of y/n
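In R, both of these can be written in a line each. A minimal sketch, reusing the example values from the top of the page; the names `random_item` and `arithmetic_mean` are illustrative, and the seed is arbitrary, set only so the random choice is repeatable.

```r
y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)
n <- length(y)

# Select one item at random: pick a random index i, then take y[i]
set.seed(1)                      # arbitrary seed, for repeatability only
random_item <- y[sample(n, 1)]

# Arithmetic mean: the sum of y, divided by n
arithmetic_mean <- sum(y) / n

random_item
arithmetic_mean                  # 0, the same as mean(y)
```

Note the index is sampled rather than the values directly, because `sample(y, 1)` treats a single positive number as the upper end of a range when `y` has length one.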


Tips and Notes

  • Even if you assume an average is the arithmetic mean, there are many different ways of calculating that value.
  • If the number of values (n) is infinitely large, as is assumed by many statistical models, you cannot calculate their average by adding them up and dividing by infinity!
  • The best measure of location depends upon what is being averaged, and to what use you wish to put that average.
  • Even when an average is a simple and reasonable measure of location, it can be highly misleading.
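The first two points can be illustrated with a sketch in base R. The first three calculations are different routes to the same arithmetic mean of the example values. The fourth assumes, purely for illustration, that an infinitely large population follows an exponential distribution with rate 0.5, so that its mean has to be found by integration rather than by adding up.

```r
y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)

# 1. Directly: the sum divided by the number of values
mean_direct <- sum(y) / length(y)

# 2. From a frequency table: each distinct value weighted by its frequency
freq   <- table(y)
values <- as.numeric(names(freq))
mean_weighted <- sum(values * as.vector(freq)) / sum(freq)

# 3. As a running update, revising the mean one value at a time
mean_running <- 0
for (i in seq_along(y)) {
  mean_running <- mean_running + (y[i] - mean_running) / i
}

# 4. Infinite population (assumed exponential, rate 0.5):
#    the mean is the integral of x times the density, here 1 / 0.5 = 2
pop_mean <- integrate(function(x) x * dexp(x, rate = 0.5),
                      lower = 0, upper = Inf)$value

c(direct = mean_direct, weighted = mean_weighted,
  running = mean_running, population = pop_mean)
```

The three sample calculations agree up to rounding error. Which route is convenient depends on whether you hold the raw values, a table of counts, or a stream of values arriving one by one.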


Test yourself

We can illustrate some of the beginner's statistical issues raised above with R.
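One issue worth trying for yourself is how a single extreme value affects different averages. The sketch below is an assumed exercise, not the original one: it appends a made-up extreme value of 100 to the example set and compares the mean, the median and a 10% trimmed mean before and after.

```r
y <- c(-2, -1, 0, 0, 0, 0, 0, 1, 2)

# Hypothetical extreme value appended to the original set
y_outlier <- c(y, 100)

before <- c(mean = mean(y), median = median(y),
            trimmed = mean(y, trim = 0.1))
after  <- c(mean = mean(y_outlier), median = median(y_outlier),
            trimmed = mean(y_outlier, trim = 0.1))

rbind(before, after)
```

Before running it, try to predict which of the three measures will move most, and which will not move at all.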


Useful references

Huff, D. (1954). How to lie with statistics. Victor Gollancz, London.
This gives the topic a lighter treatment, emphasizing the problems of misusing the arithmetic mean for heavily skewed data. Although very old, this book is still worth looking at. Recommended!


Wikipedia: Average.
Notes that the most appropriate statistic to use for a measure of central tendency depends on the nature of the data.