InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

Beginners statistics:

Medians

Example, with R

A median is the middle-ranking value of an ordered list.

For instance, having arranged these 5 values in (ascending) rank order:

minuscule[1]   smallish[2]   biggish[3]   verybig[4]   ultrabig[5]

we can see their median value is biggish.

Given the lowest rank is 1 and the highest rank is n, the middle rank is (1+n)/2 - which is also the mean rank.

For n=5 values, the midrank is 3, whereas for n=4 values the midrank would be 2.5 - which is a bit awkward when you have no value of that rank.
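
The mid-rank arithmetic above can be sketched in R. This is a minimal illustration, assuming the five labels are stored in a character vector that is already in ascending rank order; the variable names (sizes, n, midrank, midrank4) are hypothetical.

```r
# The five example values, assumed already arranged in ascending rank order
sizes <- c("minuscule", "smallish", "biggish", "verybig", "ultrabig")

n <- length(sizes)        # n = 5
midrank <- (1 + n) / 2    # middle rank: (1 + 5) / 2 = 3
mean(seq_len(n))          # mean of the ranks 1..5, also 3
sizes[midrank]            # "biggish"

# With n = 4 the mid-rank falls between ranks 2 and 3
midrank4 <- (1 + 4) / 2   # 2.5, and no value has this rank
```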

Of course this assumes your values can be sensibly ranked - and, if there are an even number of different values, that you can sensibly interpolate or otherwise select a plausible value.

You can find the median with R.
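
As a hedged sketch, R's built-in median() function covers both the odd and even cases for numeric data, taking the mean of the mid-ranking pair when n is even. The values below are hypothetical, chosen only for illustration.

```r
# Hypothetical numeric values (not from the original page)
y_odd <- c(7, 1, 12, 3, 5)
median(y_odd)     # sorted: 1 3 5 7 12, so the middle value is 5

y_even <- c(8, 1, 12, 3)
median(y_even)    # sorted: 1 3 8 12, mean of 3 and 8 is 5.5
```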



Definition and Use

  1. The median can be defined as the value with the least extreme rank - assuming the minimum and maximum have the most extreme ranks.
    This works well if there are an odd number of different values, less well if there are an even number of different values.
  2. The median can also be defined as the value which divides a ranked set of values into two equal halves - or as the 50% quantile.
    This works well if there are an even number of different values, provided the mid-rank's value can be sensibly interpolated by taking the mean of the mid-ranking pair of values.

    Either definition may be problematic where there are an even number of values, but only certain values are possible - such as whole-numbers.
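
The 50% quantile definition, and the whole-number difficulty, can be illustrated with R's quantile() function. This is a sketch using hypothetical counts; type = 1 is just one of several quantile definitions R offers, picked here only because it returns an observed value rather than interpolating.

```r
# Hypothetical whole-number counts, with an even number of values
counts <- c(2, 3, 6, 9)

median(counts)   # 4.5, which is not a possible count

# Default quantile definition interpolates, matching median()
q_default <- quantile(counts, 0.5, names = FALSE)              # 4.5

# type = 1 (inverse empirical CDF) returns one of the observed values
q_observed <- quantile(counts, 0.5, type = 1, names = FALSE)   # 3
```

Which of the two answers is more plausible depends on what the values represent.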


Simple formula

Assume y is a list of n rankable items, where y[r] is the rth-ranked value of y and r denotes the ranks 1 to n. Provided n is odd:

the median of y is y[mean(r)]

If y has an even number of items (n), you must obtain the median value from the mid-ranking pair of values - say, using their mean.
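
The formula can be written out as a short R function. This is a minimal sketch, not R's own implementation: simple_median is a hypothetical name, and it assumes y is a non-empty numeric vector with no missing values.

```r
# Hypothetical helper: median from ranks, following the simple formula
simple_median <- function(y) {
  y <- sort(y)            # put the values in ascending rank order
  r <- seq_along(y)       # ranks 1..n
  mid <- mean(r)          # mid-rank, (1 + n) / 2
  if (length(y) %% 2 == 1) {
    y[mid]                # odd n: the mid-ranking value
  } else {
    # even n: mean of the mid-ranking pair of values
    mean(y[c(floor(mid), ceiling(mid))])
  }
}

simple_median(c(7, 1, 12, 3, 5))   # 5
simple_median(c(8, 1, 12, 3))      # 5.5
```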


Tips and Notes

  • Whereas the simple arithmetic mean gives each item an equal weight, the median only gives any weight to the mid-ranking value(s) - in other words it is the most extreme form of trimmed mean.
  • The median is especially useful as a measure of central tendency where outlying values are untrustworthy or unknown, or where distributions are very skewed. However, it is wasteful of information.
  • Beware - sometimes the median is used in place of the mean without this being specified.
    This can be very misleading since the mean and median of skewed data can be rather different.
  • The median may not be a useful measure if most values are the same, as with binary data.
  • The median may not be useful even for skewed data, if it is to be used to estimate a total.
    For example, if the median cost per operation is £500, you cannot assume the total cost for 100 such operations will be £50,000. For this calculation you MUST know the (arithmetic) mean cost.
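
Two of these points, the median as an extreme trimmed mean and the danger of estimating a total from it, can be checked in R. The costs below are hypothetical and deliberately right-skewed; only the £500 median echoes the example above.

```r
# Hypothetical, right-skewed costs per operation (in pounds)
costs <- c(300, 400, 500, 600, 3200)

mean(costs)               # 1000, every value carries equal weight
median(costs)             # 500, only the mid-ranking value counts
mean(costs, trim = 0.5)   # 500, maximal trimming gives the median

# Estimating the total cost
sum(costs)                      # 5000, the true total
mean(costs) * length(costs)     # 5000, the mean recovers the total
median(costs) * length(costs)   # 2500, the median underestimates it here
```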


Test yourself

Compare the medians of these five sets of values:

When you have examined these next two sets of values, compare their median and mean.
Note: -Inf denotes minus infinity.
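
As a hedged illustration using made-up values (not the sets this exercise refers to), R writes minus infinity as -Inf, and a single such value affects the mean and the median very differently:

```r
# Hypothetical values with one infinitely low outlier
y <- c(-Inf, 2, 3, 4, 5)

median(y)   # 3, the mid-ranking value is unaffected
mean(y)     # -Inf, the outlier dominates the mean
```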


Useful references

Barber, J.A. & Thompson, S.G. (1998). Analysis and interpretation of cost data in randomized controlled trials: review of published studies. BMJ 317, 1195-1200 (31 October).
Notes that the arithmetic mean cost (and not the median cost) is required if it is to be used to estimate a total.


Stark, P.B. Measures of location and spread.
A thought-provoking account of measures of location and spread.


Wikipedia: Median.