Example, with R
A median is the middle-ranking value of an ordered list.
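- For instance, having arranged these 5 values in ascending rank order (each subscript gives the value's rank):
$\text{minuscule}_{[1]} \quad \text{smallish}_{[2]} \quad \text{biggish}_{[3]} \quad \text{verybig}_{[4]} \quad \text{ultrabig}_{[5]}$
- we can see that their median value is biggish.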
Given that the lowest rank is 1 and the highest rank is n, the middle rank (or midrank) is (1+n)/2, which is also the mean rank.
- For n=5 values the midrank is 3, whereas for n=4 values the midrank would be 2.5, which is awkward because no value has that rank.
Of course, this assumes your values can be sensibly ranked and, if there is an even number of values, that you can sensibly interpolate between the mid-ranking pair, or otherwise select a plausible value.
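As a minimal sketch, the midrank calculation can be checked in R. Here the five words from the example are stored, already in ascending rank order, in a character vector (the name y is just for illustration), so indexing the vector by its midrank picks out the median.

```r
# The five example values, already arranged in ascending rank order
y <- c("minuscule", "smallish", "biggish", "verybig", "ultrabig")

n <- length(y)                 # 5
midrank <- (1 + n) / 2         # 3
midrank == mean(seq_len(n))    # TRUE: the midrank equals the mean rank
y[midrank]                     # "biggish"

# With n = 4 the midrank is 2.5, and no value has that rank
(1 + 4) / 2                    # 2.5
```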
You can find the median with R using its built-in median() function.
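For example, with a few made-up numbers (not taken from the original page): median() sorts the values itself, so they need not be entered in rank order. With an even count it returns the mean of the mid-ranking pair, and its na.rm argument drops missing values.

```r
# Made-up values for illustration
odd_values  <- c(7, 1, 5, 3, 9)   # n = 5; sorted: 1 3 5 7 9
even_values <- c(7, 1, 5, 3)      # n = 4; sorted: 1 3 5 7

median(odd_values)                       # 5, the value of rank 3
median(even_values)                      # 4, the mean of 3 and 5
median(c(odd_values, NA))                # NA: missing values propagate by default
median(c(odd_values, NA), na.rm = TRUE)  # 5
```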
Definition and Use
- The median can be defined as the value with the least extreme rank - assuming the minimum and maximum have the most extreme ranks.
- This works well if there is an odd number of values, less well if there is an even number of values.
- The median can also be defined as the value that divides a ranked set of values into two equal halves, or as the 50% quantile.
- This works well if there is an even number of values, provided the midrank's value can be sensibly interpolated by taking the mean of the mid-ranking pair of values.
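The 50% quantile definition can be sketched with R's quantile() function on made-up values. By default (type = 7) quantile() interpolates, so its 50% quantile agrees with median(); type = 1, the inverse of the empirical distribution function, instead returns an observed value, which is one way of selecting a plausible value when n is even.

```r
x <- c(7, 1, 5, 3)   # made-up values, n = 4; sorted: 1 3 5 7

median(x)                                          # 4
quantile(x, probs = 0.5, names = FALSE)            # 4: default type 7 interpolates
quantile(x, probs = 0.5, type = 1, names = FALSE)  # 3: type 1 returns an observed value
```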
Either definition may be problematic where there is an even number of values but only certain values are possible, such as whole numbers: the mean of 2 and 3 is 2.5, which is not itself a whole number.
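A small sketch of the whole-number problem, using made-up scores: median() returns 2.5, which no score can take. Taking the lower or upper member of the mid-ranking pair (often called the lower and upper median) is one common workaround; which is more plausible depends on the data.

```r
scores <- c(2, 4, 3, 1)   # made-up whole-number scores, n = 4
n <- length(scores)
sorted <- sort(scores)    # 1 2 3 4

median(scores)                 # 2.5: not a possible score
sorted[floor((n + 1) / 2)]     # 2: lower median
sorted[ceiling((n + 1) / 2)]   # 3: upper median
```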
Simple formula
Assuming y is a list of n rankable items, let $y_{[r]}$ denote the value of y whose rank is r. Provided n is odd, the median is the value of middle rank:
$\text{median}(y) = y_{[(n+1)/2]}$
If n is even, no item has rank (n+1)/2, so you must obtain the median value from the mid-ranking pair of values, say, using their mean:
$\text{median}(y) = \dfrac{y_{[n/2]} + y_{[n/2+1]}}{2}$
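The simple formula translates almost directly into R. This is a minimal sketch for illustration: simple_median is a hypothetical name, not a base R function, and it assumes y is a non-empty numeric vector with no missing values. In practice, R's own median() does the same job.

```r
# Hypothetical helper implementing the simple formula
# (assumes y is a non-empty numeric vector with no NA values)
simple_median <- function(y) {
  y <- sort(y)                    # after sorting, y[r] is the value of rank r
  n <- length(y)
  if (n %% 2 == 1) {
    y[(n + 1) / 2]                # odd n: the single mid-ranking value
  } else {
    mean(y[c(n / 2, n / 2 + 1)])  # even n: mean of the mid-ranking pair
  }
}

simple_median(c(7, 1, 5, 3, 9))   # 5
simple_median(c(7, 1, 5, 3))      # 4
```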
Tips and Notes
- Whereas the simple arithmetic mean gives each item an equal weight, the median only gives any weight to the mid-ranking value(s) - in other words it is the most extreme form of trimmed mean.
- The median is especially useful as a measure of central tendency where outlying values are untrustworthy or unknown, or where distributions are very skewed. However, it is wasteful of information since, apart from their ranks, it ignores all but the mid-ranking value(s).
- Beware - sometimes the median is used in place of the mean without this being specified.
- This can be very misleading since the mean and median of skewed data can be rather different.
- The median may not be a useful measure if most values are the same, as with binary data.
- Although the median is often preferred for skewed data, it is not suitable if it is to be used to estimate a total.
- For example, if the median cost per operation is £500, you cannot assume the total cost of 100 such operations will be £50,000. For this calculation you MUST know the (arithmetic) mean cost, since the total equals the number of operations multiplied by their mean cost.
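Several of these points can be illustrated with made-up, right-skewed costs for 10 operations (these are not real data). In R, mean() with trim = 0.5 returns the median, the fully trimmed mean; multiplying the median by the number of operations badly underestimates the total, whereas multiplying the mean recovers it; and for mostly identical binary values the median simply reports the majority value.

```r
# Made-up, right-skewed costs (in pounds) for 10 operations
cost <- c(300, 350, 400, 450, 500, 500, 550, 600, 2500, 9000)

median(cost)             # 500
mean(cost)               # 1515: pulled up by the two expensive operations
mean(cost, trim = 0.5)   # 500: the fully trimmed mean is the median

# Estimating the total cost of these 10 operations
length(cost) * median(cost)   # 5000: far too low
length(cost) * mean(cost)     # 15150
sum(cost)                     # 15150: the actual total

# Mostly identical binary values (1 = event, 0 = no event)
event <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
median(event)   # 0: ignores the three events
mean(event)     # 0.3: the proportion of events
```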
Test yourself
Compare the medians of these five sets of values:
When you have examined these next two sets of values, compare their median and mean.
Note: -Inf denotes minus infinity.
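As a hint of what to look for, here is a made-up set (not one of the exercise sets) containing minus infinity: its median is unaffected, because only the mid-ranking value counts, whereas its mean is minus infinity.

```r
x <- c(-Inf, 2, 4, 6, 8)   # made-up values; -Inf has rank 1

median(x)   # 4
mean(x)     # -Inf
```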
Useful references
- Barber, J.A. & Thompson, S.G. (1998). Analysis and interpretation of cost data in randomized controlled trials: review of published studies. BMJ 317, 1195-1200 (31 October).
- Notes that the arithmetic mean cost (and not the median cost) is required if it is to be used to estimate a total.
- Stark, P.B. Measures of location and spread.
- A thought-provoking account of measures of location and spread.
- Wikipedia: Median.