InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

Measures of location

On this page: Arithmetic mean; Geometric & harmonic means; Weighted means; Running means; Assumptions of means; Median, mode & midrange; Assumptions of median, mode & midrange; Which is best?

The means

The arithmetic mean

The arithmetic mean is the sum (or total) of a set of measurements divided by the number of measurements in that set (or group, or list). It is (debatably) the most commonly used type of average.

The mean is a measure of location, or measure of central tendency, of the distribution on its scale. The term central tendency refers to the fact that, very often in a set of observations, some values are more common than others, and the most common values tend to be similar. The most central value of a set is termed its location. There are a number of ways of calculating this location, of which the arithmetic mean is the most common.

The arithmetic mean gives equal weight to all of the measurements in a set. If you display the observations as a frequency histogram, the mean is the point at which the histogram would balance. You can test this for yourself by drawing a histogram on thick card and cutting it out. Note that this balance point is not, in general, the point that divides the area under the histogram into equal halves (that point is the median); the two coincide only when the distribution is symmetrical.

With the data set on worm lengths below, the data are more or less symmetrically distributed, so we find the mean is more or less central in the distribution. This is not the case in the next data set (based on data given by Shenoy et al.). The second figure shows the distribution of retail prices of tetracycline for veterinary application in ...

We may conclude that if the distribution of a data set is (more or less) symmetrical, the arithmetic mean provides a reasonable measure of location for that data set. If, however, the distribution is skewed or bimodal, the arithmetic mean may be misleading.
We have only considered measurement variables above, but it is worth noting that a proportion is simply the mean of a binary variable if we denote the values of the binary variable as '0' or '1'. For example, say we have 20 observations of a variable which can only take the value '0' (uninfected) or '1' (infected). If 5 individuals are infected, we have five 1s and fifteen 0s. The arithmetic mean of the variable (= proportion infected) is equal to

$$\frac{\sum_{i=1}^{n} Y_i}{n} = \frac{5}{20} = 0.25$$
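The infection example can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not part of the original text.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    if len(values) == 0:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / len(values)


# 20 observations coded 1 (infected) or 0 (uninfected), as in the text:
# five 1s and fifteen 0s.
infection_status = [1] * 5 + [0] * 15

# The mean of a 0/1 variable is the proportion of 1s.
proportion_infected = arithmetic_mean(infection_status)
print(proportion_infected)
```

Coding the two categories as 0 and 1 is what makes the mean and the proportion coincide; any other coding would not give the proportion directly.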
Geometric and harmonic means

As we have seen, the arithmetic mean may not always be the best measure of location of a distribution. If the frequency distribution is skewed to the right, the arithmetic mean is pulled upwards by the few very large values. In this situation, the geometric mean is (usually) a more appropriate measure of location. The geometric mean is defined as the n^{th} root of the product of the individual observations. In other words, instead of adding the observations together, you multiply them and then take the n^{th} root. We can write this as:

$$\mathrm{GM} = \sqrt[n]{\prod_{i=1}^{n} Y_i} = \left(\prod_{i=1}^{n} Y_i\right)^{1/n}$$
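An easier way to calculate a geometric mean is to first transform the data by taking the logarithm of each observation. Then add the transformed values together and divide by the number of observations. The antilogarithm of this mean of the logarithms is the detransformed, or geometric, mean. Since logarithms are only defined for positive numbers, all of the observations must be greater than zero.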
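The two routes to the geometric mean (the n-th root of the product, and the antilogarithm of the mean logarithm) can be compared in a short Python sketch. The function names and the sample values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math


def geometric_mean_by_product(values):
    """n-th root of the product of the observations."""
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_by_logs(values):
    """Antilogarithm of the arithmetic mean of the logged observations."""
    if any(v <= 0 for v in values):
        raise ValueError("all observations must be greater than zero")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Hypothetical right-skewed sample (each value double the previous one).
sample = [1, 2, 4, 8, 16]

gm_product = geometric_mean_by_product(sample)
gm_logs = geometric_mean_by_logs(sample)
arithmetic = sum(sample) / len(sample)

# Both routes agree (to rounding error), and both lie below the
# arithmetic mean, which is pulled upwards by the largest value.
print(gm_product, gm_logs, arithmetic)
```

The logarithmic route is usually preferable in practice because the product of many observations can overflow or underflow, whereas the sum of their logarithms does not.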
The harmonic mean (H) may be used when averaging rates. It is the reciprocal of the arithmetic mean of the reciprocals of the observations:

$$H = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{Y_i}}$$
One example of the correct use of the harmonic mean is to calculate 'average' speed. If for half the distance of a journey you travel at 60 kilometres per hour, and for the other half you travel at 80 kilometres per hour, then (in one sense) the 'average' speed is the harmonic mean of the two, namely 68.6 kilometres per hour. This is because it is the constant speed at which you would have to travel for the whole trip in order to complete it in the same total time.
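The journey example can be verified numerically. The sketch below computes the harmonic mean of the two speeds and compares it with total distance divided by total time; the 240 km trip length is an arbitrary assumption (any distance gives the same answer), and the names are illustrative.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("all observations must be greater than zero")
    return len(values) / sum(1.0 / v for v in values)


speeds_kmh = [60, 80]
average_speed = harmonic_mean(speeds_kmh)

# Cross-check with a hypothetical 240 km trip, half at each speed.
trip_km = 240.0
half_km = trip_km / 2
total_hours = half_km / 60 + half_km / 80   # 2.0 h + 1.5 h
overall_speed = trip_km / total_hours

print(round(average_speed, 1), round(overall_speed, 1))
```

Note that the harmonic mean applies here because the two speeds cover equal distances. If they had been maintained for equal times instead, the arithmetic mean (70 kilometres per hour) would be the appropriate average.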
Weighted means

In a simple, or arithmetic, mean all of the observations have equal weight. Sometimes we wish to weight our observations according to their importance or our confidence in them. For a weighted mean each observation is multiplied by its weight, and the sum of these products is divided by the total of the weights applied:

$$\bar{Y}_w = \frac{\sum_{i=1}^{n} w_i Y_i}{\sum_{i=1}^{n} w_i}$$
Obviously if every weight w is 1 (or, indeed, any constant value), w cancels out and this becomes the same as the arithmetic mean. Alternatively, the arithmetic mean can be estimated from a frequency distribution, by weighting the midpoint of each class interval by the number of observations in that interval.
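A weighted mean, and its use for estimating the mean from a frequency distribution, can be sketched as follows. All values, weights, class midpoints and counts are hypothetical, and `weighted_mean` is an illustrative name rather than anything defined in the original text.

```python
def weighted_mean(values, weights):
    """Sum of (weight x observation) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per observation")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * y for y, w in zip(values, weights)) / total_weight


observations = [2.0, 4.0, 9.0]            # hypothetical data

# Unequal weights: the middle observation counts double.
unequal = weighted_mean(observations, [1.0, 2.0, 1.0])

# Equal weights reduce to the ordinary arithmetic mean.
equal = weighted_mean(observations, [1.0, 1.0, 1.0])

# A zero weight is the same as omitting the observation altogether.
omitted = weighted_mean(observations, [1.0, 1.0, 0.0])

# Mean estimated from a frequency distribution: class midpoints
# weighted by the number of observations in each class.
midpoints = [5.0, 15.0, 25.0]             # hypothetical class midpoints
counts = [2, 5, 3]                        # hypothetical class frequencies
grouped = weighted_mean(midpoints, counts)

print(unequal, equal, omitted, grouped)
```

The grouped estimate treats every observation in a class as if it sat at the class midpoint, which is why wider classes give a cruder estimate.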
The accuracy of this estimate is limited by the width of the class intervals.

If we have very little confidence in some points, we might decide to give them zero weight; in other words, we omit the points altogether when computing the mean. One important example of this is where the most divergent values (such as the maximum and ...
Running means

It is often useful to smooth time series data to expose underlying trends. How easily a trend can be seen in the raw data depends on the signal-to-noise ratio. If the variations (the noise) about the underlying trend (the signal) are small, then the trend will be clear. If, however, there is a low signal-to-noise ratio, then it may be hard to discern whether there is any real trend. The solution is to use some form of smoothing. This involves replacing the value of each observation in the list with an estimate of it based on a 'window' of observations around it; for each point in the list, that window moves along the list. The reasoning behind this is that, if the observations are serially correlated, observations close to a given point in the list also carry information about that point.

This can be done by calculating running means. Each running mean, also known as a moving average, is calculated from an overlapping group of n values. There are two types of running mean.

A prior running mean is the unweighted mean of the previous n data points. This is the only type that can be used if the data are 'live'; in other words, you want to produce the average for that day on that day. Prior running means are widely used in stock markets, for example. A running mean of this type always lags behind the latest observation.

Another option, more commonly used by biologists, is the central running mean: the mean is taken of the day in question and of equal numbers of points both before and after that day. In this case n is always an odd number.
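The two types of running mean can be sketched in Python as below. Two choices here are assumptions rather than anything stated in the text: the prior window is taken to be the current point plus the n - 1 points before it, and positions where the window would run off the end of the series are returned as `None` rather than being estimated. The daily counts are hypothetical.

```python
def prior_running_mean(series, n):
    """Unweighted mean of the latest n points, ending at the current point."""
    result = []
    for i in range(len(series)):
        if i + 1 < n:
            result.append(None)              # window not yet full
        else:
            window = series[i - n + 1 : i + 1]
            result.append(sum(window) / n)
    return result


def central_running_mean(series, n):
    """Mean of the current point and (n - 1) / 2 points on either side."""
    if n % 2 == 0:
        raise ValueError("n must be odd for a central running mean")
    half = n // 2
    result = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            result.append(None)              # window overruns the series
        else:
            window = series[i - half : i + half + 1]
            result.append(sum(window) / n)
    return result


daily_counts = [3, 5, 4, 8, 6, 9, 7]         # hypothetical time series
prior = prior_running_mean(daily_counts, 3)
central = central_running_mean(daily_counts, 3)

print(prior)
print(central)
```

With n = 3 the two results contain the same means, but each prior value appears one position later than the matching central value, which is the lag described above.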
One modification of the simple running mean is to take repeated running means; in other words, running means of running means. This may be done in preference to running means with a larger number of points since it tends to produce a much smoother result. If the process is carried out repeatedly, the time series will eventually stabilise so that further repeats have no effect.

Another modification of a simple running mean is to use weighted running means. These give most weight to the central value and then progressively less weight to more outlying values. A popular form of weighted running mean is exponential smoothing. Usually only previous observations (or the current and previous observations) are used to produce a smoothed value. Each smoothed point is calculated as a weighted average of an observation and the preceding smoothed value, so it depends indirectly on all of the preceding observations.
Because the smoothed data are used, the weights decrease geometrically with the age of prior observations.
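Exponential smoothing can be sketched as below. The formula from the original page is not reproduced in this text, so the sketch assumes the common simple form, in which each smoothed value is `alpha` times the current observation plus `1 - alpha` times the previous smoothed value. The smoothing constant `alpha`, the function names and the series are all assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing (assumed form):
    S[t] = alpha * Y[t] + (1 - alpha) * S[t - 1], starting from S[0] = Y[0].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in the interval (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def implied_weight(alpha, age):
    """Weight given to an observation that is `age` steps old."""
    return alpha * (1 - alpha) ** age


series = [10.0, 20.0, 20.0, 20.0]            # hypothetical step change
smoothed = exponential_smoothing(series, alpha=0.5)
weights = [implied_weight(0.5, age) for age in range(3)]

print(smoothed)
print(weights)
```

Because each smoothed value feeds into the next, the weight on an observation is multiplied by `1 - alpha` for every step it ages, giving the geometric decline in weights. A larger `alpha` tracks the latest observations more closely, and a smaller one smooths more heavily.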
Assumptions of means
The median, mode and midrange
For large samples, where the frequency distribution is symmetrical, the arithmetic mean, median, mode and midrange are likely to be quite similar (on average, at least!). Where the frequency distribution is skewed (such that one tail of the distribution is much the longer), the midrange will often be closest to the 'tail', followed by the arithmetic mean. The mode will be furthest from the 'tail', and the median will be in between the mode and the arithmetic mean. This can be seen in the worked example below.
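The ordering of the four measures in a skewed distribution can be illustrated with Python's standard `statistics` module. The data are a small hypothetical right-skewed sample, not the worked example from the original page. The sketch uses the usual definitions: the median is the middle value of the ordered data, the mode is the most frequent value, and the midrange is midway between the minimum and maximum.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is large.
lengths = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

mean = statistics.mean(lengths)
median = statistics.median(lengths)
mode = statistics.mode(lengths)
midrange = (min(lengths) + max(lengths)) / 2

# For this sample the long tail is on the right, so the measures fall
# in the order mode < median < mean < midrange.
print(mode, median, mean, midrange)
```

The midrange depends only on the two most extreme observations, which is why it is drawn furthest towards the tail.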
Assumptions of the median, mode and midrange
Which measure of location to use?

Which measure of location is most useful depends upon

Measurement variables

Which measure of location is best also depends upon the purpose for which you wish to use the data:
Ordinal and nominal variables

In general the median is the best measure to use for ordinal variables. This is because, for an ordinal variable, we usually cannot say that the difference between score 1 and 2 is equivalent to the difference between score 2 and 3. The mode could also be used for an ordinal variable, but remember it may not be reliable, especially if the distribution is flattened (platykurtic). The mode is, however, the only measure of location that can be used for a nominal variable.

Related topics: The 'average man'; Sample and parametric means
