 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

### Arithmetic mean

####  Worked example

The table below shows the weights of thirty cattle.

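| Cow | Weight (kg) | Cow | Weight (kg) | Cow | Weight (kg) |
|---|---|---|---|---|---|
| 1 | 445 | 11 | 450 | 21 | 475 |
| 2 | 530 | 12 | 500 | 22 | 545 |
| 3 | 540 | 13 | 520 | 23 | 420 |
| 4 | 510 | 14 | 460 | 24 | 495 |
| 5 | 570 | 15 | 430 | 25 | 485 |
| 6 | 530 | 16 | 520 | 26 | 570 |
| 7 | 545 | 17 | 520 | 27 | 480 |
| 8 | 545 | 18 | 430 | 28 | 495 |
| 9 | 505 | 19 | 535 | 29 | 470 |
| 10 | 535 | 20 | 535 | 30 | 490 |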

The distribution of these weights is a little skewed to the left (the tail of lighter animals is longer), but with only thirty observations this is not surprising.

Using the individual observations, the arithmetic mean cattle weight is

$$\bar{Y} = \frac{445 + 530 + \dots + 470 + 490}{30} = \frac{15080}{30} = 502.7 \text{ kg}$$

The arithmetic mean can also be estimated from data grouped in a frequency distribution, by assuming that the values in each class are concentrated at the centre (mid-point) of the class interval. If the distribution is symmetrical, this assumption introduces little error.

Frequency distribution of weights of cattle

| Weight class (kg) | Mid-point (kg) | Frequency |
|---|---|---|
| 401-420 | 410.5 | 1 |
| 421-440 | 430.5 | 2 |
| 441-460 | 450.5 | 3 |
| 461-480 | 470.5 | 3 |
| 481-500 | 490.5 | 5 |
| 501-520 | 510.5 | 5 |
| 521-540 | 530.5 | 6 |
| 541-560 | 550.5 | 3 |
| 561-580 | 570.5 | 2 |
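The raw-data calculation above, and the grouped-data calculation that follows, can both be reproduced with a few lines of Python. This is a minimal sketch: the two lists simply transcribe the weights table and the frequency table, and the grouped calculation makes the same mid-point assumption as the text.

```python
# Weights (kg) of the 30 cattle, transcribed in cow order (cows 1-30) from the table
weights = [445, 530, 540, 510, 570, 530, 545, 545, 505, 535,
           450, 500, 520, 460, 430, 520, 520, 430, 535, 535,
           475, 545, 420, 495, 485, 570, 480, 495, 470, 490]

# Arithmetic mean of the individual observations: sum divided by n
raw_mean = sum(weights) / len(weights)

# Grouped data as (class mid-point, frequency) pairs from the frequency table
grouped = [(410.5, 1), (430.5, 2), (450.5, 3), (470.5, 3), (490.5, 5),
           (510.5, 5), (530.5, 6), (550.5, 3), (570.5, 2)]

# Treat every observation in a class as if it lay at the class mid-point
n_grouped = sum(freq for _, freq in grouped)
grouped_mean = sum(mid * freq for mid, freq in grouped) / n_grouped

print(f"raw mean     = {raw_mean:.1f} kg")      # raw mean     = 502.7 kg
print(f"grouped mean = {grouped_mean:.1f} kg")  # grouped mean = 500.5 kg
```

The small difference between the two results comes entirely from replacing each weight by its class mid-point.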

Using the grouped data, the arithmetic mean cattle weight is

$$\bar{Y} = \frac{(410.5 \times 1) + (430.5 \times 2) + \dots + (570.5 \times 2)}{30} = \frac{15015}{30} = 500.5 \text{ kg}$$

Note that the value estimated from the grouped data is close to, but not identical to, the value calculated from the raw data.

### Geometric mean

####  Worked example

We will take as an example data on the number of helminth eggs in a small sample of cattle faeces.

| Cow no. | No. of helminth eggs |
|---|---|
| 1 | 35 |
| 2 | 24 |
| 3 | 22 |
| 4 | 267 |
| 5 | 15 |
| 6 | 21 |
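The geometric mean is easy to check in Python. This minimal sketch transcribes the six egg counts from the table and, to match the worked calculation, uses base-10 logarithms; it cross-checks the result against `statistics.geometric_mean` (Python 3.8+), since the base of the logarithm does not affect the answer. `geometric_mean_add_one` is a hypothetical helper name for the add-one adjustment for zero counts described below; these data contain no zeros, so it is not applied to them.

```python
import math
import statistics

# Helminth egg counts for cows 1-6, transcribed from the table
eggs = [35, 24, 22, 267, 15, 21]

arithmetic_mean = sum(eggs) / len(eggs)

# Geometric mean: antilog (10 ** x) of the mean of the base-10 logarithms
mean_log = sum(math.log10(y) for y in eggs) / len(eggs)
geo_mean = 10 ** mean_log

# Cross-check with the standard library; the log base does not change the result
assert math.isclose(geo_mean, statistics.geometric_mean(eggs), rel_tol=1e-9)


def geometric_mean_add_one(values):
    """Hypothetical helper for data containing zeros: add 1 before taking logs,
    then subtract 1 after back-transforming (the conventional, biased, correction)."""
    mean_log1 = sum(math.log10(y + 1) for y in values) / len(values)
    return 10 ** mean_log1 - 1


print(f"arithmetic mean = {arithmetic_mean:.1f}")  # arithmetic mean = 64.0
print(f"mean log10      = {mean_log:.3f}")         # mean log10      = 1.532
print(f"geometric mean  = {geo_mean:.1f}")         # geometric mean  = 34.0
```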

The geometric mean (G) is the antilogarithm of the mean of the logarithms (here base 10) of the observations:

$$
\begin{aligned}
G &= \text{antilog}\left(\frac{\sum \log Y}{n}\right)\\
  &= \text{antilog}\left(\frac{1.54 + 1.38 + 1.34 + 2.43 + 1.18 + 1.32}{6}\right)\\
  &= \text{antilog}\ 1.532 = 34.0
\end{aligned}
$$

The geometric mean number of helminth eggs is 34, much lower than the arithmetic mean of 64 (384/6). The geometric mean gives a more representative idea of central tendency, whereas the arithmetic mean is heavily influenced by the one large count (267).

If there are any zeros in the data, the conventional approach is to add one to each observation before taking logarithms, and then to subtract one from the geometric mean after back-transformation. Unfortunately, this process makes the (corrected) geometric mean a biased estimator, a fact we will return to later.

### Weighted mean

#### Worked example

We will use the data from the BSDS story in Unit 1 to demonstrate the calculation of a weighted mean. The table gives the seroprevalence of BSDS in 11 herds, each with a different number of cattle.

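Table I: Infection of herds with BSDS

| Farm | No. of cattle (w) | No. +ve | Prevalence (%) (Y) | Y × w |
|---|---|---|---|---|
| 1 | 297 | 0 | 0.00 | 0.0 |
| 2 | 123 | 1 | 0.81301 | 100 |
| 3 | 245 | 0 | 0.00 | 0.0 |
| 4 | 78 | 2 | 2.56410 | 200 |
| 5 | 320 | 0 | 0.00 | 0.0 |
| 6 | 145 | 1 | 0.68966 | 100 |
| 7 | 224 | 0 | 0.00 | 0.0 |
| 8 | 266 | 0 | 0.00 | 0.0 |
| 9 | 298 | 0 | 0.00 | 0.0 |
| 10 | 320 | 0 | 0.00 | 0.0 |
| 11 | 54 | 1 | 1.85185 | 100 |
| Σ | 2370 | 5 | - | 500 |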

Previously, we obtained the overall prevalence by dividing the total number of infections (5) by the total number of animals (2370) and multiplying by 100, giving a prevalence of 0.211%.
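A short sketch confirming that the pooled prevalence above and the weighted mean of the herd prevalences described next give the same figure. The two lists transcribe Table I; the variable names are illustrative only.

```python
# Herd sizes (w) and numbers seropositive for farms 1-11, transcribed from Table I
cattle = [297, 123, 245, 78, 320, 145, 224, 266, 298, 320, 54]
positive = [0, 1, 0, 2, 0, 1, 0, 0, 0, 0, 1]

# Pooled prevalence (%): total positives / total cattle x 100
pooled = 100 * sum(positive) / sum(cattle)

# Herd prevalences Y (%), then their mean weighted by herd size w
herd_prev = [100 * pos / w for pos, w in zip(positive, cattle)]
weighted = sum(y * w for y, w in zip(herd_prev, cattle)) / sum(cattle)

print(f"pooled prevalence   = {pooled:.3f}%")    # pooled prevalence   = 0.211%
print(f"weighted prevalence = {weighted:.3f}%")  # weighted prevalence = 0.211%
```

The two routes agree because each product Y × w simply recovers 100 times the number of positives in that herd.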

We get exactly the same answer if we instead calculate a weighted mean of the individual herd prevalences. In the last column, each herd prevalence (Y) is multiplied by the number of cattle in that herd (w); the sum of this column is 500. Dividing by the total number of cattle (Σw = 2370) again gives 0.211%:

$$\bar{Y}_w = \frac{\sum Yw}{\sum w} = \frac{500}{2370} = 0.211\%$$

### Median, mode and mid-range

####  Worked example

We will use the same data on the weights of 30 cattle to work out the median, mode and mid-range.

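Weights (kg) of cattle, ranked

| Rank | Weight (kg) | Rank | Weight (kg) | Rank | Weight (kg) |
|---|---|---|---|---|---|
| 1 | 420 | 11 | 490 | 21 | 530 |
| 2 | 430 | 12 | 495 | 22 | 535 |
| 3 | 430 | 13 | 495 | 23 | 535 |
| 4 | 445 | 14 | 500 | 24 | 535 |
| 5 | 450 | 15 | 505 | 25 | 540 |
| 6 | 460 | 16 | 510 | 26 | 545 |
| 7 | 470 | 17 | 520 | 27 | 545 |
| 8 | 475 | 18 | 520 | 28 | 545 |
| 9 | 480 | 19 | 520 | 29 | 570 |
| 10 | 485 | 20 | 530 | 30 | 570 |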

To work out the median, we first rank the weights of the 30 cattle from lowest to highest. The median is the centre-most value of the ranked data; with an even number of observations, as here, it lies mid-way between the 15th and 16th values.

Thus we can estimate the median as (505 + 510)/2 = 507.5 kg.

If we were to use the raw data to calculate the mode, we would find three modes, at 520, 535 and 545 kg (each occurring three times). However, the mode should normally be calculated from grouped frequency data, as shown here:

Frequency distribution of weights of cattle

| Weight class (kg) | Frequency |
|---|---|
| 411-430 | 3 |
| 431-450 | 2 |
| 451-470 | 2 |
| 471-490 | 4 |
| 491-510 | 5 |
| 511-530 | 5 |
| 531-550 | 7 |
| 551-570 | 2 |

With this grouping of the data, the modal class is 531-550 kg, so the mode is taken as its mid-point, 540.5 kg.

The mid-range is readily calculated as the value half-way between the maximum and minimum, in this case (420 + 570)/2 = 495 kg.
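The median, modes and mid-range can be reproduced with Python's standard `statistics` module (`statistics.multimode` needs Python 3.8 or later). In this sketch the class limits 411-430, ..., 551-570 are taken from the grouped table and, as in the text, the mode of the grouped data is taken as the mid-point of the class with the highest frequency.

```python
import statistics

# Weights (kg) of the 30 cattle, as in the first table
weights = [445, 530, 540, 510, 570, 530, 545, 545, 505, 535,
           450, 500, 520, 460, 430, 520, 520, 430, 535, 535,
           475, 545, 420, 495, 485, 570, 480, 495, 470, 490]

ranked = sorted(weights)

# Median: with n = 30 (even), the mean of the 15th and 16th ranked values
median = statistics.median(ranked)

# Raw-data modes: every value sharing the highest frequency
raw_modes = sorted(statistics.multimode(weights))

# Grouped mode: classes 411-430, 431-450, ..., 551-570 (20 kg wide)
classes = [(low, low + 19) for low in range(411, 571, 20)]
freqs = [sum(low <= w <= high for w in weights) for low, high in classes]
low, high = classes[freqs.index(max(freqs))]   # modal class
grouped_mode = (low + high) / 2                 # its mid-point

# Mid-range: half-way between the minimum and maximum
mid_range = (ranked[0] + ranked[-1]) / 2

print(median, raw_modes, freqs, grouped_mode, mid_range)
# 507.5 [520, 535, 545] [3, 2, 2, 4, 5, 5, 7, 2] 540.5 495.0
```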

To summarize:

Measures of location

| Measure of location | Weight (kg) |
|---|---|
| Mid-range | 495 |
| Arithmetic mean | 502.7 |
| Median | 507.5 |
| Mode | 540.5 |

### Running means and medians

####  Worked example

The data shown here represent the number of individuals of one butterfly species observed each week along a transect.

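Number of a butterfly species observed on a transect

| Week | No. | Week | No. | Week | No. | Week | No. |
|---|---|---|---|---|---|---|---|
| 1 | 12 | 14 | 32 | 27 | 31 | 40 | 22 |
| 2 | 30 | 15 | 45 | 28 | 27 | 41 | 1 |
| 3 | 25 | 16 | 33 | 29 | 12 | 42 | 24 |
| 4 | 15 | 17 | 48 | 30 | 26 | 43 | 27 |
| 5 | 12 | 18 | 35 | 31 | 29 | 44 | 21 |
| 6 | 25 | 19 | 44 | 32 | 41 | 45 | 29 |
| 7 | 30 | 20 | 36 | 33 | 33 | 46 | 21 |
| 8 | 35 | 21 | 49 | 34 | 28 | 47 | 34 |
| 9 | 22 | 22 | 52 | 35 | 29 | 48 | 27 |
| 10 | 14 | 23 | 32 | 36 | 21 | 49 | 46 |
| 11 | 16 | 24 | 22 | 37 | 32 | 50 | 31 |
| 12 | 29 | 25 | 42 | 38 | 23 | 51 | 38 |
| 13 | 35 | 26 | 38 | 39 | 26 | 52 | 41 |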

In the first figure below, we replaced each observation in the series by the mean of that observation, the two observations immediately preceding it and the two observations immediately following it. This gives a 5-point running mean, with the smoothed line starting at week 3. The second figure below shows the effect of using a 9-point running mean. Note that each mean must be centred on the observation it is replacing, so only odd numbers of points are used. The last figure in the series shows the effect of taking three consecutive 3-point running means (that is, applying a 3-point running mean to the series three times in succession). Note that this produces a smoother result than either the 5-point or the 9-point running mean.
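A centred running mean is straightforward to code. In this sketch, `running_mean` is a hypothetical helper that returns only the positions with a full window, so a k-point running mean starts at week (k + 1)/2: week 3 for k = 5 and week 5 for k = 9. The repeated smoothing is assumed to mean applying the 3-point running mean three times in succession.

```python
# Weekly counts for weeks 1-52, transcribed from the butterfly table
counts = [12, 30, 25, 15, 12, 25, 30, 35, 22, 14, 16, 29, 35,
          32, 45, 33, 48, 35, 44, 36, 49, 52, 32, 22, 42, 38,
          31, 27, 12, 26, 29, 41, 33, 28, 29, 21, 32, 23, 26,
          22, 1, 24, 27, 21, 29, 21, 34, 27, 46, 31, 38, 41]


def running_mean(y, k):
    """Hypothetical helper: centred k-point running mean (k odd).
    Only positions with a full window are returned, so the first value
    belongs to week (k + 1) // 2."""
    if k % 2 == 0:
        raise ValueError("k must be odd so each mean is centred on an observation")
    return [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]


rm5 = running_mean(counts, 5)   # first value is for week 3
rm9 = running_mean(counts, 9)   # first value is for week 5

# Three successive passes of a 3-point running mean (assumed interpretation)
rm3x3 = counts
for _ in range(3):
    rm3x3 = running_mean(rm3x3, 3)   # each pass loses one week at each end

print(len(rm5), rm5[0])       # 48 18.8
print(len(rm9), len(rm3x3))   # 44 46
```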

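{Fig. 14}

We next apply exponential smoothing to the same data. Each smoothed point is a weighted average of the current observation and all preceding observations, with the weights declining geometrically the further back an observation lies. The weighting was done in the following way.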

For each observation we:

1. multiplied the raw data point by a constant 'a' (where 0 < a < 1),
2. multiplied the previous smoothed data point by '1 - a', and
3. added the two together to give the new smoothed data point.

In symbols, the new smoothed value is $S_t = aY_t + (1 - a)S_{t-1}$, where $Y_t$ is the observation for week $t$. The constant 'a' is usually set to about 0.3. The butterfly data subjected to exponential smoothing using values of a = 0.1, 0.3 and 0.5 are shown in the figures below.
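The three weighting steps amount to the recurrence S_t = a × Y_t + (1 - a) × S_(t-1), where Y_t is the raw count and S_t the smoothed value for week t. The text does not say how the first smoothed value is obtained; this sketch assumes the series starts from the first observation (S_1 = Y_1), which is a common choice but not necessarily the one used for the figures. `exp_smooth` is a hypothetical helper name.

```python
# Weekly counts for weeks 1-52, transcribed from the butterfly table
counts = [12, 30, 25, 15, 12, 25, 30, 35, 22, 14, 16, 29, 35,
          32, 45, 33, 48, 35, 44, 36, 49, 52, 32, 22, 42, 38,
          31, 27, 12, 26, 29, 41, 33, 28, 29, 21, 32, 23, 26,
          22, 1, 24, 27, 21, 29, 21, 34, 27, 46, 31, 38, 41]


def exp_smooth(y, a=0.3):
    """Hypothetical helper: exponential smoothing S_t = a*Y_t + (1 - a)*S_(t-1).
    Assumption: the smoothed series starts at the first observation (S_1 = Y_1)."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    smoothed = [float(y[0])]
    for value in y[1:]:
        # a * raw point, plus (1 - a) * previous smoothed point
        smoothed.append(a * value + (1 - a) * smoothed[-1])
    return smoothed


# Smaller a puts more weight on past values and gives a smoother curve
curves = {a: exp_smooth(counts, a) for a in (0.1, 0.3, 0.5)}
print([round(s, 1) for s in curves[0.5][:4]])   # [12.0, 21.0, 23.0, 19.0]
```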

If a is close to zero, greater weight is given to previous observations, which results in a smoother curve. As a is increased, more weight is given to the raw data point, so the curve becomes more irregular and follows the unsmoothed data more closely.

Running means are still sensitive to outlying values, so if there are a few very large (or very small) values in the data set, it is better to use running medians. The effect of using a 5-point running median for the butterfly data is shown below.

Note that, for these data, running medians have the disadvantage that they tend to look rather jagged. The second plot above demonstrates a way to get the advantages of both running medians and running means: running medians are used for the initial smoothing, and then running means are taken of the smoothed data. Alternatively, because a median may be considered an extreme form of trimmed mean, running trimmed means can be used: for example, omitting (giving zero weight to) the maximum and minimum of each set of 5 observations, and calculating the mean of the remaining three.
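Running medians, the median-then-mean combination and running trimmed means can be sketched in the same way. The window of the second (running mean) stage is not stated in the text, so a 3-point running mean is assumed here, and all function names are hypothetical. Week 41, with its single low count of 1, shows the difference: the 5-point running mean centred on that week is pulled down to 20.0, whereas the running median and the trimmed mean are both 24.

```python
import statistics

# Weekly counts for weeks 1-52, transcribed from the butterfly table
counts = [12, 30, 25, 15, 12, 25, 30, 35, 22, 14, 16, 29, 35,
          32, 45, 33, 48, 35, 44, 36, 49, 52, 32, 22, 42, 38,
          31, 27, 12, 26, 29, 41, 33, 28, 29, 21, 32, 23, 26,
          22, 1, 24, 27, 21, 29, 21, 34, 27, 46, 31, 38, 41]


def running_mean(y, k):
    """Centred k-point running mean over full windows only."""
    return [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]


def running_median(y, k=5):
    """Centred k-point running median (k odd) over full windows only."""
    return [statistics.median(y[i:i + k]) for i in range(len(y) - k + 1)]


def running_trimmed_mean(y, k=5):
    """Give zero weight to the largest and smallest value in each
    k-point window and average the remaining k - 2 values."""
    out = []
    for i in range(len(y) - k + 1):
        window = sorted(y[i:i + k])
        out.append(sum(window[1:-1]) / (k - 2))
    return out


med5 = running_median(counts, 5)
hybrid = running_mean(med5, 3)   # running medians first, then an assumed 3-point mean
trim5 = running_trimmed_mean(counts, 5)

# Index 38 of a 5-point series is the window centred on week 41 (count = 1)
i = 41 - 3
print(running_mean(counts, 5)[i], med5[i], trim5[i])   # 20.0 24 24.0
```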

 Except where otherwise specified, all text and images on this page are copyright InfluentialPoints, all rights reserved. Images not copyright InfluentialPoints credit their source on web-pages attached via hypertext links from those images.