Arithmetic mean

Worked example

The table below shows the weights of thirty cattle.

 #   Weight (kg)    #   Weight (kg)    #   Weight (kg)
 1   445           11   450           21   475
 2   530           12   500           22   545
 3   540           13   520           23   420
 4   510           14   460           24   495
 5   570           15   430           25   485
 6   530           16   520           26   570
 7   545           17   520           27   480
 8   545           18   430           28   495
 9   505           19   535           29   470
10   535           20   535           30   490

As the histogram below shows, the distribution was slightly skewed to the left, but with only thirty observations this is not surprising.

{Fig. 4}
Histogram of the weights of the thirty cattle (MIhist2.gif)

Using the individual observations, the arithmetic mean cattle weight

= (445 + 530 + ... + 470 + 490) / 30 = 15080 / 30 = 502.7 kg
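As a quick check, here is a minimal Python sketch of the same calculation, using the thirty weights from the table above:

```python
# Weights (kg) of the thirty cattle, as listed in the table above
weights = [445, 530, 540, 510, 570, 530, 545, 545, 505, 535,
           450, 500, 520, 460, 430, 520, 520, 430, 535, 535,
           475, 545, 420, 495, 485, 570, 480, 495, 470, 490]

# Arithmetic mean: sum of the observations divided by their number
mean_weight = sum(weights) / len(weights)
print(round(mean_weight, 1))  # 502.7
```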

The arithmetic mean can also be estimated from data grouped in a frequency distribution, by assuming that the values in each class are concentrated at the mid-point of the interval. This assumption is reasonable provided the values are spread roughly symmetrically within each interval.

Frequency distribution of weights of cattle

Weight class (kg)   Mid-point   Frequency
401-420             410.5       1
421-440             430.5       2
441-460             450.5       3
461-480             470.5       3
481-500             490.5       5
501-520             510.5       5
521-540             530.5       6
541-560             550.5       3
561-580             570.5       2

Using the grouped data, the arithmetic mean cattle weight

= {(410.5 x 1) + (430.5 x 2) + ... + (570.5 x 2)} / 30 = 15015 / 30 = 500.5 kg.

Note that the value estimated from grouped data is close to, but not identical to, the value calculated from the raw data.
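The grouped-data estimate can be sketched in the same way, using the mid-points and frequencies from the frequency distribution above:

```python
# Mid-points and frequencies from the grouped frequency distribution above
midpoints   = [410.5, 430.5, 450.5, 470.5, 490.5, 510.5, 530.5, 550.5, 570.5]
frequencies = [1, 2, 3, 3, 5, 5, 6, 3, 2]

# Each class contributes (mid-point x frequency); divide by the total frequency
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / sum(frequencies)
print(round(grouped_mean, 1))  # 500.5
```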


Geometric mean

Worked example

We will take as an example counts of the number of helminth eggs in faecal samples from six cows.

Cow No.   No. of helminth eggs
1         35
2         24
3         22
4         267
5         15
6         21

The geometric mean (G) is calculated as follows:

G = Antilog { Σ (log Y) /n }
      = Antilog { (1.54 + 1.38 + 1.34 + 2.43 + 1.18 + 1.32) /6 }
      = Antilog 1.532 = 34.0

The geometric mean number of helminth eggs is 34, much lower than the arithmetic mean of 64. The geometric mean gives a more representative idea of central tendency, whilst the arithmetic mean is heavily influenced by the one very large count. If there are any zeros in the data, the conventional approach is to add one to each observation before taking logarithms, and then subtract one from the geometric mean after detransformation. Unfortunately, this process makes the (corrected) geometric mean a biased estimator - a fact we will return to later.
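A minimal Python sketch of this calculation, including the conventional add-one adjustment for zero counts (the helper name geometric_mean is ours, purely for illustration):

```python
import math

# Helminth egg counts for the six cows in the table above
egg_counts = [35, 24, 22, 267, 15, 21]

def geometric_mean(values, zero_adjust=False):
    """Antilog of the mean log; optionally add 1 before taking logs and
    subtract 1 after detransformation, to cope with zero counts."""
    if zero_adjust:
        values = [v + 1 for v in values]
    mean_log = sum(math.log10(v) for v in values) / len(values)
    g = 10 ** mean_log
    return g - 1 if zero_adjust else g

print(round(geometric_mean(egg_counts), 1))          # about 34.0
print(round(sum(egg_counts) / len(egg_counts), 1))   # arithmetic mean: 64.0
```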


Weighted mean

Worked example

We will use the data from the BSDS story in Unit 1 to demonstrate calculation of a weighted mean. The table gives seroprevalence of BSDS in 11 herds, each with different numbers of cattle.

Table I: Infection of herds with BSDS

Farm   No. of cattle (w)   No. +ve   Prevalence (%) (Y)   Y.w
1      297                 0         0.00                 0.0
2      123                 1         0.81301              100
3      245                 0         0.00                 0.0
4      78                  2         2.56410              200
5      320                 0         0.00                 0.0
6      145                 1         0.68966              100
7      224                 0         0.00                 0.0
8      266                 0         0.00                 0.0
9      298                 0         0.00                 0.0
10     320                 0         0.00                 0.0
11     54                  1         1.85185              100
Σ      2370                5         -                    500

Earlier we obtained the overall prevalence by dividing the total number of infected animals (5) by the total number of animals (2370), and multiplying by 100 to give a prevalence of 0.211%.

But we get exactly the same answer if we instead calculate a weighted mean prevalence from the individual herd prevalences. In the last column we multiply each herd prevalence (Y) by the number of cattle in that herd (w). The sum of this column (Σ Y.w) is 500. If we divide by the total number of cattle (Σw = 2370) we again get 0.211%.
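A minimal Python sketch of the weighted mean, using the herd sizes and numbers positive from Table I:

```python
# Herd sizes (w) and numbers positive from Table I
herd_sizes = [297, 123, 245, 78, 320, 145, 224, 266, 298, 320, 54]
num_pos    = [0,   1,   0,   2,  0,   1,   0,   0,   0,   0,   1]

# Herd prevalences (Y) in percent
prevalences = [100 * p / w for p, w in zip(num_pos, herd_sizes)]

# Weighted mean prevalence = sum(Y * w) / sum(w)
weighted_mean = sum(y * w for y, w in zip(prevalences, herd_sizes)) / sum(herd_sizes)
print(round(weighted_mean, 3))  # 0.211
```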


Median, mode and mid-range

Worked example

We will use the same data on the weights of 30 cattle to work out the median, mode and mid-range.

Weights (kg) of cattle

Rank   Weight   Rank   Weight   Rank   Weight
1      420      11     490      21     530
2      430      12     495      22     535
3      430      13     495      23     535
4      445      14     500      24     535
5      450      15     505      25     540
6      460      16     510      26     545
7      470      17     520      27     545
8      475      18     520      28     545
9      480      19     520      29     570
10     485      20     530      30     570

To work out the median we first rank the weights of the 30 cattle from lowest to highest. The median is the centre-most value of the ranked data - in this case mid-way between the 15th and 16th values (505 and 510 kg).

Thus the median is 507.5 kg.

If we were to use the raw data to calculate the mode, we would find three modes, at 520, 535 and 545 kg. However, the mode should normally be calculated from grouped frequency data, as shown in the frequency distribution below.

With this grouping of the data the modal class is 531-550 kg, giving a mode of 540.5 kg (the mid-point of that class).

The mid-range is readily calculated as the value which is half way between the maximum and minimum, in this case (420+570)/2 = 495 kg.

Frequency distribution of weights of cattle

Weight class (kg)   Frequency
411-430             3
431-450             2
451-470             2
471-490             4
491-510             5
511-530             5
531-550             7
551-570             2

To summarize:

Measures of location

Measure of location   Weight (kg)
Mid-range             495
Arithmetic mean       502.7
Median                507.5
Mode                  540.5
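A minimal Python sketch of these measures of location, computed from the thirty raw weights; as above, the mode is taken as the mid-point of the most frequent 20 kg class:

```python
from collections import Counter
from statistics import median

# Raw weights (kg) of the thirty cattle
weights = [445, 530, 540, 510, 570, 530, 545, 545, 505, 535,
           450, 500, 520, 460, 430, 520, 520, 430, 535, 535,
           475, 545, 420, 495, 485, 570, 480, 495, 470, 490]

mid_range = (min(weights) + max(weights)) / 2   # (420 + 570) / 2 = 495.0
med = median(weights)                           # mean of 15th and 16th ranked values = 507.5

# Grouped mode: count the weights falling in the 20 kg classes 411-430, ..., 551-570
classes = [(lo, lo + 19) for lo in range(411, 571, 20)]
freq = Counter(next(c for c in classes if c[0] <= w <= c[1]) for w in weights)
modal_class = freq.most_common(1)[0][0]         # (531, 550), with frequency 7
mode = sum(modal_class) / 2                     # mid-point of the modal class = 540.5

print(mid_range, med, modal_class, mode)
```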


Running means and medians

Worked example

The data shown here are the numbers of a butterfly species observed each week along a transect.

Number of a butterfly species observed on a transect

Wk   No.   Wk   No.   Wk   No.   Wk   No.
1    12    14   32    27   31    40   22
2    30    15   45    28   27    41   1
3    25    16   33    29   12    42   24
4    15    17   48    30   26    43   27
5    12    18   35    31   29    44   21
6    25    19   44    32   41    45   29
7    30    20   36    33   33    46   21
8    35    21   49    34   28    47   34
9    22    22   52    35   29    48   27
10   14    23   32    36   21    49   46
11   16    24   22    37   32    50   31
12   29    25   42    38   23    51   38
13   35    26   38    39   26    52   41

In the first panel of the figure below we replaced each observation in the series by the mean of that observation, the two observations immediately preceding it, and the two observations immediately following it. This gives a 5-point running mean, with the smoothed line starting at week 3. The second panel shows the effect of using a 9-point running mean. Note that each mean must be centred on the observation it replaces, so only odd numbers of points are used. The final panel shows the effect of taking three consecutive 3-point running means; this produces a smoother result than either the 5-point or the 9-point running mean.

{Fig. 14}
Running means of the weekly butterfly counts: 5-point, 9-point, and repeated 3-point (MImoni09.gif)
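A minimal Python sketch of a centred k-point running mean (k odd), applied to the weekly counts in the table above:

```python
# Weekly butterfly counts for weeks 1 to 52, read from the table above
counts = [12, 30, 25, 15, 12, 25, 30, 35, 22, 14, 16, 29, 35,
          32, 45, 33, 48, 35, 44, 36, 49, 52, 32, 22, 42, 38,
          31, 27, 12, 26, 29, 41, 33, 28, 29, 21, 32, 23, 26,
          22, 1, 24, 27, 21, 29, 21, 34, 27, 46, 31, 38, 41]

def running_mean(series, k):
    """Centred k-point running mean (k must be odd); the first and last
    (k - 1) / 2 points of the series have no smoothed value."""
    half = k // 2
    return [sum(series[i - half:i + half + 1]) / k
            for i in range(half, len(series) - half)]

smooth5 = running_mean(counts, 5)  # smoothed series starts at week 3
smooth9 = running_mean(counts, 9)  # smoothed series starts at week 5
print([round(x, 1) for x in smooth5[:3]])  # 18.8, 21.4, 21.4
```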

We next apply exponential smoothing to the same data. Each smoothed point is a weighted average of the current observation and all the preceding observations. The weighting is done in the following way.

    For each observation we
  1. multiply the raw data point by a constant 'a' (where a < 1),
  2. multiply the previous smoothed data point by '1 - a', and
  3. add the two together to give the new smoothed data point.
The constant 'a' is usually set to about 0.3. The butterfly data after exponential smoothing with a = 0.1, 0.3 and 0.5 are shown in the figures below:

{Fig. 15}
Exponential smoothing of the weekly butterfly counts with a = 0.1, 0.3 and 0.5 (MImoni14.gif)

If a is close to zero, greater weight is given to previous observations, which results in a smoother curve. As a is increased, more weight is given to the raw data point, so the curve becomes more irregular and follows the unsmoothed data more closely.
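A minimal Python sketch of this exponential smoothing scheme; here the first smoothed value is simply set equal to the first observation, which is one common convention (not necessarily the one used for the figures above):

```python
# Weekly butterfly counts for weeks 1 to 52, as in the running-mean sketch above
counts = [12, 30, 25, 15, 12, 25, 30, 35, 22, 14, 16, 29, 35,
          32, 45, 33, 48, 35, 44, 36, 49, 52, 32, 22, 42, 38,
          31, 27, 12, 26, 29, 41, 33, 28, 29, 21, 32, 23, 26,
          22, 1, 24, 27, 21, 29, 21, 34, 27, 46, 31, 38, 41]

def exponential_smooth(series, a=0.3):
    """Exponential smoothing: new point = a * raw value + (1 - a) * previous smoothed value.
    The first smoothed value is taken as the first raw observation."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(a * y + (1 - a) * smoothed[-1])
    return smoothed

for a in (0.1, 0.3, 0.5):
    print(a, [round(x, 1) for x in exponential_smooth(counts, a)[:5]])
```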

Running means are still sensitive to outlying values, so if there are a few very large (or very small) values in the data set, it is better to use running medians. The effect of using a 5-point running median on the butterfly data is shown below:

{Fig. 16}
5-point running medians of the weekly butterfly counts, and running means taken of those medians (MImoni10.gif)

Note that for these data running medians have the disadvantage that they tend to look rather jagged. The second plot above demonstrates a way to get the advantages of both running medians and running means: running medians are used for the initial smoothing, and running means are then taken of the smoothed data. Alternatively, because a median may be considered an extreme form of trimmed mean, running trimmed means can be used - for example, omitting (zero-weighting) the maximum and minimum of each set of 5 observations and calculating the mean of the remaining three.
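A minimal Python sketch of a centred 5-point running median, and of a 5-point running trimmed mean that zero-weights the maximum and minimum of each window, as described above:

```python
from statistics import median

# Weekly butterfly counts for weeks 1 to 52, as in the running-mean sketch above
counts = [12, 30, 25, 15, 12, 25, 30, 35, 22, 14, 16, 29, 35,
          32, 45, 33, 48, 35, 44, 36, 49, 52, 32, 22, 42, 38,
          31, 27, 12, 26, 29, 41, 33, 28, 29, 21, 32, 23, 26,
          22, 1, 24, 27, 21, 29, 21, 34, 27, 46, 31, 38, 41]

def running_median(series, k=5):
    """Centred k-point running median (k odd)."""
    half = k // 2
    return [median(series[i - half:i + half + 1])
            for i in range(half, len(series) - half)]

def running_trimmed_mean(series, k=5):
    """Centred k-point running mean after dropping each window's maximum and minimum."""
    half = k // 2
    out = []
    for i in range(half, len(series) - half):
        window = sorted(series[i - half:i + half + 1])[1:-1]  # drop min and max
        out.append(sum(window) / len(window))
    return out

print([round(x, 1) for x in running_median(counts)[:3]])        # 15, 25, 25
print([round(x, 1) for x in running_trimmed_mean(counts)[:3]])  # 17.3, 21.7, 21.7
```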