



Measures of location

Arithmetic Mean

R can calculate the arithmetic mean of the cattle weight data using these instructions:


[1] 502.6667

  • [1] indicates that this result is the first element of the output; here the output has only one element.
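As a minimal sketch, the same calculation can be run on a short, made-up vector of weights. The cattle weight data themselves are not listed here, so this will not reproduce 502.6667. mean() is R's built-in function; the second line simply applies the definition.

```r
# Hypothetical cattle weights in kg (illustrative values, not the original data)
y <- c(480, 495, 510, 520, 535)

mean(y)               # built-in arithmetic mean
# [1] 508
sum(y) / length(y)    # the definition: total weight / number of animals
# [1] 508
```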

Estimated Mean

R can also estimate the mean from a set of frequencies and midpoints. Using the cattle weight data again, we can estimate the mean using these instructions:


[1] 500.5
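When only grouped data are available, each class midpoint stands in for every observation in its class. The estimated mean is therefore the sum of midpoint times frequency, divided by the total frequency. Here is a sketch with hypothetical midpoints and frequencies (`mid` and `freq` are illustrative names, not the original variables):

```r
# Hypothetical class midpoints (kg) and the number of animals in each class
mid  <- c(450, 475, 500, 525, 550)
freq <- c(2, 5, 8, 6, 3)

# Estimated mean: sum(midpoint * frequency) / total frequency
est_mean <- sum(mid * freq) / sum(freq)
est_mean
# [1] 503.125
```

Because every value in a class is replaced by that class's midpoint, an estimate of this kind usually differs slightly from the mean of the raw data, just as 500.5 differs from 502.6667 above.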


Geometric Mean

R can calculate the geometric mean of the number of helminth eggs as follows:


[1] 34.0346
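The geometric mean is the back-transformed mean of the logged values, exp(mean(log(y))). This is equivalent to the nth root of the product of the n values. The sketch below uses hypothetical egg counts, chosen so the answer is easy to check by hand:

```r
# Hypothetical helminth egg counts (no zeros)
eggs <- c(5, 20, 80)

gm <- exp(mean(log(eggs)))     # back-transform the mean of the natural logs
gm
# [1] 20
prod(eggs)^(1 / length(eggs))  # nth root of the product: 8000^(1/3) = 20
```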

If there are any zeros in the data, the geometric mean can be calculated as follows:


[1] 19.96460
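Because log(0) is -Inf, a single zero drags exp(mean(log(y))) down to 0. A common workaround is to add 1 to every count before taking logs, then subtract 1 after back-transforming. That method is assumed here, since the original instructions are not shown. The counts are hypothetical:

```r
# Hypothetical egg counts including a zero
eggs0 <- c(0, 3, 15)

exp(mean(log(eggs0)))                  # log(0) = -Inf, so this collapses to 0
# [1] 0

gm0 <- exp(mean(log(eggs0 + 1))) - 1   # geometric mean of (y + 1), minus 1
gm0
# [1] 3
```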


If you prefer logs to base 10 rather than the natural logs used above, the geometric mean can be calculated as follows:


[1] 19.96460
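The base of the logarithm does not affect the result, provided the back-transformation uses the same base. That is why both calculations above print 19.96460. The sketch below uses the same hypothetical counts and again assumes the add-1 adjustment:

```r
eggs0 <- c(0, 3, 15)   # hypothetical egg counts including a zero

gm_log10 <- 10^mean(log10(eggs0 + 1)) - 1   # base-10 logs, back-transformed with 10^
gm_ln    <- exp(mean(log(eggs0 + 1))) - 1   # natural logs, back-transformed with exp()
c(gm_log10, gm_ln)
# [1] 3 3
```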

Weighted Mean

R can calculate the weighted mean of the seroprevalence of BSDS in 11 herds (y) using a 'weighting' variable (f) as follows:


[1] 0.2109707
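A weighted mean multiplies each value by its weight, sums the products, and divides by the sum of the weights. The sketch below uses three hypothetical herds, with herd size assumed as the weight; the original y and f are not listed here.

```r
# Hypothetical seroprevalences (proportion positive) and herd sizes used as weights
y <- c(0.10, 0.25, 0.40)
f <- c(50, 30, 20)

wm <- sum(f * y) / sum(f)   # (5 + 7.5 + 8) / 100
wm
# [1] 0.205
```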


Or you can use R's weighted.mean() function:


[1] 0.2109707
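weighted.mean() comes from the stats package, which R attaches by default, and takes the values first and the weights second. With the same hypothetical herds, it agrees with the explicit calculation. The unweighted mean, by contrast, gives every herd equal influence regardless of its size:

```r
y <- c(0.10, 0.25, 0.40)   # hypothetical seroprevalences
f <- c(50, 30, 20)         # hypothetical herd sizes (weights)

weighted.mean(y, w = f)    # same as sum(f * y) / sum(f)
# [1] 0.205
mean(y)                    # unweighted: every herd counts equally
# [1] 0.25
```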


Median, midrange and mode

R can calculate the median and midrange of the cattle weights using these two commands:


[1] 507.5

[1] 495
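The median is the middle value of the sorted data, or the average of the two middle values when n is even. The midrange is halfway between the minimum and the maximum. A sketch with hypothetical weights, which are not the original data:

```r
# Hypothetical cattle weights in kg (n = 6, so the median averages the 3rd and 4th values)
y <- c(480, 495, 510, 520, 535, 590)

median(y)                # (510 + 520) / 2
# [1] 515
(min(y) + max(y)) / 2    # midrange
# [1] 535
mean(range(y))           # equivalent, since range() returns c(min, max)
# [1] 535
```

The midrange depends only on the two most extreme values, so the single heavy animal (590 kg) pulls it well above the median.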

  • Other ways of obtaining the median include quantile(y, probs = 0.5) (the 50% quantile of the set) and mean(y, trim = 0.5) (trimming 50% of the values from each end leaves only the middle, so R returns the median of y).
  • quantile(y, c(0,0.25,0.5,0.75,1)) gives the minimum, lower quartile, median, upper quartile, and maximum of y.
  • mean(y, trim = 0) or mean(y) gives the untrimmed arithmetic mean of y.
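These alternatives can be checked against each other. In R, any trim of 0.5 or more makes mean() return the median. Smaller values of trim drop floor(n * trim) observations from each end before averaging. A sketch with the same kind of hypothetical weights:

```r
y <- c(480, 495, 510, 520, 535, 590)   # hypothetical weights (kg)

quantile(y, probs = 0.5)    # 50% quantile, returned as a named value ("50%")
mean(y, trim = 0.5)         # trim >= 0.5: R returns the median
# [1] 515
quantile(y, c(0, 0.25, 0.5, 0.75, 1))   # min, lower quartile, median, upper quartile, max
mean(y, trim = 0.2)         # drops floor(6 * 0.2) = 1 value from each end
# [1] 515
mean(y)                     # untrimmed: pulled up by the 590 kg animal
# [1] 521.6667
```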


R can readily find which value or values occur most often, as follows, but for a number of very good reasons this is seldom done.


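> dist <- rle(sort(y)) # runs of identical values in the sorted weights
> m <- dist$lengths == max(dist$lengths) # TRUE where the run is longest
> dist$lengths[which(m)] # maximum observed frequency
[1] 3 3 3

> dist$values[which(m)] # value of each mode
[1] 520 535 545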
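An alternative sketch tabulates the frequencies with table() and picks out the value or values with the highest count. The weights are hypothetical. table() returns its categories as character names, so the result is converted back to numbers:

```r
# Hypothetical weights (kg) with two equally common values
y <- c(500, 520, 520, 535, 535, 545, 560)

tab <- table(y)                                  # frequency of each distinct value
modes <- as.numeric(names(tab)[tab == max(tab)]) # value(s) with the highest count
modes
# [1] 520 535
max(tab)                                         # their shared frequency
# [1] 2
```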

These data have 3 modes (at 520, 535 and 545 kg), each comprising 3 observations; in other words, these were the most common values. However, if we grouped these data into class intervals or smoothed the sample distribution in some way, we should expect to find a different set of modes.

How many modes a smoothed distribution has will, of course, depend upon how much it is smoothed: these data have a maximum of 3 modes but can have no fewer than 1. Conversely, if we had recorded the exact weights instead of rounding them to the nearest whole number, we might well have found that every weight was different. In that case, the number of modes would equal the number of observations. In short, modality is rather arbitrary.
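The effect of grouping can be sketched with hypothetical weights. The raw values below have two modes, but grouping them into 20 kg class intervals with cut() leaves a single modal class. The 20 kg interval width is an arbitrary choice, which is precisely the point:

```r
# Hypothetical weights (kg), not the original data
y <- c(498, 503, 505, 512, 520, 520, 531, 535, 535, 548)

# Modes of the raw values
tab <- table(y)
raw_modes <- as.numeric(names(tab)[tab == max(tab)])
raw_modes
# [1] 520 535

# Modal class after grouping into 20 kg intervals: (480,500], (500,520], ...
classes <- cut(y, breaks = seq(480, 560, by = 20))
tab_c <- table(classes)
modal_class <- names(tab_c)[tab_c == max(tab_c)]
modal_class
# [1] "(500,520]"
```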


Running means

R can calculate a (5-point) running mean from a set of data (such as the butterfly data), then add it to a plot of the unsmoothed data, using these instructions:
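The butterfly data and the original instructions are not shown here, so this minimal sketch uses made-up counts. A centred 5-point running mean can be computed with stats::filter(), which gives each point and its two neighbours on either side a weight of 1/5. The first and last two values have no complete window, so they come back as NA:

```r
# Hypothetical counts standing in for the butterfly data
y <- c(12, 15, 11, 20, 25, 22, 30, 28, 35, 33)

# Centred 5-point running mean (sides = 2 centres the window on each point)
rm5 <- stats::filter(y, rep(1 / 5, 5), sides = 2)

plot(y, type = "b", xlab = "Observation", ylab = "Count")  # unsmoothed data
lines(rm5, col = "red", lwd = 2)                           # running mean
```

Writing stats::filter rather than plain filter avoids a clash with dplyr::filter() if that package is loaded.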


Exponential smoothing

A weighted running mean, such as exponential smoothing, can be calculated in much the same way as a running mean.
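Exponential smoothing is a running mean whose weights decline geometrically into the past. Each smoothed value is alpha times the current observation plus (1 - alpha) times the previous smoothed value. The sketch below writes that recursion as a loop and checks it against the recursive form of stats::filter(). The smoothing constant alpha = 0.3, the choice to start from the first observation, and the counts are all assumptions for illustration:

```r
# Simple exponential smoothing: s[1] = y[1]; s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
exp_smooth <- function(y, alpha) {
  s <- numeric(length(y))
  s[1] <- y[1]
  for (t in seq_along(y)[-1]) {
    s[t] <- alpha * y[t] + (1 - alpha) * s[t - 1]
  }
  s
}

y <- c(12, 15, 11, 20, 25, 22, 30, 28, 35, 33)   # hypothetical counts
alpha <- 0.3

s <- exp_smooth(y, alpha)

# Same recursion via stats::filter: out[t] = alpha * y[t] + (1 - alpha) * out[t - 1],
# with init = y[1] so that the first smoothed value equals y[1]
s2 <- stats::filter(alpha * y, filter = 1 - alpha, method = "recursive", init = y[1])

plot(y, type = "b", xlab = "Observation", ylab = "Count")
lines(s, col = "blue", lwd = 2)
```

Larger values of alpha follow the data more closely, and smaller values smooth more heavily.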



Running medians

Calculating a running median in R is much more straightforward. Ignoring the input and plotting instructions, a 3-point running median can be calculated with a single instruction:
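runmed() in the stats package is one such single instruction; it is assumed here, since the original is not shown. Each interior value is replaced by the median of itself and its two neighbours. endrule = "keep" leaves the two end values unchanged, whereas the default, endrule = "median", smooths the ends as well (see ?runmed). With hypothetical counts:

```r
y <- c(12, 15, 11, 20, 25, 22, 30, 28, 35, 33)   # hypothetical counts

rmed3 <- runmed(y, k = 3, endrule = "keep")   # 3-point running median
as.numeric(rmed3)
# [1] 12 12 15 20 22 25 28 30 33 33

plot(y, type = "b", xlab = "Observation", ylab = "Count")
lines(rmed3, col = "darkgreen", lwd = 2)
```

Unlike a running mean, the running median simply discards the isolated dip at the third observation (11) rather than spreading it across its neighbours.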