Quantiles and their display
Worked example I
Simply inspecting the data suggests that the distribution is skewed, with only a few sheep having large numbers of eggs in their faeces. This becomes clear when we look at a (grouped) bar diagram of the frequency distribution of these data:
Although the distribution is skewed, there is only one mode (at 1-100 eggs per sample) so a box-and-whisker plot is appropriate.
The first thing to do is to rank the data so we can determine the median and the lower and upper quartiles.
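The ranking step can be sketched in Python as follows. The egg counts are hypothetical values invented for illustration (they are not the sheep data discussed here), and the quartile rule used, linear interpolation at position p(n - 1) in the ranked data, is only one of several conventions in use.

```python
def quantile(sorted_values, p):
    """Quantile by linear interpolation between ranked values.

    Assumes the p-th quantile sits at (0-based) position p * (n - 1)
    in the ranked data; other conventions exist and give slightly
    different answers for small samples.
    """
    n = len(sorted_values)
    position = p * (n - 1)
    lower = int(position)          # rank at or just below the position
    upper = min(lower + 1, n - 1)  # next rank up, capped at the last value
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Hypothetical egg counts, for illustration only
egg_counts = [310, 20, 95, 150, 2400, 60, 120, 480, 85, 200, 40]

ranked = sorted(egg_counts)
lower_quartile = quantile(ranked, 0.25)
median = quantile(ranked, 0.50)
upper_quartile = quantile(ranked, 0.75)

print(ranked)
print(lower_quartile, median, upper_quartile)
```

With an odd number of observations the median is simply the middle ranked value; the quartiles here fall halfway between two ranks and so are interpolated.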
We can then draw the box-and-whisker plot as below:
Now compare this with the bar diagram of the ranked frequency distribution (above). Note that half the animals have between 85 and 301 eggs per gram of faeces (shown by the box), whilst a few animals have very large numbers of eggs - which results in the very long upper whisker of the box.
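To see how skew shows up in the plot's dimensions, the sketch below computes the box and whisker lengths for a hypothetical right-skewed sample (invented values, not the sheep data). It assumes the whiskers run to the smallest and largest observations; some software instead truncates whiskers and plots extreme points separately.

```python
import statistics

# Hypothetical right-skewed egg counts, for illustration only
egg_counts = [20, 40, 60, 85, 95, 120, 150, 200, 310, 480, 2400]

# Quartiles by linear interpolation between ranked values ("inclusive" rule)
q1, median, q3 = statistics.quantiles(egg_counts, n=4, method="inclusive")

# Assumption: whiskers extend to the minimum and maximum observations
box_length = q3 - q1
lower_whisker = q1 - min(egg_counts)
upper_whisker = max(egg_counts) - q3

print("box (interquartile range):", q1, "to", q3)
print("lower whisker length:", lower_whisker)
print("upper whisker length:", upper_whisker)
```

A single very large count is enough to make the upper whisker many times longer than the lower one, while the box itself barely moves.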
Worked example II
Our second worked example uses the same cattle weight data as the More Information page on frequency distributions. We have rotated (and inverted) the histograms of these data so that they are directly comparable to the box-and-whisker plots.
The histogram of weights in herd A is unimodal and skewed towards the lower values - a left skew if the histogram were orientated in the usual way. The histogram of weights in herd B is bimodal and not noticeably skewed in either direction. The second graph shows what happens if we try to compare these distributions using box-and-whisker plots.
The box-and-whisker plot for herd A is readily interpretable as a skewed distribution, with the lower whisker longer than the upper whisker. But the box-and-whisker plot for herd B is very difficult to interpret. It has a wider interquartile range, and appears to be skewed towards higher values. The bimodality of this distribution cannot be detected from the box-and-whisker plot, so these plots should not be used for this sort of data. This is why one should always display data as jittered dot plots, rank scatterplots or histograms before using summary measures.
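The point can be illustrated with a small Python sketch. The weights are hypothetical (they are not the herd B data), and the 20 kg class width is an arbitrary choice. The five numbers that define a box-and-whisker plot give no hint of a gap, whereas a simple frequency count makes the two modes obvious.

```python
import statistics
from collections import Counter

# Hypothetical bimodal weights (kg), for illustration only
weights = [102, 105, 108, 110, 111, 113, 115, 118,
           162, 165, 167, 170, 171, 174, 176, 180]

# The five values a box-and-whisker plot is drawn from
q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")
print("five-number summary:", min(weights), q1, median, q3, max(weights))

# Frequency distribution in 20 kg classes, printed as a text histogram
bin_width = 20
counts = Counter((w // bin_width) * bin_width for w in weights)
for start in range(min(counts), max(counts) + bin_width, bin_width):
    print(f"{start}-{start + bin_width - 1}: {'*' * counts[start]}")
```

In this made-up sample the median falls in a weight class that contains no animals at all, which the box plot would show as the centre of the distribution.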
Worked example III
Our third worked example uses a quantile-quantile plot to compare the two sample distributions of cattle weights shown above.
The difference between the two distributions is immediately apparent, with each quantile in herd A generally being greater than the equivalent quantile in herd B. This difference may not be at all apparent from the box-and-whisker plots.
This sort of plot is much more time-consuming to construct by hand if sample sizes are not equal, since quantiles of the larger sample would have to be interpolated to match each observation in the smaller sample. Fortunately you can do a quantile-quantile plot in R.
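The interpolation itself is not tied to any one package. A minimal Python sketch is given below, under stated assumptions: the weights are hypothetical, the helper names `interpolate_ranked` and `qq_pairs` are invented for illustration, and the larger sample is the one reduced by linear interpolation, at evenly spaced positions between its first and last ranks. Other plotting-position conventions exist.

```python
def interpolate_ranked(sorted_values, n_points):
    """Reduce a ranked sample to n_points values by linear interpolation.

    Assumes evenly spaced positions from the first to the last rank.
    """
    if n_points < 2:
        raise ValueError("need at least two points to interpolate")
    n = len(sorted_values)
    result = []
    for i in range(n_points):
        position = i * (n - 1) / (n_points - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        fraction = position - lower
        result.append(sorted_values[lower]
                      + fraction * (sorted_values[upper] - sorted_values[lower]))
    return result


def qq_pairs(sample_a, sample_b):
    """Return (quantile of A, quantile of B) pairs for a Q-Q plot."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Shrink whichever sample is larger to the size of the smaller one
    if len(a) > len(b):
        a = interpolate_ranked(a, len(b))
    elif len(b) > len(a):
        b = interpolate_ranked(b, len(a))
    return list(zip(a, b))


# Hypothetical weights (kg), for illustration only
herd_a = [150, 120, 180, 160, 140, 170, 130]  # 7 animals
herd_b = [110, 155, 125, 140]                 # 4 animals

for a_value, b_value in qq_pairs(herd_a, herd_b):
    print(a_value, b_value)
```

Plotting the first value of each pair against the second gives the quantile-quantile plot; when the samples are the same size no interpolation is needed and the ranked values are paired directly.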