 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Quantiles and their display

#### Worked example I

Our first worked example uses some hypothetical data based on the work of Rehbein & Visser (1999) on the number of Fasciola eggs per gram of faeces in sheep.

| Sample | Eggs | Sample | Eggs | Sample | Eggs |
|---|---|---|---|---|---|
| 1 | 306 | 8 | 85 | 15 | 75 |
| 2 | 152 | 9 | 245 | 16 | 77 |
| 3 | 136 | 10 | 227 | 17 | 43 |
| 4 | 113 | 11 | 99 | 18 | 211 |
| 5 | 128 | 12 | 324 | 19 | 301 |
| 6 | 72 | 13 | 785 | 20 | 80 |
| 7 | 455 | 14 | 220 | 21 | 354 |

Just looking at the data we can see it is likely to be skewed, with just a few sheep having large numbers of eggs in their faeces. This becomes clear when we look at a (grouped) bar diagram of the frequency distribution of these data:

Although the distribution is skewed, there is only one mode (at 1-100 eggs per sample) so a box-and-whisker plot is appropriate.
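The grouping behind such a bar diagram can be sketched in Python. This is only an illustration: the class width of 100 (1-100, 101-200, ...) is inferred from the modal class quoted above, and the variable names are ours.

```python
from collections import Counter

# Eggs per gram for the 21 sheep, in sample order (from the data table).
eggs = [306, 152, 136, 113, 128, 72, 455,
        85, 245, 227, 99, 324, 785, 220,
        75, 77, 43, 211, 301, 80, 354]

BIN_WIDTH = 100  # assumed class width, giving classes 1-100, 101-200, ...

# Class index 0 holds 1-100, index 1 holds 101-200, and so on.
counts = Counter((value - 1) // BIN_WIDTH for value in eggs)

# Print a crude text bar chart, including any empty classes.
for index in range(max(counts) + 1):
    lower = index * BIN_WIDTH + 1
    upper = (index + 1) * BIN_WIDTH
    print(f"{lower:>3}-{upper:<3} {'#' * counts[index]}")

# The class with the highest count is the mode of the grouped data.
modal_class = max(counts, key=counts.get)
```

With these data the 1-100 class is the tallest bar (7 of the 21 samples), and the thin right-hand tail reaches the 701-800 class.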

The first thing to do is to rank the data so we can determine the median and the lower and upper quartiles.

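| Rank | Eggs | Rank | Eggs | Rank | Eggs |
|---|---|---|---|---|---|
| 1 | 43 | 8 | 113 | 15 | 245 |
| 2 | 72 | 9 | 128 | 16 | 301 |
| 3 | 75 | 10 | 136 | 17 | 306 |
| 4 | 77 | 11 | 152 | 18 | 324 |
| 5 | 80 | 12 | 211 | 19 | 354 |
| 6 | 85 | 13 | 220 | 20 | 455 |
| 7 | 99 | 14 | 227 | 21 | 785 |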
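As a rough check, the ranking and the three quartiles can be reproduced with Python's standard library. Quartile conventions differ between textbooks and packages; the `method="inclusive"` option used here is our assumption, not necessarily the rule used for this page.

```python
import statistics

# Eggs per gram for the 21 sheep, in sample order.
eggs = [306, 152, 136, 113, 128, 72, 455,
        85, 245, 227, 99, 324, 785, 220,
        75, 77, 43, 211, 301, 80, 354]

# Rank the data from smallest to largest.
ranked = sorted(eggs)

# Cut points dividing the ranked data into four equal parts.
# method="inclusive" treats the sample minimum and maximum as the
# 0th and 100th percentiles (an assumed convention, see above).
lower_quartile, median, upper_quartile = statistics.quantiles(
    ranked, n=4, method="inclusive"
)

print("lower quartile:", lower_quartile)
print("median:", median)
print("upper quartile:", upper_quartile)
```

Under that convention the quartiles fall exactly on ranks 6, 11 and 16 of the ranked table, so no interpolation between neighbouring values is needed for this sample.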

We can then draw the box-and-whisker plot as below:

Now compare this with the bar diagram of the frequency distribution (above). Note that about half the animals have between 85 and 301 eggs per gram of faeces (shown by the box), whilst a few animals have very large numbers of eggs - which results in the very long upper whisker of the plot.
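The asymmetry of the plot can be put into numbers with a short sketch. It assumes the whiskers run all the way to the smallest and largest observations (the page does not state its whisker rule), and it takes the box limits of 85 and 301 from the text.

```python
# Ranked eggs per gram, as in the ranked table.
ranked = [43, 72, 75, 77, 80, 85, 99, 113, 128, 136, 152,
          211, 220, 227, 245, 301, 306, 324, 354, 455, 785]

# Box limits quoted in the text.
lower_quartile, upper_quartile = 85, 301

# Observations lying inside the box (limits included).
in_box = [value for value in ranked
          if lower_quartile <= value <= upper_quartile]

# Whisker lengths, ASSUMING whiskers extend to the minimum and maximum.
lower_whisker = lower_quartile - ranked[0]
upper_whisker = ranked[-1] - upper_quartile
box_length = upper_quartile - lower_quartile

print(f"{len(in_box)} of {len(ranked)} observations lie within the box")
print(f"lower whisker: {lower_whisker}, box: {box_length}, "
      f"upper whisker: {upper_whisker}")
```

On that assumption the upper whisker is more than ten times the length of the lower one, which is the numerical counterpart of the skew seen in the bar diagram.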

#### Worked example II

Our second worked example uses the same cattle weight data as used in the More Information page on frequency distributions. We have rotated (and inverted) histograms of these data so they are directly comparable to the box-and-whisker plots.

The histogram of weights in herd A is unimodal and skewed towards the lower values - a left skew if the histogram were orientated in the usual way. The histogram of weights in herd B is bimodal and not noticeably skewed in either direction. The second graph shows what happens if we try to compare these distributions using box-and-whisker plots.

The box-and-whisker plot for herd A is readily interpretable as a skewed distribution with the lower whisker being longer than the upper whisker. But the box-and-whisker plot for herd B is very difficult to interpret. It has a wider interquartile range, and appears to be skewed towards higher values. The bimodality of this distribution cannot be interpreted from the box-and-whisker plot, so these plots should not be used for this sort of data. This is why one should always display data as jittered dot plots, rank scatterplots or histograms before using summary measures.
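To see why the box-and-whisker plot hides the shape of herd B, it helps to look at the few numbers the plot is actually built from. The sketch below uses the rank-ordered weights tabulated in Worked example III; the `"inclusive"` quartile convention is our assumption and other conventions give slightly different values.

```python
import statistics

# Cattle weights in rank order (from the table in Worked example III).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]


def five_number_summary(weights):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, q2, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return min(weights), q1, q2, q3, max(weights)


summary_a = five_number_summary(herd_a)
summary_b = five_number_summary(herd_b)

# Interquartile range = upper quartile minus lower quartile.
iqr_a = summary_a[3] - summary_a[1]
iqr_b = summary_b[3] - summary_b[1]

print("herd A:", summary_a, "IQR:", iqr_a)
print("herd B:", summary_b, "IQR:", iqr_b)
```

The summary confirms the wider interquartile range of herd B, but five numbers per herd carry no information about how many modes lie between them.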

#### Worked example III

Our third worked example uses a quantile-quantile plot to compare the two sample distributions of cattle weights shown above.

1. Sort each distribution into rank order - this is done in the table below.

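Cattle weights in rank order

| Rank | Herd A | Herd B |
|---|---|---|
| 1 | 420 | 420 |
| 2 | 430 | 420 |
| 3 | 430 | 420 |
| 4 | 445 | 425 |
| 5 | 450 | 430 |
| 6 | 460 | 430 |
| 7 | 470 | 430 |
| 8 | 475 | 430 |
| 9 | 480 | 440 |
| 10 | 485 | 450 |
| 11 | 490 | 460 |
| 12 | 495 | 470 |
| 13 | 495 | 475 |
| 14 | 500 | 480 |
| 15 | 505 | 490 |
| 16 | 510 | 495 |
| 17 | 520 | 495 |
| 18 | 520 | 500 |
| 19 | 520 | 505 |
| 20 | 530 | 520 |
| 21 | 530 | 520 |
| 22 | 535 | 530 |
| 23 | 535 | 530 |
| 24 | 535 | 530 |
| 25 | 540 | 535 |
| 26 | 545 | 540 |
| 27 | 545 | 545 |
| 28 | 545 | 545 |
| 29 | 570 | 570 |
| 30 | 570 | 570 |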

2. Plot each value in one sample against the quantile with the same rank in the other sample.

3. Draw in a line of equality to indicate where the points would lie if the two distributions were identical.
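For two samples of equal size, the three steps above can be sketched as follows. The choice of herd B for the horizontal axis and herd A for the vertical axis is ours; the page does not say which way round the plot was drawn.

```python
# Cattle weights for the two herds (30 animals each).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]

# Step 1: sort each sample into rank order.
quantiles_a = sorted(herd_a)
quantiles_b = sorted(herd_b)
if len(quantiles_a) != len(quantiles_b):
    raise ValueError("this simple pairing needs equal sample sizes")

# Step 2: pair values of equal rank; each pair is one plotted point,
# here with herd B as x and herd A as y (an arbitrary choice).
qq_points = list(zip(quantiles_b, quantiles_a))

# Step 3: the line of equality is y = x, so classify each point
# by which side of that line it falls on.
above = sum(1 for x, y in qq_points if y > x)
on_line = sum(1 for x, y in qq_points if y == x)
below = sum(1 for x, y in qq_points if y < x)

print(f"above line: {above}, on line: {on_line}, below line: {below}")
```

The points in `qq_points` can be passed to any plotting tool; counting them against the line of equality gives a quick numerical check on what the plot shows.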

The difference between the two distributions is immediately apparent, with the value in herd A generally being greater than its equivalent quantile in herd B. This difference may not be at all apparent from the box-and-whisker plot above. This sort of plot is much more time-consuming if sample sizes are not equal, since all the quantiles in the smaller sample would have to be interpolated. Fortunately you can do a quantile-quantile plot in R.
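When the sample sizes differ, one possible approach is to evaluate both samples at the same set of cumulative proportions, interpolating linearly between neighbouring ranked values. The sketch below is an illustration under that assumption rather than the method used for this page; the function names are ours, and the thinned version of herd B exists only to manufacture an unequal-size example.

```python
def interpolated_quantile(sorted_values, p):
    """Quantile at proportion p (0 to 1) by linear interpolation
    between the ranked values at position p * (n - 1)."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    if lower >= len(sorted_values) - 1:
        return float(sorted_values[-1])
    fraction = position - lower
    step = sorted_values[lower + 1] - sorted_values[lower]
    return sorted_values[lower] + fraction * step


def qq_pairs(sample_x, sample_y):
    """Return (x quantile, y quantile) pairs, one per observation
    in the smaller sample."""
    xs, ys = sorted(sample_x), sorted(sample_y)
    m = min(len(xs), len(ys))
    if m < 2:
        raise ValueError("need at least two observations per sample")
    # Proportions at which the smaller sample's own ranked values sit,
    # so (up to rounding) only the larger sample is really interpolated.
    proportions = [i / (m - 1) for i in range(m)]
    return [(interpolated_quantile(xs, p), interpolated_quantile(ys, p))
            for p in proportions]


herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]

# HYPOTHETICAL: keep every third animal of herd B (10 animals) purely
# to create samples of unequal size for this illustration.
herd_b_thinned = sorted(herd_b)[::3]

pairs = qq_pairs(herd_a, herd_b_thinned)
for x, y in pairs:
    print(f"herd A quantile {x:6.1f}   thinned herd B quantile {y:6.1f}")
```

Each pair is then plotted exactly as in the equal-size case, against the same line of equality.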

For more sophisticated quantile plots, such as rank scatterplots and p-value plots, see the More Information page on frequency distributions. 