InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Quantiles and their display

Worked example I

Our first worked example uses some hypothetical data, based on the work of Rehbein & Visser (1999), on the number of Fasciola eggs per gram of faeces in sheep. Just looking at the data we can see that the distribution is likely to be skewed, with only a few sheep having large numbers of eggs in their faeces. This becomes clear when we look at a (grouped) bar diagram of the frequency distribution of these data:

Sample  Eggs    Sample  Eggs    Sample  Eggs
1       306     8       85      15      75
2       152     9       245     16      77
3       136     10      227     17      43
4       113     11      99      18      211
5       128     12      324     19      301
6       72      13      785     20      80
7       455     14      220     21      354

{Fig. 3}
vetreh01.gif
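
The grouped frequency distribution behind a bar diagram like this can be tallied with a short Python sketch. The class width of 100 eggs per gram (classes 1-100, 101-200 and so on) is an assumption suggested by the text, and the names `CLASS_WIDTH` and `class_label` are purely illustrative.

```python
from collections import Counter

# Eggs per gram for the 21 sheep, in sample order (from the table above).
eggs = [306, 152, 136, 113, 128, 72, 455, 85, 245, 227, 99,
        324, 785, 220, 75, 77, 43, 211, 301, 80, 354]

CLASS_WIDTH = 100  # assumed class width: 1-100, 101-200, ...

# Class index 0 covers 1-100, index 1 covers 101-200, and so on.
counts = Counter((x - 1) // CLASS_WIDTH for x in eggs)


def class_label(k):
    """Return a label such as '1-100' for class index k."""
    low = k * CLASS_WIDTH + 1
    return f"{low}-{low + CLASS_WIDTH - 1}"


# Print a simple text bar diagram, one '#' per sheep.
for k in range(max(counts) + 1):
    print(f"{class_label(k):>9} {'#' * counts[k]}")
```

The longest bar falls in the first class, with a thin tail of classes stretching towards the high counts, which is the skew described above.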

Although the distribution is skewed, there is only one mode (at 1-100 eggs per gram) so a box-and-whisker plot is appropriate. The first step is to rank the data so we can determine the median and the lower and upper quartiles. We can then draw the box-and-whisker plot as below:

Rank  Eggs    Rank  Eggs    Rank  Eggs
1     43      8     113     15    245
2     72      9     128     16    301
3     75      10    136     17    306
4     77      11    152     18    324
5     80      12    211     19    354
6     85      13    220     20    455
7     99      14    227     21    785

{Fig. 4}
MIquan01.gif

Now compare this with the bar diagram of the frequency distribution (above). Note that about half the animals have between 85 and 301 eggs per gram of faeces (shown by the box), whilst a few animals have very large numbers of eggs, which results in the very long upper whisker.
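
The median and quartiles that define the box can be checked with a minimal Python sketch. It assumes the "inclusive" quartile convention of the standard library's `statistics.quantiles`, which is the one that reproduces the values 85 and 301 quoted here; other conventions give slightly different quartiles.

```python
from statistics import quantiles

# Eggs per gram for the 21 sheep (same data as the tables above).
eggs = [306, 152, 136, 113, 128, 72, 455, 85, 245, 227, 99,
        324, 785, 220, 75, 77, 43, 211, 301, 80, 354]

# Rank the data from smallest to largest.
ranked = sorted(eggs)

# Lower quartile, median and upper quartile (inclusive convention assumed).
q1, q2, q3 = quantiles(ranked, n=4, method="inclusive")

# Number of animals lying within the box, quartiles included.
in_box = sum(q1 <= x <= q3 for x in ranked)

print("minimum        :", ranked[0])
print("lower quartile :", q1)
print("median         :", q2)
print("upper quartile :", q3)
print("maximum        :", ranked[-1])
print("animals in box :", in_box, "of", len(ranked))
```

With 21 observations the quartiles and the median land exactly on ranks 6, 11 and 16, so no interpolation is needed for this data set.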

 

Worked example II

Our second worked example uses the same cattle weight data as used in the More Information page on frequency distributions. We have rotated (and inverted) histograms of these data so they are directly comparable to the box-and-whisker plots.

The histogram of weights in herd A is unimodal and skewed towards the lower values - a left skew if the histogram were orientated in the usual way. The histogram of weights in herd B is bimodal and not noticeably skewed in either direction. The second graph shows what happens if we try to compare these distributions using box-and-whisker plots.

{Fig. 5&6}
boxAB.gif

The box-and-whisker plot for herd A is readily interpretable as a skewed distribution, with the lower whisker longer than the upper whisker. But the box-and-whisker plot for herd B is very difficult to interpret. It has a wider interquartile range, and appears to be skewed towards higher values. The bimodality of this distribution cannot be seen in the box-and-whisker plot, so these plots should not be used for this sort of data. This is why one should always display data as jittered dot plots, rank scatterplots or histograms before using summary measures.
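
The point can be illustrated with a rough Python sketch that sets the five numbers behind each box-and-whisker plot beside a set of binned counts. It uses the herd weights tabulated in worked example III. The inclusive quartile convention, the bin start of 400 and the bin width of 25 are all assumptions made for illustration, so the original figures may use different quartile rules and classes.

```python
from collections import Counter
from statistics import quantiles

# Cattle weights in rank order (from the table in worked example III).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def binned_counts(values, start=400, width=25):
    """Counts per bin; bin 0 is start..start+width-1 (assumed binning)."""
    counts = Counter((v - start) // width for v in values)
    return [counts[k] for k in range(max(counts) + 1)]


for name, herd in (("A", herd_a), ("B", herd_b)):
    print("Herd", name)
    print("  five-number summary:", five_number_summary(herd))
    print("  binned counts      :", binned_counts(herd))
```

The five-number summary for herd B says nothing about how many clusters the data form, whereas with this binning its largest counts fall in two separate bins (425-449 and 525-549). That information is lost once the data are reduced to a box and two whiskers.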

 

Worked example III

Our third worked example uses a quantile-quantile plot to compare the two sample distributions of cattle weights shown above.

  1. Sort each distribution into rank order - this is done in the table below.
  2. Plot each value in one sample against the value with the same rank in the other sample (with equal sample sizes these are equivalent quantiles).
  3. Draw in a line of equality to indicate where the points would lie if the two distributions were identical.

Cattle weights in rank order

Rank  Herd A  Herd B
1     420     420
2     430     420
3     430     420
4     445     425
5     450     430
6     460     430
7     470     430
8     475     430
9     480     440
10    485     450
11    490     460
12    495     470
13    495     475
14    500     480
15    505     490
16    510     495
17    520     495
18    520     500
19    520     505
20    530     520
21    530     520
22    535     530
23    535     530
24    535     530
25    540     535
26    545     540
27    545     545
28    545     545
29    570     570
30    570     570

{Fig. 7}
qqplotAB.gif

The difference between the two distributions is immediately apparent, with the value in herd A generally being greater than its equivalent quantile in herd B. This difference may not be at all apparent from the box-and-whisker plot above.
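
The three steps can be sketched in Python as follows. The sketch builds the plotted points rather than drawing them, and it assumes herd B on the horizontal axis and herd A on the vertical axis, which may not match the orientation of the original figure.

```python
# Cattle weights for the two herds (from the table above).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]

# Step 1: sort each sample into rank order.
a_sorted = sorted(herd_a)
b_sorted = sorted(herd_b)

# Step 2: pair values of equal rank (x = herd B, y = herd A, an assumption).
qq_points = list(zip(b_sorted, a_sorted))

# Step 3: compare each point with the line of equality y = x.
above = sum(y > x for x, y in qq_points)
on_line = sum(y == x for x, y in qq_points)
below = sum(y < x for x, y in qq_points)

print("points above the line of equality:", above)
print("points on the line of equality   :", on_line)
print("points below the line of equality:", below)
```

Counting points on either side of the line of equality is only a crude numerical stand-in for looking at the plot, but it makes the same comparison of equivalent quantiles.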

This sort of plot is much more time-consuming if sample sizes are not equal, since all the quantiles in the smaller sample would have to be interpolated. Fortunately you can do a quantile-quantile plot in R.
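
For unequal sample sizes, one common approach is to interpolate linearly within the ranked larger sample at as many evenly spaced rank positions as there are values in the smaller sample. The Python sketch below illustrates that idea only. The function name `interpolated_quantiles` is hypothetical, and the smaller sample is simply every third value of herd B, taken purely for illustration and not as real data.

```python
def interpolated_quantiles(values, n_points):
    """Return n_points values spread evenly across the ranks of `values`,
    using linear interpolation between neighbouring ranked values."""
    ranked = sorted(values)
    m = len(ranked)
    if n_points < 2 or m < 2:
        raise ValueError("need at least two values and two points")
    result = []
    for i in range(n_points):
        pos = i * (m - 1) / (n_points - 1)  # fractional 0-based rank
        lo = int(pos)                       # rank just below (or at) pos
        hi = min(lo + 1, m - 1)             # rank just above, capped at the top
        frac = pos - lo
        result.append(ranked[lo] + frac * (ranked[hi] - ranked[lo]))
    return result


# Larger sample: herd A (30 animals).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]

# Illustrative smaller sample: every third ranked value of herd B (10 values).
small_sample = [420, 425, 430, 450, 475, 495, 505, 530, 535, 545]

# Reduce herd A to 10 interpolated quantiles, then pair by rank.
a_quantiles = interpolated_quantiles(herd_a, len(small_sample))
qq_points = list(zip(sorted(small_sample), a_quantiles))

for x, y in qq_points:
    print(f"{x:6.1f} {y:6.1f}")
```

The smallest and largest values of the larger sample are always retained, and the interpolated values in between are non-decreasing, so the points can be plotted exactly as in the equal-sized case.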

For more sophisticated quantile plots, such as rank scatterplots and p-value plots, see the More Information page on frequency distributions.