InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)


Quantiles and their display

Worked example I

Our first worked example uses some hypothetical data based on the work of Rehbein & Visser (1999) on the number of Fasciola eggs per gram of faeces in sheep.

sample  eggs    sample  eggs    sample  eggs
   1     306       8      85      15      75
   2     152       9     245      16      77
   3     136      10     227      17      43
   4     113      11      99      18     211
   5     128      12     324      19     301
   6      72      13     785      20      80
   7     455      14     220      21     354

Just looking at the data we can see it is likely to be skewed, with just a few sheep having large numbers of eggs in their faeces. This becomes clear when we look at a (grouped) bar diagram of the frequency distribution of these data:

{Fig. 3}

Although the distribution is skewed, there is only one mode (at 1-100 eggs per sample) so a box-and-whisker plot is appropriate.
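The grouping behind the bar diagram can be reproduced with a short Python sketch. The class width of 100 (classes 1-100, 101-200 and so on) is our assumption, inferred from the modal class quoted above, and the names `eggs`, `CLASS_WIDTH` and `class_limits` are purely illustrative.

```python
from collections import Counter

# Egg counts for samples 1 to 21, in the order given in the table above.
eggs = [306, 152, 136, 113, 128, 72, 455, 85, 245, 227, 99,
        324, 785, 220, 75, 77, 43, 211, 301, 80, 354]

CLASS_WIDTH = 100  # assumed class width (classes 1-100, 101-200, ...)


def class_limits(count):
    """Return the (lower, upper) limits of the class containing `count`."""
    lower = ((count - 1) // CLASS_WIDTH) * CLASS_WIDTH + 1
    return (lower, lower + CLASS_WIDTH - 1)


# Tally how many samples fall in each class.
frequencies = Counter(class_limits(e) for e in eggs)

# Print a text bar diagram, including empty classes so gaps stay visible.
for lower in range(1, max(eggs) + 1, CLASS_WIDTH):
    limits = (lower, lower + CLASS_WIDTH - 1)
    print(f"{limits[0]:>4}-{limits[1]:<4} {'#' * frequencies[limits]}")

# The modal class is the one holding the most samples.
modal_class = max(frequencies, key=frequencies.get)
print("Modal class:", modal_class)
```

A different class width would give a different picture, so it is worth trying more than one before deciding the distribution has a single mode.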

The first thing to do is to rank the data so we can determine the median and the lower and upper quartiles.

rank  eggs    rank  eggs    rank  eggs
  1     43      8    113     15    245
  2     72      9    128     16    301
  3     75     10    136     17    306
  4     77     11    152     18    324
  5     80     12    211     19    354
  6     85     13    220     20    455
  7     99     14    227     21    785
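The ranking and the three quartiles can be checked with a few lines of Python. This is only a sketch: there are several conventions for computing quartiles, and we have assumed the "inclusive" method of the standard library's `statistics.quantiles`, which for these 21 values picks out the observations at ranks 6, 11 and 16. Other conventions can give slightly different quartiles.

```python
import statistics

# Egg counts for samples 1 to 21, in their original (unranked) order.
eggs = [306, 152, 136, 113, 128, 72, 455, 85, 245, 227, 99,
        324, 785, 220, 75, 77, 43, 211, 301, 80, 354]

# Rank the data from smallest to largest.
ranked = sorted(eggs)
for rank, value in enumerate(ranked, start=1):
    print(f"{rank:>4} {value:>5}")

# With 21 observations the median is the value at rank 11.
median = statistics.median(ranked)

# Lower quartile, median and upper quartile (assumed "inclusive" convention).
lower_quartile, _, upper_quartile = statistics.quantiles(
    ranked, n=4, method="inclusive"
)

print("Median:", median)
print("Lower quartile:", lower_quartile)
print("Upper quartile:", upper_quartile)
```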

We can then draw the box-and-whisker plot as below:

{Fig. 4}

Now compare this with the bar diagram of the frequency distribution above. Note that about half the animals have between 85 and 301 eggs per gram of faeces (shown by the box), whilst a few animals have very large numbers of eggs, which produces the very long upper whisker of the plot.
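The arithmetic behind this reading of the plot can be sketched as follows. We assume here that the whiskers run to the smallest and largest observations; many packages instead stop the whiskers at 1.5 times the interquartile range and plot more extreme values individually, so treat the whisker lengths as illustrative. The quartile convention ("inclusive") is also an assumption.

```python
import statistics

# Egg counts for the 21 samples.
eggs = [306, 152, 136, 113, 128, 72, 455, 85, 245, 227, 99,
        324, 785, 220, 75, 77, 43, 211, 301, 80, 354]

# Quartiles under the assumed "inclusive" convention.
q1, median, q3 = statistics.quantiles(eggs, n=4, method="inclusive")

# Observations lying inside the box (between the quartiles, inclusive).
in_box = [e for e in eggs if q1 <= e <= q3]

# Whisker lengths, assuming whiskers extend to the minimum and maximum.
lower_whisker = q1 - min(eggs)
upper_whisker = max(eggs) - q3

print(f"{len(in_box)} of {len(eggs)} samples lie in the box")
print("Lower whisker length:", lower_whisker)
print("Upper whisker length:", upper_whisker)
```

An upper whisker many times longer than the lower one is the box-and-whisker counterpart of the long right-hand tail in the bar diagram.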

 

Worked example II

Our second worked example uses the same cattle weight data as used in the More Information page on frequency distributions. We have rotated (and inverted) histograms of these data so they are directly comparable to the box-and-whisker plots.

The histogram of weights in herd A is unimodal and skewed towards the lower values - a left skew if the histogram were orientated in the usual way. The histogram of weights in herd B is bimodal and not noticeably skewed in either direction. The second graph shows what happens if we try to compare these distributions using box-and-whisker plots.

{Fig. 5&6}

The box-and-whisker plot for herd A is readily interpretable as a skewed distribution, with the lower whisker longer than the upper whisker. But the box-and-whisker plot for herd B is very difficult to interpret. It has a wider interquartile range, and appears to be skewed towards higher values. The bimodality of this distribution cannot be seen in the box-and-whisker plot, so these plots should not be used for this sort of data. This is why one should always display data as jittered dot plots, rank scatterplots or histograms before using summary measures.
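To see why the summary hides the shape, the sketch below computes a five-number summary for each herd (the weights are those listed in Worked example III) and then prints a simple text histogram of herd B. The "inclusive" quartile convention, whiskers running to the extremes, and the bin width of 20 are all our assumptions, not necessarily the settings used for the figures.

```python
import statistics

# Cattle weights for the two herds (30 animals each).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


summaries = {"A": five_number_summary(herd_a),
             "B": five_number_summary(herd_b)}

for herd, (low, q1, med, q3, high) in summaries.items():
    print(f"Herd {herd}: min={low} Q1={q1} median={med} Q3={q3} max={high}")
    print(f"  IQR={q3 - q1}  lower whisker={q1 - low}  upper whisker={high - q3}")

# A text histogram of herd B shows the shape that the summary cannot.
BIN_WIDTH = 20  # assumed bin width
bin_counts = {}
for lower in range(min(herd_b), max(herd_b) + 1, BIN_WIDTH):
    bin_counts[lower] = sum(lower <= w < lower + BIN_WIDTH for w in herd_b)
    print(f"{lower}-{lower + BIN_WIDTH - 1} {'#' * bin_counts[lower]}")
```

Five numbers cannot say whether the observations between the quartiles are spread evenly or gathered in two clusters, which is exactly the information a histogram or dot plot provides.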

 

Worked example III

Our third worked example uses a quantile-quantile plot to compare the two sample distributions of cattle weights shown above.

  1. Sort each distribution into rank order - this is done in the table below.

    Cattle weights in rank order

    Rank  Herd A  Herd B
       1     420     420
       2     430     420
       3     430     420
       4     445     425
       5     450     430
       6     460     430
       7     470     430
       8     475     430
       9     480     440
      10     485     450
      11     490     460
      12     495     470
      13     495     475
      14     500     480
      15     505     490
      16     510     495
      17     520     495
      18     520     500
      19     520     505
      20     530     520
      21     530     520
      22     535     530
      23     535     530
      24     535     530
      25     540     535
      26     545     540
      27     545     545
      28     545     545
      29     570     570
      30     570     570

  2. Plot each value in one sample against the quantile with the same rank in the other sample.

  3. Draw in a line of equality to indicate where the points would lie if the two distributions were identical.
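The three steps above can be sketched in Python as follows. Rather than drawing the graph, the sketch builds the plotted points and counts how many fall above, on or below the line of equality. Putting herd B on the horizontal axis and herd A on the vertical axis is our assumption; the variable names are illustrative.

```python
# Cattle weights for the two herds (30 animals each).
herd_a = [420, 430, 430, 445, 450, 460, 470, 475, 480, 485,
          490, 495, 495, 500, 505, 510, 520, 520, 520, 530,
          530, 535, 535, 535, 540, 545, 545, 545, 570, 570]
herd_b = [420, 420, 420, 425, 430, 430, 430, 430, 440, 450,
          460, 470, 475, 480, 490, 495, 495, 500, 505, 520,
          520, 530, 530, 530, 535, 540, 545, 545, 570, 570]

# Step 1: sort each sample into rank order.
ranked_a = sorted(herd_a)
ranked_b = sorted(herd_b)

# Step 2: pair values of equal rank. Each pair is one plotted point,
# with herd B as x and herd A as y (an assumed choice of axes).
points = list(zip(ranked_b, ranked_a))

# Step 3: the line of equality is y = x, so compare each point with it.
above = sum(y > x for x, y in points)
on_line = sum(y == x for x, y in points)
below = sum(y < x for x, y in points)

for rank, (x, y) in enumerate(points, start=1):
    print(f"rank {rank:>2}: herd B = {x}, herd A = {y}")
print(f"above line: {above}, on line: {on_line}, below line: {below}")
```

Points above the line are ranks at which the herd A value exceeds the herd B value; if the two distributions were identical, every point would lie on the line.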

{Fig. 7}

The difference between the two distributions is immediately apparent, with the value in herd A generally being greater than its equivalent quantile in herd B. This difference may not be at all apparent from the box-and-whisker plot above.

This sort of plot is much more time-consuming if sample sizes are not equal, since all the quantiles in the smaller sample would have to be interpolated. Fortunately, R can draw a quantile-quantile plot for you (for example, with its qqplot function).
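To show what such interpolation involves, here is a minimal Python sketch. It places rank i of n at the fraction (i - 1)/(n - 1) of the way through the sample and interpolates linearly between neighbouring ranks; this plotting-position convention is an assumption, other conventions exist, and statistical packages may handle unequal samples differently. The function names and the two tiny data sets are hypothetical and serve only to make the arithmetic easy to follow.

```python
def interpolated_quantile(ranked, fraction):
    """Linearly interpolate the value `fraction` (0 to 1) of the way through `ranked`."""
    position = fraction * (len(ranked) - 1)    # fractional 0-based rank
    lower = int(position)                      # rank at or just below
    upper = min(lower + 1, len(ranked) - 1)    # rank just above, capped at the end
    weight = position - lower
    return ranked[lower] + weight * (ranked[upper] - ranked[lower])


def qq_points_unequal(larger, smaller):
    """Pair each value of the larger sample with an interpolated quantile
    of the smaller sample. The larger sample needs at least two values."""
    larger, smaller = sorted(larger), sorted(smaller)
    n = len(larger)
    return [(value, interpolated_quantile(smaller, i / (n - 1)))
            for i, value in enumerate(larger)]


# Hypothetical data, chosen only for illustration.
sample_x = [10, 20, 30, 40, 50]
sample_y = [1, 2, 3]

points = qq_points_unequal(sample_x, sample_y)
for x, y in points:
    print(x, y)
```

Each value of the larger sample is kept as it is, while its partner is read off between two neighbouring ranks of the smaller sample.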

For more sophisticated quantile plots, such as rank scatterplots and p-value plots, see the More Information page on frequency distributions.