InfluentialPoints.com

Quantiles & boxplots using R

Quantiles and similar measures

R has a variety of functions that produce quantile-related statistics, most of which are self-explanatory. With a little thought it is possible to produce various other useful measures.

For example, for a vector y of 21 values, R gives:

> median(y)
[1] 152
> min(y)
[1] 43
> max(y)
[1] 785
> range(y)   # min & max
[1]  43 785
> IQR(y)   # interquartile range
[1] 216
> summary(y)   # range, quartiles, & mean
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   43.0    85.0   152.0   213.7   301.0   785.0
> quantile(y,0.1)   # first decile (p=0.1)
10%
 75
> quantile(y,c(0.05,0.95))   # 90% range
 5% 95%
 72 455
> rank(y)
 [1] 17 11 10  8  9  2 20  6 15 14  7 18 21 13  3  4  1 12
[19] 16  5 19
> rank(y)/length(y)   # relative rank
 [1] 0.80952381 0.52380952 0.47619048 0.38095238 0.42857143
 [6] 0.09523810 0.95238095 0.28571429 0.71428571 0.66666667
[11] 0.33333333 0.85714286 1.00000000 0.61904762 0.14285714
[16] 0.19047619 0.04761905 0.57142857 0.76190476 0.23809524
[21] 0.90476190
> # proportion greater than 101
> sum(y > 101)/length(y)
[1] 0.6666667

    Note:
  • By default, the rank function gives tied values of y their mean rank.
  • The last two instructions assume y has no missing values (NA).
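
The two points in this note can be checked with a small sketch. The vector z and its values are invented for illustration (they are not the y used above), and the arguments shown are one reasonable way of handling ties and missing values, not the only one.

```r
# Hypothetical data: contains one tie (7, 7) and one missing value
z <- c(12, 7, 7, 30, NA, 18)

# Tied values share their mean rank by default (ties.method = "average");
# na.last = "keep" leaves the NA unranked instead of ranking it last
r <- rank(z, na.last = "keep")
print(r)   # 3, 1.5, 1.5, 5, NA, 4

# Other tie rules exist, e.g. "min" gives each tied value the lowest rank
r_min <- rank(z, ties.method = "min", na.last = "keep")

# Divide by the number of non-missing values, not by length(z)
n <- sum(!is.na(z))
rel_rank <- r / n

# Proportion of the non-missing values that exceed 10
prop_gt_10 <- mean(z > 10, na.rm = TRUE)

# Without na.rm = TRUE, quantile() stops with an error and median() returns NA
q <- quantile(z, c(0.05, 0.95), na.rm = TRUE)
```

Taking the mean of a logical vector gives the same proportion as sum(...)/length(...), but lets na.rm drop the missing values from both the numerator and the denominator.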
 

Boxplots

Note that, by default, boxplots are numbered in the order in which you provide their data.
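
As a rough sketch of this default numbering, the following draws three boxplots from simulated data. The vectors g1, g2 and g3 and their values are assumptions made up for the example.

```r
# Hypothetical samples (simulated, for illustration only)
set.seed(1)
g1 <- rnorm(20, mean = 10)
g2 <- rnorm(20, mean = 12)
g3 <- rnorm(20, mean = 11)

# Unnamed vectors: the boxes are labelled 1, 2, 3 in the order supplied
bp <- boxplot(g1, g2, g3)

# Supplying names replaces the default numbering
boxplot(g1, g2, g3, names = c("g1", "g2", "g3"))

# The returned list holds the five statistics drawn for each box
# (lower whisker, lower hinge, median, upper hinge, upper whisker),
# one column per box
bp$stats
```

Changing the order of the arguments changes which box is drawn first, so it is worth setting names explicitly whenever the plot will be read without the code beside it.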

Quantile quantile plots

A useful way to compare the distributions of two sets of values is to plot the quantiles of one set against the corresponding quantiles of the second set. When both sets have the same number of values, the easiest way is to draw a scatterplot of their values arranged in rank order. If they have unequal numbers of values, the qqplot function will interpolate as required.
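
A minimal sketch of both approaches follows, using simulated data. The vectors x, y and w are assumptions invented for the example, not data from this page.

```r
# Hypothetical samples (simulated, for illustration only)
set.seed(2)
x <- rexp(30)
y <- rexp(30, rate = 0.5)

# Equal sample sizes: plot the sorted values of one set against the other
plot(sort(x), sort(y), xlab = "quantiles of x", ylab = "quantiles of y")

# qqplot() sorts both sets itself and invisibly returns the plotted coordinates
qq <- qqplot(x, y)

# Unequal sample sizes: qqplot() interpolates within the larger set,
# so the plot has as many points as the smaller set has values
w <- rexp(50)
qq2 <- qqplot(x, w)
length(qq2$x)
```

If the two distributions differ only in location and scale, the points should fall close to a straight line.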

    Note:
  • The two plots (above) look the same because their variables (x & y) do not contain any tied values.

  • Plotting rank (or relative rank) on value reveals the cumulative distribution, and is known as a rank scatterplot.

  • Plotting observed quantiles against their expected (theoretical) values also produces what is known as a QQ plot (e.g. see Unit 3), and is more commonly done than the 2-sample QQ plot described here.
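
The last two points can be sketched as follows. The vector v is simulated for illustration, and a normal distribution is assumed as the theoretical reference only because that is what qqnorm provides.

```r
# Hypothetical sample (simulated, for illustration only)
set.seed(3)
v <- rlnorm(25)

# Rank scatterplot: relative rank against value traces the cumulative distribution
rel_rank <- rank(v) / length(v)
plot(v, rel_rank, xlab = "value", ylab = "relative rank")

# With no ties or missing values, relative rank matches the empirical
# cumulative distribution function evaluated at each observation
Fn <- ecdf(v)
same <- all.equal(rel_rank, Fn(v))

# One-sample QQ plot: observed quantiles against those expected
# under a normal distribution
qqnorm(v)
qqline(v)   # reference line through the first and third quartiles
```

Because v is drawn from a skewed (lognormal) distribution, the points in the normal QQ plot should curve away from the reference line.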