"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Beginners statistics: cumulative plot


Example, with R

Cumulative frequency plots can be drawn as histograms.

[Figure: a frequency histogram and a cumulative frequency histogram of the same data.]

Cumulative histograms are readily produced with R.
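As a minimal sketch of how such a pair of histograms might be drawn in R (the data vector y is made up for illustration, and this is not necessarily the code behind the original figure): hist() computes the bin counts without plotting, and replacing those counts with their running total gives the cumulative version.

```r
# Made-up data, for illustration only
y <- c(12, 15, 15, 18, 21, 22, 22, 22, 25, 27, 30, 31, 34, 38, 45)

# Compute the bins and counts without drawing anything yet
h <- hist(y, plot = FALSE)

# Copy the histogram object and replace its counts by their running total
ch <- h
ch$counts <- cumsum(h$counts)

# Draw the two histograms side by side
op <- par(mfrow = c(1, 2))
plot(h,  main = "Frequency histogram", xlab = "y")
plot(ch, main = "Cumulative frequency histogram", xlab = "y",
     ylab = "Cumulative frequency")
par(op)
```

The tallest (rightmost) bar of the cumulative histogram should equal the total number of values, which is a quick check that nothing was lost in binning.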

  • Due to the heavy use of conventional histograms in elementary statistics, most statistical novices continue to employ them for the remainder of their careers - and find cumulative plots difficult, if not impossible, to interpret.
  • So although cumulative distributions are very important, and cumulative distribution plots are very useful, they are seldom used by non-statisticians.

Definition and Use

  • Frequency histograms use each bar height to show the number of values in that interval.
  • Cumulative frequency histograms use each bar height to show the number of values in that interval, plus the number of values in all lower intervals.
  • Cumulative plots are especially useful because, once you can interpret them, they are a more robust way to examine distributions than histograms - especially when examining a small to moderate number of values.
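The difference between the two kinds of bar height can be shown with a little arithmetic. In this sketch the five interval counts are invented purely for illustration:

```r
# Hypothetical counts for five adjacent class intervals (lowest first)
counts <- c(2, 5, 8, 4, 1)

# Cumulative count: each interval's count plus all lower intervals
cum_counts <- cumsum(counts)
print(cum_counts)   # 2 7 15 19 20

# The last cumulative count is the total number of values
total <- sum(counts)

# Dividing by the total gives cumulative proportions, ending at 1
cum_prop <- cum_counts / total
print(cum_prop)
```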

Tips and Notes

  • Although cumulative frequency histograms have advantages over conventional frequency histograms, they still suffer from the general disadvantages of histograms - namely that class intervals are entirely arbitrary and can lead to bias.
  • Cumulative scatterplots commonly plot each item's rank (or proportional rank) against its value. Provided each item has a different value, its rank is the number of items whose value is less than or equal to that item's value.
  • In that case the minimum value has a rank of 1, and the maximum has a rank of n (assuming there are n values).

[Figure: a cumulative scatterplot superimposed on a cumulative histogram of the same data.]

  • Notice how, unlike the cumulative histogram, this scatterplot reveals the presence of 'tied' values.
  • In addition to this advantage, cumulative scatterplots are simpler to plot and are less artifact-prone than cumulative histograms.
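The points above about arbitrary class intervals and tied values can be illustrated with a short sketch. The data and the two sets of breaks are made up, and cum_hist() is a hypothetical helper written for this example, not a built-in R function. The bars change with the choice of breaks, whereas the superimposed points do not, and tied values show up as vertical stacks of points.

```r
# Made-up data containing tied values (15 twice, 22 three times)
y <- c(12, 15, 15, 18, 21, 22, 22, 22, 25, 27, 30, 31, 34, 38, 45)
n <- length(y)

# Hypothetical helper: a histogram object whose counts are cumulative
cum_hist <- function(y, breaks) {
  h <- hist(y, breaks = breaks, plot = FALSE)
  h$counts <- cumsum(h$counts)
  h
}

# Two arbitrary choices of class interval for the same data
coarse <- cum_hist(y, breaks = seq(10, 50, by = 10))
fine   <- cum_hist(y, breaks = seq(10, 50, by = 5))

op <- par(mfrow = c(1, 2))

plot(coarse, main = "Interval width 10", xlab = "y",
     ylab = "Cumulative frequency", ylim = c(0, n))
points(sort(y), seq_len(n), pch = 16)   # cumulative scatterplot on top

plot(fine, main = "Interval width 5", xlab = "y",
     ylab = "Cumulative frequency", ylim = c(0, n))
points(sort(y), seq_len(n), pch = 16)   # identical points, different bars

par(op)
```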

The textarea below shows one way to produce a cumulative scatterplot with R.
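The original code is not reproduced here, so the following is only a sketch of one possible approach, using made-up data: sort the values and plot each against its position in the sorted order.

```r
# Made-up data, for illustration only
y <- c(12, 15, 15, 18, 21, 22, 22, 22, 25, 27, 30, 31, 34, 38, 45)
n <- length(y)

sorted_y <- sort(y)       # values in ascending order
position <- seq_len(n)    # 1, 2, ..., n

# Rank on value: the smallest value is at rank 1, the largest at rank n
plot(sorted_y, position, xlab = "Value", ylab = "Rank", pch = 16)

# The same plot using proportional rank, so the y axis runs up to 1
plot(sorted_y, position / n, xlab = "Value",
     ylab = "Proportional rank", pch = 16)
```

Because tied values receive consecutive positions, they appear as points stacked vertically above the same value.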

  • plot(y, rank(y)) would give the same result, provided every value was different.
    By default, R's rank() function gives tied values their mean rank (ties.method = "average").
  • Cumulative scatterplots have a variety of names: a rank scatterplot, a plot of rank on value, a quantile plot, or an empirical cumulative distribution function (ECDF).
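How rank() treats ties, and how ranks relate to R's built-in ecdf() function, can be seen in a small sketch (the seven values are made up for illustration):

```r
# Made-up data with ties
y <- c(12, 15, 15, 18, 22, 22, 22)
n <- length(y)

# Default: tied values share their mean rank
r_mean <- rank(y)
print(r_mean)   # 1.0 2.5 2.5 4.0 6.0 6.0 6.0

# "max": each value's rank is the number of values <= that value
r_max <- rank(y, ties.method = "max")
print(r_max)    # 1 3 3 4 7 7 7

# ecdf() returns a function giving the proportion of values <= x
Fn <- ecdf(y)
print(Fn(y) * n)   # matches r_max

# plot(Fn) draws the empirical cumulative distribution function
plot(Fn, main = "ECDF", xlab = "Value", ylab = "Proportion <= value")
```

With ties.method = "max" the rank matches the "less than or equal to" definition given earlier even when values are tied, which is why it agrees with ecdf().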

Test yourself

What does the R code below do, and in what ways might this be useful?

Hint: try using that code upon the data provided above.

Useful references

Chambers, J.M. et al. (1983). Graphical methods for data analysis. Wadsworth International Group/Duxbury Press, Belmont & Boston.
An excellent older text on graphical display of frequency distributions which deals with quantile plots (equivalent to rank scatterplots) in Chapter 2.

Wikipedia: Frequency distribution.
Includes a short section on cumulative frequency distributions.
