 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

### Example, with R

Cumulative frequency plots can be drawn as histograms. Below are a frequency histogram and a cumulative frequency histogram of the same data. Cumulative histograms are readily produced with R.

• Due to the heavy use of conventional histograms in elementary statistics, most statistical novices continue to employ them for the remainder of their careers, and find cumulative plots difficult, if not impossible, to interpret.
• So although cumulative distributions are very important, and cumulative distribution plots are very useful, they are seldom used by non-statisticians.
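
As a rough illustration of the two kinds of histogram, here is a minimal sketch in R. The data vector `y` is made up for the example (it is not the data set used on this page), and replacing the counts of a `hist()` object with their running totals is just one possible approach.

```r
# Hypothetical sample data (an assumption, not the page's data set)
set.seed(1)
y <- round(rnorm(30, mean = 50, sd = 10))

# Compute the histogram once, without drawing it
h <- hist(y, plot = FALSE)

# Copy it and replace the per-interval counts with running totals
hc <- h
hc$counts <- cumsum(h$counts)

# Draw the two histograms side by side
op <- par(mfrow = c(1, 2))
plot(h, main = "Frequency histogram", xlab = "y")
plot(hc, main = "Cumulative frequency histogram",
     xlab = "y", ylab = "Cumulative frequency")
par(op)
```

The last bar of the cumulative histogram should reach the total number of values, here 30.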

### Definition and Use

• Frequency histograms use each bar height to show the number of values in that interval.
• Cumulative frequency histograms use each bar height to show the number of values in that interval, plus the number of values in all lower intervals.
• Cumulative plots are especially useful because, once you can interpret them, they are a more robust way to examine distributions than histograms - especially when examining a small to moderate number of values.
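
The two definitions above can be checked numerically. The sketch below uses a small made-up data set and arbitrarily chosen class intervals of width 2; both are assumptions for illustration only.

```r
# Hypothetical data: nine measurements
y <- c(2, 3, 3, 5, 6, 6, 6, 8, 9)

# Class intervals (0,2], (2,4], (4,6], (6,8], (8,10]
breaks <- seq(0, 10, by = 2)

# Number of values in each interval (frequency histogram bar heights)
freq <- table(cut(y, breaks))

# That number plus the counts of all lower intervals
# (cumulative frequency histogram bar heights)
cum_freq <- cumsum(freq)

print(rbind(freq, cum_freq))
# freq:     1 2 4 1 1
# cum_freq: 1 3 7 8 9
```

The final cumulative frequency equals the number of values, which is a quick check that no value fell outside the intervals.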

### Tips and Notes

• Although cumulative frequency histograms have advantages over conventional frequency histograms, they still suffer from the general disadvantages of histograms - namely that class intervals are entirely arbitrary and can lead to bias.
• Cumulative scatterplots, such as that shown below, commonly plot each item's rank (or proportional rank) against its value. Provided each item has a different value, its rank is the number of items whose value is less than or equal to its own.
• In that case the minimum value has a rank of 1, and the maximum has a rank of n (assuming there are n values).

The graph below shows a cumulative scatterplot superimposed on a cumulative histogram of the same data.

• Notice how, unlike the cumulative histogram, this scatterplot reveals the presence of 'tied' values.
• In addition to this advantage, cumulative scatterplots are simpler to plot and are less artifact-prone than cumulative histograms.
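
A superimposed plot of this kind might be sketched as follows. The data and the class intervals are made up for the example, and counting the items less than or equal to each value is used here only to illustrate the definition of rank given above.

```r
# Hypothetical data containing tied values (two 3s and three 6s)
y <- c(2, 3, 3, 5, 6, 6, 6, 8, 9)

# Rank as "number of items less than or equal to this item"
count_le <- sapply(y, function(v) sum(y <= v))
print(count_le)
# 1 3 3 4 7 7 7 8 9

# Cumulative histogram, using arbitrary intervals of width 2
h <- hist(y, breaks = seq(0, 10, by = 2), plot = FALSE)
h$counts <- cumsum(h$counts)
plot(h, main = "Cumulative histogram with cumulative scatterplot",
     xlab = "y", ylab = "Cumulative frequency")

# Superimpose the cumulative scatterplot: sorted values against 1..n
points(sort(y), seq_along(y), pch = 19)
```

Tied values show up as points stacked vertically at the same x position, which the histogram bars alone do not reveal.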

The textarea below shows one way to produce a cumulative scatterplot with R.

• plot(y, rank(y)) would give the same result, provided every value was different. By default, R's rank() function gives tied values their mean rank.
• Cumulative scatterplots have a variety of names: a rank scatterplot, a plot of rank on value, a quantile plot, or an empirical cumulative distribution function (ECDF).
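
To make the notes above concrete, here is a minimal sketch of three ways to draw a cumulative scatterplot in R. The data are invented for the example, and plotting the sorted values against 1 to n is only one possible approach, not necessarily the one intended on this page.

```r
# Hypothetical data containing tied values
y <- c(2, 3, 3, 5, 6, 6, 6, 8, 9)
n <- length(y)

op <- par(mfrow = c(1, 3))

# Sorted values against 1..n: tied values get consecutive ranks
plot(sort(y), seq_len(n), xlab = "Value", ylab = "Rank",
     main = "sort(y) against 1..n")

# Values against rank(y): tied values share their mean rank
plot(y, rank(y), xlab = "Value", ylab = "Rank",
     main = "y against rank(y)")

# Empirical cumulative distribution function: proportional rank
plot(ecdf(y), xlab = "Value", ylab = "Proportion <= value",
     main = "ECDF")

par(op)

print(rank(y))
# 1.0 2.5 2.5 4.0 6.0 6.0 6.0 8.0 9.0
```

With no ties the first two plots would be identical. With ties they differ only at the tied values, where `rank()` uses the default `ties.method = "average"`.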

### Test yourself

What does the R code below do, and in what ways might this be useful?

Hint: try using that code upon the data provided above.

### Useful references

Chambers, J.M. et al. (1983). Graphical methods for data analysis. Wadsworth International Group/Duxbury Press, Belmont & Boston.
An excellent older text on graphical display of frequency distributions which deals with quantile plots (equivalent to rank scatterplots) in Chapter 2.

Wikipedia: Frequency distribution.
Includes a short section on cumulative frequency distributions.