InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

Quantiles as summary statistics: Use and misuse
(box-and-whisker plots, interquartile range, outliers, quantile-quantile plots)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

Quantiles are underused by researchers in providing summary statistics, largely because of the addiction of most scientists to the arithmetic mean and the standard deviation. These latter statistics are used even when such measures are wholly inappropriate, such as with heavily skewed distributions. We give a number of examples of the correct use of quantiles and box-and-whisker plots, including several where the box-and-whisker plot is given alongside a dot plot showing the full frequency distribution.

Whilst box-and-whisker plots can be very useful for comparative purposes, this is only true if the full distribution has been examined first. This is especially true if the whiskers show the maximum and minimum, since they then provide no information on an important part of the distribution: the region between the quartiles and the extremes. Use of the 95%, 90% or 80% reference range, or of whiskers extending up to 1.5 times the interquartile range beyond the quartiles, is preferable, with outliers shown individually. It is, of course, important to label exactly what the 'range' or whiskers represent; without labelling, the figure can be highly misleading.

A common problem is a mismatch between the statistics displayed (whether as a box-and-whisker plot or a range plot) and the reported statistical analysis. If it is the difference between arithmetic means that is being tested, then means should appear in the figure or table. If values are log-transformed, then the geometric mean is the appropriate statistic. If one of the nonparametric tests is used, then the median is usually the most appropriate statistic.
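The two whisker conventions recommended above can be computed directly from the sorted data. The sketch below is illustrative only: the function names `percentile` and `box_whisker_summary` are hypothetical, and it assumes the linear-interpolation quantile convention (R's default "type 7", also NumPy's default); other conventions give slightly different quartiles for small samples. It also returns a text label, since the figure should state what the whiskers represent.

```python
def percentile(sorted_xs, p):
    """Quantile of pre-sorted data at probability p (0 <= p <= 1).

    Assumed convention: linear interpolation between order statistics at
    0-based position p * (n - 1) (R type 7 / NumPy default).
    """
    pos = p * (len(sorted_xs) - 1)
    k = int(pos)
    frac = pos - k
    if k + 1 < len(sorted_xs):
        return sorted_xs[k] + frac * (sorted_xs[k + 1] - sorted_xs[k])
    return sorted_xs[k]


def box_whisker_summary(data, whisker="tukey", coverage=0.95):
    """Statistics needed to draw, and label, a box-and-whisker plot.

    whisker="tukey":     whiskers run to the most extreme observations lying
                         within 1.5 * IQR of the quartiles.
    whisker="reference": whiskers mark the central `coverage` reference range,
                         e.g. 0.95 -> 2.5th to 97.5th percentiles.
    Points beyond the whiskers are returned so they can be plotted individually.
    """
    xs = sorted(data)
    if not xs:
        raise ValueError("need at least one observation")
    q1, med, q3 = (percentile(xs, p) for p in (0.25, 0.5, 0.75))
    if whisker == "tukey":
        iqr = q3 - q1
        lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        # Whisker ends are actual observations, not the fences themselves.
        inside = [x for x in xs if lo_fence <= x <= hi_fence]
        w_lo, w_hi = inside[0], inside[-1]
        label = "whiskers: most extreme values within 1.5 x IQR of the quartiles"
    elif whisker == "reference":
        tail = (1 - coverage) / 2
        w_lo, w_hi = percentile(xs, tail), percentile(xs, 1 - tail)
        label = (f"whiskers: {coverage * 100:g}% reference range "
                 f"({tail * 100:g}th to {(1 - tail) * 100:g}th percentiles)")
    else:
        raise ValueError("whisker must be 'tukey' or 'reference'")
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": w_lo, "whisker_high": w_hi,
        # Observations to show individually beyond the whiskers.
        "beyond": [x for x in xs if x < w_lo or x > w_hi],
        "label": label,
    }


if __name__ == "__main__":
    # Invented, right-skewed values, for illustration only.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    for rule in ("tukey", "reference"):
        s = box_whisker_summary(sample, whisker=rule, coverage=0.8)
        print(s["label"], s["whisker_low"], s["whisker_high"], s["beyond"])
```

Note that with a reference range some points always fall beyond the whiskers by construction (about 20% for an 80% range), so those points are not necessarily outliers in any substantive sense.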
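The rule of matching the displayed statistic to the analysis can be made explicit in code. This is a minimal sketch: the function `matched_location` and its `analysis` keywords are hypothetical, and it simply encodes the three cases given above (arithmetic mean for tests on raw means, geometric mean for log-transformed analyses, median for rank-based nonparametric tests, which is the usual but not the only choice). It uses only Python's standard `math` and `statistics` modules.

```python
import math
import statistics


def matched_location(data, analysis):
    """Return (name, value) of the location statistic matching the analysis.

    analysis = "means": tests on arithmetic means of the raw values
    analysis = "log":   tests carried out on log-transformed values
    analysis = "rank":  nonparametric (rank-based) tests
    """
    data = list(data)
    if analysis == "means":
        return "arithmetic mean", statistics.fmean(data)
    if analysis == "log":
        if any(x <= 0 for x in data):
            raise ValueError("log transformation needs strictly positive values")
        # Geometric mean = back-transformed arithmetic mean of the logs.
        return "geometric mean", math.exp(statistics.fmean([math.log(x) for x in data]))
    if analysis == "rank":
        return "median", statistics.median(data)
    raise ValueError("analysis must be 'means', 'log' or 'rank'")


if __name__ == "__main__":
    # Invented, heavily right-skewed values, for illustration only.
    counts = [1, 10, 100, 1000]
    for analysis in ("means", "log", "rank"):
        name, value = matched_location(counts, analysis)
        print(f"{analysis:>5}: {name} = {value:.2f}")
```

For these invented values the arithmetic mean is 277.75, the geometric mean about 31.62 and the median 55, which shows how far apart the three can be for a heavily skewed distribution.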
We have given a few examples from the literature of the use of quantile-quantile plots to assess distributions, but it has to be admitted that they are still very rare! This is despite their advocacy in two well-known texts on the graphical display of data. Even where they are used, there is still a tendency to condense the information into a simple summary statistic (such as the difference between the plot and the y = x line), rather than to interpret the plots to bring out differences between the distributions.

What the statisticians say

Chambers et al. (1983) and Cleveland (1989, 1993) provide excellent coverage of box-and-whisker plots, quantile scatterplots and quantile-quantile plots. Woodward (1999) has a good section on descriptive techniques for quantitative variables which includes quantiles, the five-quantile summary and box-and-whisker plots. Griffiths et al. (1998) give a good account of the use of quantiles, box-and-whisker plots and the five-quantile summary for summarizing data. Altman & Bland (1996) provide a useful summary of the properties and uses of quantiles in medical research, whilst Frigge et al. (1989) detail the uses of box-and-whisker plots for exploratory data analysis. Wartenberg & Northridge (1991) advocate the use of quantile-quantile plots in case-control studies for exploring and comparing the distribution of exposure among cases and controls.

Wikipedia has sections on quantiles, box-and-whisker plots and quantile-quantile plots. NetMBA provides a short section on the interpretation of boxplots. NIST/SEMATECH provides a good account of quantile-quantile plots.
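To make the two-sample quantile-quantile plot concrete, for example comparing exposure in cases and controls, the matched quantiles can be computed as below. This is a sketch under stated assumptions: the names `qq_pairs` and `_interp` are hypothetical, each order statistic of the smaller sample is treated as its (i + 0.5)/m quantile (one of several plotting-position conventions), and the larger sample is linearly interpolated at the same probabilities. Plotting the pairs against the y = x line, and reading the differences along the whole range of p, shows where the distributions differ rather than reducing the comparison to a single number.

```python
def _interp(sorted_xs, pos):
    """Linear interpolation between order statistics at 0-based position pos."""
    k = int(pos)
    frac = pos - k
    if k + 1 < len(sorted_xs):
        return sorted_xs[k] + frac * (sorted_xs[k + 1] - sorted_xs[k])
    return sorted_xs[k]


def qq_pairs(x, y):
    """Empirical quantile-quantile pairs (p, quantile of x, quantile of y).

    Each order statistic of the smaller sample (size m) is taken as its
    (i + 0.5)/m quantile; the larger sample (size n) is interpolated at the
    same probability, i.e. at 0-based position (i + 0.5) * n / m - 0.5.
    With equal sample sizes this reduces to pairing the sorted values.
    """
    xs, ys = sorted(x), sorted(y)
    swap = len(xs) > len(ys)
    small, large = (ys, xs) if swap else (xs, ys)
    m, n = len(small), len(large)
    pairs = []
    for i, s in enumerate(small):
        # Integer numerator keeps the equal-size case exact in floating point.
        pos = ((2 * i + 1) * n - m) / (2 * m)
        q_large = _interp(large, pos)
        p = (i + 0.5) / m
        pairs.append((p, q_large, s) if swap else (p, s, q_large))
    return pairs


if __name__ == "__main__":
    # Invented exposure values, for illustration only: cases differ from
    # controls mainly in the upper tail.
    controls = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]
    cases = [2, 3, 4, 4, 5, 6, 8, 11, 15]
    for p, q_ctrl, q_case in qq_pairs(controls, cases):
        print(f"p={p:.2f}  controls={q_ctrl:5.2f}  cases={q_case:5.2f}  "
              f"difference={q_case - q_ctrl:+.2f}")
```

Points on the y = x line indicate equal quantiles; a roughly constant vertical offset suggests a shift in location, a slope different from 1 suggests a difference in spread, and departures confined to one end point to a difference in that tail.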
