InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Beginners statistics: standard deviation

On this page: Example, with R, Definition and Use, Simple formula, Tips and Notes, Test yourself, Useful references

Example, with R

The standard deviation is the root of the mean squared deviation (the RMS deviation) from the mean, assuming your values make up the entire 'population' of interest. In other words, it summarizes how much the values vary about their mean.

Given that definition, the standard deviation of these 6 values equals 1:

1   -1   -1    1   -1    1
Notice:
the mean of 1 -1 -1 1 -1 1 is 0, so these values are also their deviations from that mean.
Their squared deviations are equally simple since (-1)^2 equals 1^2 equals 1
So the mean squared deviation is 1, and the square root of 1 equals 1.

You could find this population standard deviation with R.
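A minimal sketch of that calculation in R is given here. The variable names y, e and pop_sd are illustrative assumptions rather than anything taken from the original page. R's built-in sd() divides by (n - 1), so it is deliberately not used for this population calculation.

```r
# The six example values from the text
y <- c(1, -1, -1, 1, -1, 1)

# Deviations from the mean (the mean of these values is 0)
e <- y - mean(y)

# Population standard deviation: root of the mean squared deviation
pop_sd <- sqrt(mean(e^2))
print(pop_sd)
```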



Definition and Use

  1. The standard deviation of y is a measure of variation in y, and is usually the square root of y's variance.
  2. The root mean squared deviation from the mean is generally known as the 'population standard deviation'.
  3. The population standard deviation and the population mean are all that is needed to define a 'normal' population, and their 'true' values are known as population parameters.
  4. But when the population standard deviation formula is applied to random samples from infinite populations it gives, on average, a biased estimate of that population's standard deviation: it tends to underestimate it.
  5. For random samples a modified formula, known as the 'sample standard deviation', is used in which the sum of the squared deviations from the mean is divided by (n - 1) rather than n. This gives a less-biased estimate of that parameter.
  6. Nevertheless, the population standard deviation formula is the maximum-likelihood estimator of the standard deviation of a normal population.
  7. The sample standard deviation is often used (sometimes unwisely) to indicate variation of samples.
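The bias described in points 4 and 5 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size (5), the number of repeats (10000), the seed, and the helper name pop_sd are arbitrary choices made for illustration, and the 'population' is taken to be standard normal with a true standard deviation of 1.

```r
set.seed(1)    # arbitrary seed, for reproducibility only
n <- 5         # assumed (small) sample size
true_sd <- 1   # assumed true population standard deviation

# Population formula: sum of squared deviations divided by n
pop_sd <- function(y) sqrt(mean((y - mean(y))^2))

# Draw 10000 random samples; each column of the matrix is one sample
samples <- replicate(10000, rnorm(n, mean = 0, sd = true_sd))

# Average estimate from each formula across all samples
mean_pop_sd <- mean(apply(samples, 2, pop_sd))   # divides by n
mean_sample_sd <- mean(apply(samples, 2, sd))    # sd() divides by n - 1

cat("population formula:", mean_pop_sd, "\n")
cat("sample formula:    ", mean_sample_sd, "\n")
```

Under these assumptions both averages should fall below the true value of 1, with the (n - 1) version closer to it, which is why that version is described as less biased rather than unbiased.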


Simple formula

Assuming y contains n numbers, and that e (the deviation of each value from the mean) equals y - mean(y):

the 'population' standard deviation of y is squareroot(sum(e^2)/n)

or squareroot(population variance)

Note, sum(e^2)/n is known as the mean squared deviation (sometimes called the mean squared error), or 'population variance'.

the 'sample' standard deviation of y is squareroot(sum(e^2)/(n-1))
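Both formulas translate almost directly into R. The sketch below uses a made-up set of eight values purely for illustration; the names pop_sd and sample_sd are assumptions. R's built-in sd() uses the (n - 1) divisor, so it should agree with the 'sample' formula.

```r
# Hypothetical data, chosen only so the arithmetic is easy to follow
y <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(y)

# e is the deviation of each value from the mean
e <- y - mean(y)

# 'Population' standard deviation: divide by n
pop_sd <- sqrt(sum(e^2) / n)

# 'Sample' standard deviation: divide by n - 1
sample_sd <- sqrt(sum(e^2) / (n - 1))

print(pop_sd)
print(sample_sd)
print(sd(y))   # built-in function, uses the n - 1 divisor
```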


Tips and Notes

  • Owing to their importance in simple statistical models, standard deviations are exceedingly popular.
  • But summarizing data as mean +/- sd is only meaningful if the values are approximately normal, or at least more-or-less symmetrically distributed about the mean. When applied to strongly skewed data, the standard deviation (sd) gives a misleading picture of sample variation!
  • For real data, other measures of spread, such as the interquartile range (the range between the lower and upper quartiles), make fewer assumptions, are more robust and are often much less misleading.
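To see why mean +/- sd can mislead for skewed data, consider the following sketch. The eight values are invented for illustration (seven small values and one large one), and the variable names are assumptions.

```r
# Hypothetical strongly skewed data: all values are positive
y <- c(1, 1, 2, 2, 3, 3, 4, 50)

m <- mean(y)
s <- sd(y)

# mean - sd suggests 'typical' values extend below zero,
# even though no observed value is below 1
lower <- m - s
upper <- m + s
cat("mean - sd:", lower, " mean + sd:", upper, "\n")

# Quartiles and the interquartile range describe the bulk of the data
q <- quantile(y, c(0.25, 0.5, 0.75))
iqr <- IQR(y)
print(q)
print(iqr)
```

Here the single large value inflates both the mean and the sd, whereas the quartiles stay within the range where most of the values actually lie.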


Test yourself

With R you can easily convert the results of sample standard deviation formulae to RMS deviation.
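One way to do that conversion, sketched here on the assumption that y is a numeric vector with no missing values, is to multiply the result of sd() by squareroot((n - 1)/n). The name rms_dev is illustrative only.

```r
# The six example values from the top of this page
y <- c(1, -1, -1, 1, -1, 1)
n <- length(y)

# sd() gives the 'sample' standard deviation (n - 1 divisor)
sample_sd <- sd(y)

# Rescale to the RMS deviation (n divisor)
rms_dev <- sample_sd * sqrt((n - 1) / n)

print(sample_sd)
print(rms_dev)
```

As a check, the result should match the population standard deviation worked out by hand in the example above.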


Useful references

Altman, D.G. & Bland, J.M. (2005). Standard deviations and standard errors. BMJ 331, 903 (15 October).
Argues that the standard deviation is a valid measure of variability regardless of distribution.


Anderson, D.R. et al. (2001). Suggestions for presenting the results of data analyses. Journal of Wildlife Management 65 (3), 373-378.
Provides a number of helpful suggestions to wildlife biologists for presenting the results of data analyses, in particular the need to distinguish between standard deviation and standard error!


Gorard, S. (2004). Revisiting a 90-year-old debate: the advantages of the mean deviation. Paper presented at the British Educational Research Association Annual Conference, University of Manchester, 16-18 September 2004.
Do we really have to use the standard deviation to quantify variation?


Lehmann, K.G. et al. (1996). Contributions of frequency distribution analysis to the understanding of coronary restenosis. A reappraisal of the Gaussian curve. Circulation 93 (6), 1123-1132.
Point out that whilst the mean and standard deviation are appropriate if a variable has a normal distribution, populations with skewed distributions cannot be adequately represented in this way.


Stark, P.B. Measures of location and spread. University of California.
A comprehensive and thought-provoking account of measures of location and spread.


Wikipedia. Standard deviation.
