 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

## Sample variance and Standard Deviation using R

### Variance and SD

R can calculate the sample variance and sample standard deviation of our cattle weight data (stored in the variable y) using the instructions var(y) and sd(y).

Giving:

    > var(y)
    [1] 1713.333
    > sd(y)
    [1] 41.39243

Note:
• var(y) instructs R to calculate the sample variance of y. In other words, it uses n-1 'degrees of freedom', where n is the number of observations in y.

• sd(y) instructs R to return the sample standard deviation of y, using n-1 degrees of freedom.

• sd(y) = sqrt(var(y)). In other words, it is simply the square root of the n-1 sample variance; R applies no further correction for the bias that taking the square root introduces.

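• The var function cannot give the 'population variance', which has n rather than n-1 d.f. But there are 2 simple ways to achieve that: first, rescale the sample variance, for example as var(y)*(length(y)-1)/length(y); second, average the squared deviations from the mean directly, for example as mean((y-mean(y))^2).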

• Remember if n=1 the second variance formula will always yield zero, because the mean of y will equal y, whereas the first formula will always yield NA, because 0/(1-1) = 0/0 and cannot be evaluated.

• Similarly, to obtain the 'population' standard deviation, take the square root of the 'population' variance.
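The notes above can be sketched in a few lines of R. This is a minimal illustration, not the page's original listing: the four weights are hypothetical (they are not the cattle data, so the results differ from those quoted above), and the names n, pop_var_1, pop_var_2, pop_sd, single_1 and single_2 are introduced here only for the example.

```r
# Hypothetical weights, for illustration only (NOT the cattle data above)
y = c(400, 450, 500, 550)
n = length(y)

var(y)                             # sample variance, n-1 d.f.
sd(y)                              # sample SD, the same as sqrt(var(y))

# Two routes to the 'population' variance (n in the denominator)
pop_var_1 = var(y) * (n - 1) / n   # rescale the sample variance
pop_var_2 = mean((y - mean(y))^2)  # mean squared deviation from the mean

# 'Population' standard deviation
pop_sd = sqrt(pop_var_2)

# With a single observation the two routes behave differently
single_1 = var(5) * (1 - 1) / 1    # NA, because var() of one value is NA
single_2 = mean((5 - mean(5))^2)   # 0, because the mean equals the value
```

For these hypothetical weights both routes give a 'population' variance of 3125, so either can be used when n is greater than 1.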

### Variance from frequencies and midpoints

R can calculate the variance from the frequencies (f) of a frequency distribution with class midpoints (y) using the four instructions explained in the notes below.

Giving:

    [1] 143.8768

Note:
• y=c(110, 125, 135, 155) copies the class interval midpoints into a variable called y.

• f=c(23, 15, 6, 2) copies the frequency of each class into a variable called f.

• ybar=sum(y*f)/sum(f) creates a variable called ybar, containing the arithmetic mean - as calculated from these frequencies and midpoints.

Use this formula even if you have a more accurate arithmetic mean calculated directly from the observations themselves. The mean must rest on the same assumptions (the same midpoints and frequencies) as the variance: the sum of squared deviations is smallest when taken about the mean of the values being used, so substituting any other mean makes your estimated variance too high.

• sum(f*(y-ybar)^2) / (sum(f)-1) calculates the sample variance from the frequencies, f, midpoints, y, and the mean estimated from them, ybar.

Alternatively, you could combine the last two instructions as: sum(f*(y-sum(y*f)/sum(f))^2)/(sum(f)-1)

• Remember this only provides an estimate of the variance you would obtain from the original data - and is dependent upon the choice of midpoints, and the number of class intervals used.
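Assembled from the instructions described in the notes above, the whole calculation might look like the sketch below. It is a reconstruction from those notes rather than the page's original listing; the names s2 and expanded, and the final cross-check, are additions made for illustration.

```r
y = c(110, 125, 135, 155)            # class interval midpoints
f = c(23, 15, 6, 2)                  # frequency of each class

ybar = sum(y*f)/sum(f)               # mean estimated from midpoints and frequencies

# Sample variance from frequencies, midpoints and their mean
# (s2 is a hypothetical name, not used in the text above)
s2 = sum(f*(y-ybar)^2) / (sum(f)-1)
s2                                   # 143.8768, the value quoted above

sqrt(s2)                             # the corresponding sample standard deviation

# Cross-check: repeat each midpoint f times and apply var() directly
expanded = rep(y, times = f)
var(expanded)                        # same value as s2
```

The cross-check works because var() applied to each midpoint repeated f times is the same calculation written out in full, which also makes clear that the result describes the midpoints and not the original observations.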