If you cannot immediately see why these properties are useful to students, ask yourself which of these definitions is more useful.
If you feel the first definition is more appropriate at an 'elementary' or 'practical' level, please bear in mind that, with suitable off-the-shelf software and minimal training, personal computers enable virtually anyone to calculate remarkably complex statistics - in which case, the only point of 'understanding statistics' is to interpret the results.
In practice any useful 'understanding' must enable students to answer questions such as:
To illustrate the power of simulation modelling in understanding statistics it helps if we begin with a familiar statistic (such as the simple arithmetic mean) then build up our simulations from scratch using elementary R code, and see what emerges in the process.
An unavoidable first step in calculating a mean is to have some numbers to calculate it from. The following code assigns an arbitrary set of values to a variable called Y, then reveals its contents, then selects all of those values and calculates their mean.
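A sketch of code along those lines (the nine values here are our own arbitrary choice):

```r
# Assign an arbitrary set of values to Y, then reveal its contents
Y <- c(2, 3, 5, 5, 6, 7, 8, 9, 9)
Y                 # reveal the contents of Y
mean(Y[1:9])      # select all nine values and calculate their mean
```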
The sd() function gives the 'sample' standard deviation. So sd(Y) would give the standard deviation of Y.
Before going any further, please invest a few moments ensuring you are absolutely certain what those instructions to R were doing.
"There's no sense in being precise when you don't even know what you're talking about."
John von Neumann
If it is of help, there are several ways to graphically display how these sample means are distributed. But, to avoid becoming bogged down, we leave their interpretation to you.
Bearing in mind our definitions of the standard error of the mean, let us use one of our samples to compare the standard deviation of the 10000 means (obtained above) with their standard deviation estimated using sd(y)/sqrt(n).
Given all of which, it should be blindingly obvious that the standard error formula must be assuming something which is missing from the simulation model above.
Understanding sampling distributions is useful
At this point it might be a good idea to consider a highly important term which is largely ignored in conventional elementary statistics courses: the sampling distribution - the distribution of a given statistic over repeated samples. The concept of sampling distributions is central to statistical tests and confidence intervals.
In this case the statistic is the mean of a sample, otherwise known as a sample mean. Clearly the standard error formula must assume something about how sample means can vary - and must assume that variation is predictable.
The most popular model by which variation is predicted requires you to assume that variation is entirely random - in other words, that the variation of those means arises from 'sampling variation'. In which case the obvious recourse is to randomly select the values from which those means are calculated.
- Note: by 'random' we do not mean that you can just put in any results that you happen to feel like, or that you can select whatever results are most convenient! A random outcome means that one result should not be predictable from the previous ones (no sequences) and, in most cases, that each of the available outcomes is equally likely (which is what you assume when you toss a coin, or roll a die).
- Random selection and allocation form a central, absolutely crucial, part of statistical inference.
Since random selection is a common sort of requirement, R provides a function, sample(), which does exactly that. Given an appropriate variable from which to select, the function returns a random selection of values thereof.
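For example (using our own arbitrary set of nine values):

```r
# sample() returns a random selection from a given set of values
Y <- c(2, 3, 5, 5, 6, 7, 8, 9, 9)   # an arbitrary set of values
sample(Y, size = 3)                  # 3 values selected at random
```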
Since the order of the values from which a mean is calculated does not affect the result, sampling every possible value - where each value can appear only once - cannot produce any variation in those means.
- If that statement was not self-evident, try entering the following code!
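For instance, code along these lines (Y again being our own arbitrary set of nine values) merely shuffles the same values, so every sample must have exactly the same mean:

```r
# Selecting all 9 values without replacement merely reorders them,
# so every such sample has exactly the same mean
Y <- c(2, 3, 5, 5, 6, 7, 8, 9, 9)
replicate(6, mean(sample(Y, size = 9, replace = FALSE)))  # 6 identical means
</imports>
```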
Since random variation assumes you cannot predict successive outcomes, you might argue that selecting without replacement is not really 100% random. So let us see what happens if each value is replaced immediately after it is selected, in other words, we sample with replacement.
- Notice that sampling a finite set (of 9) values with replacement produces the same result as sampling an infinitely-large set of those same values (with or without replacement).
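A sketch of sampling with replacement, again using our own arbitrary nine values:

```r
# Sampling the same 9 values WITH replacement: the means now vary
Y <- c(2, 3, 5, 5, 6, 7, 8, 9, 9)
replicate(6, mean(sample(Y, size = 9, replace = TRUE)))
```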
Now let us estimate how such variation would cause these sample means to vary.
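The comparison below assumes a model along these lines (a sketch; we assume m holds the means of 10000 random samples of the n=9 values in y, each selected with replacement):

```r
y <- c(2, 3, 5, 5, 6, 7, 8, 9, 9)   # our arbitrary set of 9 values
n <- length(y)                       # the sample size (9)
# m holds the means of 10000 random samples of y, selected with replacement
m <- replicate(10000, mean(sample(y, size = n, replace = TRUE)))
```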
> sd(m) # this is what we obtained
> sd(y)/sqrt(n) # the conventional estimate
To obtain a fairer comparison, let us do two things:
- Record the standard error estimated for each sample of n (=9) values using sd(y)/sqrt(n), then use the most typical result (their median value) for comparison.
- Estimate the standard error using s/sqrt(n), where s is the standard deviation of the entire set of 90000 values - in other words a large random sample.
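Those two steps might be coded as follows (a sketch, again using our own arbitrary set of nine values):

```r
# A fairer comparison of small-sample and large-sample estimates
y <- c(2, 3, 5, 5, 6, 7, 8, 9, 9)
n <- 9
samples <- replicate(10000, sample(y, size = n, replace = TRUE))
m  <- apply(samples, 2, mean)            # the 10000 sample means
se <- apply(samples, 2, sd) / sqrt(n)    # SE estimated from each small sample
median(se)                               # the most typical small-sample estimate
s <- sd(as.vector(samples))              # sd of the entire set of 90000 values
s / sqrt(n)                              # the large-sample estimate
sd(m)                                    # how the sample means actually vary
```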
If you re-enter that code several times, it should be obvious that the standard deviation of sample means is much the same as the standard error estimated using s/sqrt(n), when s is the standard deviation of a very large sample (90000 values). However, the standard error estimated from a small sample (just 9 values) tends to be rather too large - in other words it is a biased small-sample estimator. We return to this issue below.
Remember, the textbook formula is supposed to be estimating how random selection causes sample means to vary - not the other way round!
- Below are the results of 10 runs of that model, expressed as 3 rugplots, plus the estimated sampling distribution of these means.
- Notice the distribution of these sample means is neither smooth nor symmetrical about their median.
- This lack of symmetry makes the standard deviation of these means a rather poor measure of their variation.
- The standard deviation and mean are parameters of a 'normal distribution'.
- Specifying a mean +/- standard error is rather misleading when those means have a very non-normal sampling distribution - especially if that distribution is skewed...
Clearly the textbook standard error formula assumes more than just that the values are sampled at random.
How realistic are these models?
"What men really want is not knowledge but certainty."
Bertrand Russell
At this point it might be as well to pause, and consider what on earth these models are supposed to be simulating. Trying to understand the properties of sample means and standard deviations is all very well, but it is hard to imagine these simulation models (or statistical models) tell us anything useful about the real world. Mind you, thus far the same can be said of the values we calculated our statistics from - garbage in, garbage out, remember?
A moment's thought reveals that we are dealing with two very different issues:
- How we selected the values.
- What other values could have been selected.
An excellent reason for randomly selecting values is that it enables us to avoid bias.
- If we ignore any other possible outcome, our conclusions can only apply to the particular results we obtained - which would make surveys and experiments somewhat pointless.
- If a statistical analysis ignores how the results were obtained, the results can be 100% misleading.
These conclusions are true of any statistical analysis, no matter how 'elementary'. Statistics such as the mean and standard deviation may describe the results at hand, but are otherwise meaningless. Statistics such as standard errors do not merely describe a set of observed results; they enable you to infer something about how those results might behave - assuming the sampling could be repeated.
So where does the textbook standard error formula assume values are selected from, and how does it assume they are selected?
Given the importance of the normal distribution in conventional elementary statistics courses, you might assume that observations should be normally-distributed - or, more correctly, that the values are 'randomly selected from a normally-distributed population'. (If you are new to statistics, a population is any defined, fixed set of values from which samples may be drawn.)
Using R, it is easy to simulate such a sample, for instance as follows:
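A sketch of such instructions (the sample size of 9 is our own choice; the mean of 0 and standard deviation of 1 are rnorm's defaults):

```r
n <- 9
y <- rnorm(n)   # n values selected at random from a normal
                # population with mean 0 and sd 1 (the defaults)
y
mean(y)
sd(y)
```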
Notice that these simple instructions make some crucial, if non-obvious, assumptions:
- The sample size is set by us - in other words it is fixed, so cannot vary at random. In real surveys or experiments that assumption is not always true, but most analyses ignore the fact.
- The mean (=0) and standard deviation (=1) of that normal population are also fixed, if only by default.
- Because normal populations contain an infinite number of slightly different values, provided each value is measured accurately, the chance of selecting the same value twice is effectively zero. So it makes no difference whether each value is replaced immediately after selection.
Small sample behaviour
Now for an acid test. Let us:
- take a large number of random samples, each of just n=2 values, from a standard normal population;
- calculate the mean of each sample, and the standard error estimated from that sample.
Now we can compare the standard error of these means, estimated from each sample, with the standard deviation of the means of those samples. In doing so, remember that the standard errors estimated from each sample and the standard deviation of their sample means are both estimates of the true standard error - the value we would obtain from an infinite number of samples. So you may want to rerun those model instructions several times.
Applying the standard error formula to our population parameters tells us we should expect a standard error of 0.7071068, if we took the standard deviation of an infinite number of sample means.
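One way to run that test (a sketch; the 10000 runs are our own choice):

```r
# The acid test: 10000 random samples of just n=2 normal values
n <- 2
samples <- replicate(10000, rnorm(n))
m  <- apply(samples, 2, mean)            # the sample means
se <- apply(samples, 2, sd) / sqrt(n)    # SE estimated from each sample
sd(m)          # should approach 1/sqrt(2) = 0.7071068
median(se)     # the typical estimate from a single sample of 2 values
```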
Large sample behaviour
The following instructions are identical to those above except we have fixed the sample size at n=1000 instead of n=2.
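A sketch of that large-sample version (the number of runs is again our own choice, reduced here to keep it quick):

```r
# As above, but with samples of n = 1000 values each
n <- 1000
samples <- replicate(1000, rnorm(n))
m  <- apply(samples, 2, mean)
se <- apply(samples, 2, sd) / sqrt(n)
sd(m)          # should approach 1/sqrt(1000) = 0.03162278
median(se)
```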
Samples of non-normal data
"Half of wisdom is learning what to unlearn."
Larry Niven, The Ringworld Throne (1996)
But does the textbook standard error of the mean formula assume samples are drawn from a normal population? The following instructions are identical to those above, except they sample a standard uniform population. (Every value we select is equally likely to lie anywhere between zero and one.)
If we do not know the formula for calculating the standard deviation of a uniform population, one simple solution is to take a very large sample thereof, and calculate its standard deviation. We then divide it by root n to get the standard error.
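A sketch of that approach (the sample sizes here are our own choices; for reference, the true standard deviation of a standard uniform population is 1/sqrt(12) = 0.2886751):

```r
# Estimate the sd of a standard uniform population from a huge sample
n <- 1000
s <- sd(runif(1000000))   # sd of a very large uniform sample
s / sqrt(n)               # the estimated standard error of the mean
m <- replicate(1000, mean(runif(n)))
sd(m)                     # how such sample means actually vary
```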
Clearly the value obtained by the textbook standard error-of-the-mean formula is just as applicable for (large) samples of a uniform population as it is for large samples of a normal population.
Very non-normal data
Uniformly and normally distributed data have some important properties in common:
- The distribution of values being sampled is smooth and symmetrical.
- It is extremely unlikely a sample will contain two identical values.
Let us consider what happens if we take large samples of data which are noticeably skewed, and where each sample will contain many identical (tied) values. You may recall that sampling a finite set of values with replacement produces the same result as sampling an infinitely-large set of those values. You may also recall that, when small samples were used, the textbook standard error formula gave a biased estimate of the standard deviation of the sampling distribution of means, and that sampling distribution was anything but normal.
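The set of values used to produce the results below is not shown, so this sketch substitutes our own skewed set with many tied values (its numbers will therefore differ from those in the output shown):

```r
# Our own illustrative skewed set, with many tied values
Y <- rep(c(0, 1, 2, 4, 8), times = c(40, 25, 15, 10, 5))
n <- 1000                                  # large samples
samples <- replicate(1000, sample(Y, size = n, replace = TRUE))
m  <- apply(samples, 2, mean)              # the sample means
se <- apply(samples, 2, sd) / sqrt(n)      # SE estimated from each sample
```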
> summary(se) # summarize the estimates
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7798 0.9196 0.9452 0.9446 0.9700 1.0650
> sd(m) # give the observed se
> # give the expected se
> sd(sample(Y, size=1000000, replace=TRUE))/sqrt(n)
The similarity of these estimates suggests that the textbook standard error formula is applicable to means of samples drawn from any given population, provided those samples are sufficiently large and obtained entirely at random. In which case sd(y)/sqrt(n) provides a quick and easy estimate.
However, as should now be increasingly apparent, there are some important practical difficulties with that nice simple answer. For instance:
- Real samples are often modest to small.
- Some populations may cause this simple formula to approach its large-sample behaviour more slowly than others.
- Standard errors of other statistics may behave very differently from those of simple means.
- Many real studies do not obtain their data by simple random sampling of a single fixed population of potential results.
Nevertheless, even with the tools we have provided on this page, most students should be able to envisage ways of investigating these issues - with very practical applications.
"Progress is impossible without change, and those who cannot change their minds cannot change anything."
George Bernard Shaw