Definition
The standard error of the mean is the standard deviation of the sampling distribution of the mean. In other words, it is the standard deviation of a large number of sample means of the same sample size drawn from the same population. The term standard error of the mean is commonly (though imprecisely) shortened to just standard error. Thus the terms 'standard error of the mean', 'standard deviation of the mean' and 'standard error' may all mean exactly the same thing!
Usually, of course, we only calculate one mean for a set of data, not multiple means. Hence, unlike the standard deviation of the observations, the standard error of the mean is estimated rather than measured. As such it is an inferential statistic rather than a descriptive statistic.
The standard error of the mean (SE) is somewhat unusual in that there is a simple algebraic formula for it, and the formula is valid irrespective of the distribution of the data. It is equal to the population standard deviation (σ) divided by the square root of the number of observations in the sample. In practice we estimate the standard error of a mean by dividing the sample standard deviation (s) by the square root of the number of observations in that sample.
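That the standard deviation of the sampling distribution equals σ/√n can be checked by simulation. A minimal sketch (the population parameters, sample size and number of samples are assumed for illustration):

```python
import random
import statistics

# Draw many samples of size n from a normal population and check that the
# standard deviation of the sample means is close to sigma / sqrt(n).
random.seed(42)
mu, sigma, n, n_samples = 50.0, 10.0, 25, 20_000

sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(n_samples)
]

observed_se = statistics.stdev(sample_means)   # SD of the sampling distribution
theoretical_se = sigma / n ** 0.5              # sigma / sqrt(n) = 10 / 5 = 2

print(round(observed_se, 2), theoretical_se)
```

With 20,000 simulated samples the observed value should agree with σ/√n to about two decimal places.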
Algebraically speaking:

SE(x̄) = s / √n

Where:
 s is the sample standard deviation,
 n is the number of observations in the sample.

Hence the magnitude of the standard error of the mean depends on both the variability of the observations (s) and the number of observations (n). The larger the variability, the greater will be the standard error. The larger the number of observations, the smaller will be the standard error.
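A small sketch of the formula with hypothetical observations, showing that more variable data give a larger standard error:

```python
import statistics

# SE = s / sqrt(n), so the standard error grows with the variability of
# the observations (s) and shrinks as the number of observations (n) rises.
def standard_error(sample):
    s = statistics.stdev(sample)          # sample standard deviation
    return s / len(sample) ** 0.5

weights = [71, 74, 68, 80, 77, 69, 73, 75]      # hypothetical observations
print(round(standard_error(weights), 3))

# More variable observations (same mean, same n) give a larger standard error:
spread_out = [61, 84, 58, 90, 87, 59, 63, 85]
print(round(standard_error(spread_out), 3))
```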
The standard deviation of the observations and the standard error of the mean are frequently confused.
So note:
 the standard deviation of the observations is used to describe the variation in a set of observations;
 the standard error of the mean is used to estimate the variation in a set of means.
We noted previously that the sample standard deviation (s) is a biased estimator of the population standard deviation (σ). So, if you intend to quote standard errors, and your sample sizes are small, you should use the corrected standard deviation in the formula.
The finite population correction
The formulae given above for estimating the standard error assume you are taking a sample from an infinite population. But if your sample comprises a large part of the population, the usual equation will overestimate the standard error. Imagine you take several samples, each comprising nearly all members of a population. Clearly there will not be much variability between sample means, because the samples will mostly contain the same individuals. In the most extreme case, if your sample contained the entire population, you would get the same value for the mean each time, in which case the standard error of the mean should be zero. Hence the need for a finite population correction.
If you look at the formula below, you will see that it reduces the standard error more and more as the sample size approaches the population size. If the sample size equals the population size, the standard error will be zero.
Algebraically speaking:

SE(x̄) = (s / √n) × √((N − n) / N)

Where:
 SE(x̄) is the standard error of the sample mean,
 s is the sample standard deviation,
 n is the number of observations in the sample,
 N is the total population size,
 and (N − n) / N equals 1 − n/N.
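A sketch of the finite population correction with hypothetical figures, showing how the correction factor shrinks the standard error towards zero as the sample size approaches the population size:

```python
# The uncorrected standard error, s / sqrt(n), is multiplied by the
# correction factor sqrt((N - n) / N), which is zero when n equals N.
def corrected_se(s, n, N):
    return (s / n ** 0.5) * ((N - n) / N) ** 0.5

s, N = 12.0, 200        # hypothetical sample SD and population size
for n in (20, 100, 180, 200):
    print(n, round(corrected_se(s, n, N), 3))
```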
There is another, slightly different, version of the finite population correction in the literature, in which N − 1 replaces N in the denominator of the correction factor.

In practice the finite population correction is usually only used if a sample comprises more than about 5–10% of the population. Even then it may not be applied if researchers wish to invoke the 'superpopulation' concept, and apply their results to a larger, ill-defined, population. This concept, whilst convenient for some, is highly controversial, partly because the problems of extending results to a superpopulation are exactly the same as when you are dealing with an ordinary population. In particular, you need to allow for variation and bias, which can be very difficult when a superpopulation is ill defined and the selection is not random! For further discussion on this see pp. 97–99 in Bart et al. (1998) and p. 117 in Rothman & Greenland (1998).
Any statistic that can be computed, such as the variance, the coefficient of variation, or the median, also has a sampling distribution. Hence, any statistic has a standard error that can be used to describe its sampling variation. Even the standard deviation itself must exhibit variation in repeated samples, so it also has a standard error. However, many commonly used statistics either do not have a simple formula to estimate their standard error, or (more commonly) the formula assumes your sample is very large, or your sample represents a particular type of population. The standard errors of some commonly used statistics are given in the related topics below.
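Where no simple formula is available, the standard error of a statistic such as the median is often estimated by resampling. A minimal bootstrap sketch (the data are hypothetical, and the bootstrap itself is not covered in the text above):

```python
import random
import statistics

# Estimate the standard error of the sample median from the spread of the
# statistic across many resamples drawn with replacement from the data.
random.seed(1)
data = [4.1, 5.6, 3.8, 6.0, 5.2, 4.7, 5.9, 4.4, 5.1, 6.3]

medians = [
    statistics.median(random.choices(data, k=len(data)))  # one bootstrap resample
    for _ in range(5000)
]
bootstrap_se = statistics.stdev(medians)
print(round(bootstrap_se, 3))
```

The same resampling recipe works for the variance, the coefficient of variation, or any other computable statistic; only the line computing the statistic changes.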
Assumptions and Requirements
The most important assumption in estimating the standard error of a mean is that the observations are equally likely to be obtained, and are independent. In other words the sample should have been obtained by random sampling or random allocation.
The standard error of the mean can be calculated for any type of variable, although it is only really appropriate for measurement variables (whether continuous or discrete) and binary variables.
It can be calculated for any type of frequency distribution, but, like the standard deviation, for most of the purposes to which it is put the statistic is assumed to be distributed symmetrically.
The standard error of the mean is quoted very widely in the reporting of scientific data. It is a valid estimate of the variability of our estimate of the mean, but not of the variability of the observations.
Providing the sample size is reasonably large, and the sample is a random sample from the population, it may also be used as an indirect measure of the reliability of our estimate of the mean. This is because the 95% confidence interval is roughly twice the standard error, providing the statistic is distributed normally. However (as we see in Unit 6) that assumption is often not met. If only the standard error is given, the sample size must be provided as well for it to be of any use. If standard error bars are given without the sample size, the only inference you can make is that means with overlapping standard errors are definitely not significantly different. No inference can be made if the bars do not overlap.
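The rough rule quoted above can be sketched with hypothetical data: provided the statistic is normally distributed, the 95% confidence interval is approximately the sample mean plus or minus twice the standard error.

```python
import statistics

# Approximate 95% confidence interval: mean +/- 2 * SE.
sample = [23, 27, 25, 30, 22, 28, 26, 24, 29, 25]   # hypothetical observations
mean = statistics.fmean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5

lower, upper = mean - 2 * se, mean + 2 * se
print(round(mean, 2), round(lower, 2), round(upper, 2))
```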
Related topics:

Standard Error of the Median
Standard Error of the Coefficient of Variation
Standard Error of Kappa
Jackknifing
