Properties
The purpose of the two-sample t-test is to compare the means of two independent samples. These can be obtained either by random sampling from two populations (an observational design) or by random allocation to two treatment groups (an experimental design). The experimental design assumes that the experimental group represents the wider population, which is seldom the case in reality.
The t-statistic is estimated as the difference between the two sample means, minus the difference between the true population means, divided by the estimated standard error of the difference between the sample means. For a null hypothesis of no difference, the difference between the true population means is zero. The standard error of the difference is usually estimated from the weighted variance about the means of the samples being compared.
For large sample sizes the t-distribution becomes equivalent to a 'standard' normal distribution with a mean of zero and a standard deviation of one. If the standard normal distribution is used to obtain critical values, the test is sometimes known as the z-test or occasionally the d-test. The t-distribution diverges from the normal distribution for small samples as it allows for the random error in estimating the variance.
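To illustrate the convergence described above, the sketch below compares the two-tailed 5% critical value of the standard normal distribution with a few values copied from a standard printed t-table. The dictionary name `t_crit_by_df` and the choice of degrees of freedom are assumptions made for illustration; Python's standard library has no t-distribution, so the t values are typed in by hand rather than computed.

```python
from statistics import NormalDist

# Two-tailed 5% critical value of the standard normal distribution
z_crit = NormalDist().inv_cdf(0.975)

# Two-tailed 5% critical values of t, copied from a standard t-table
# (keys are degrees of freedom)
t_crit_by_df = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

for df, t_crit in t_crit_by_df.items():
    # The excess over z shrinks as the degrees of freedom increase
    print(f"df={df:>3}: t={t_crit:.3f}, excess over z={t_crit - z_crit:.3f}")
```

The excess of t over z is what allows for the random error in estimating the variance from a small sample.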
There are two versions of the two-sample t-test. The standard version, which we give first, assumes that the two population variances are equal.
The equal-variance t-test
The general formula
For equal sample sizes
Where both samples have the same number of observations, the variance of the difference simplifies to (v_{1}+v_{2})/n. Hence:
$$t = \frac{(\bar{x}_{1} - \bar{x}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{v_{1} + v_{2}}{n}}}$$
where:
 t is the estimated t-statistic; under the null hypothesis it is a random quantile of the t-distribution with 2(n − 1) degrees of freedom,
 v_{1} and v_{2} are the two sample variances
 n is the number of observations in each sample
 all other variables are as above.
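A minimal sketch of the equal-sample-size formula, using only the Python standard library. The function name `equal_n_t`, the `null_diff` parameter and the data are hypothetical, chosen purely for illustration.

```python
import math
import statistics

def equal_n_t(sample1, sample2, null_diff=0.0):
    """Equal-variance t-statistic for two samples of the same size n.

    null_diff is the hypothesised difference between the population
    means (mu1 - mu2); it is zero under a null hypothesis of no difference.
    Returns the statistic and its degrees of freedom, 2(n - 1).
    """
    n = len(sample1)
    if n != len(sample2):
        raise ValueError("this simplified formula needs equal sample sizes")
    v1 = statistics.variance(sample1)  # sample variance, divisor n - 1
    v2 = statistics.variance(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = (diff - null_diff) / math.sqrt((v1 + v2) / n)
    return t, 2 * (n - 1)

# Hypothetical observations, for illustration only
a = [5.1, 4.9, 5.6, 5.8, 5.0]
b = [4.2, 4.8, 4.5, 4.1, 4.9]
t, df = equal_n_t(a, b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The statistic would then be compared with the t-distribution on the returned degrees of freedom, using tables or a statistics package.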

For large sample sizes
If the two sample sizes are unequal, but both are large enough that (n − 1) ≅ n, the variance of the difference simplifies to (v_{1}/n_{2}) + (v_{2}/n_{1}). Hence:
$$t = \frac{(\bar{x}_{1} - \bar{x}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{v_{1}}{n_{2}} + \dfrac{v_{2}}{n_{1}}}}$$
where:
 t is the estimated t-statistic; under the null hypothesis it is a random quantile of the t-distribution with (n_{1} + n_{2} − 2) degrees of freedom,
 v_{1} and v_{2} are the two sample variances
 all other variables are as above.
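The crossed denominators (v_{1} over n_{2}, and v_{2} over n_{1}) can look like a misprint, so the intermediate step is sketched here. It assumes the usual pooled variance estimate, written s_{p}^{2} (a symbol introduced for this sketch), in which each sample variance is weighted by its degrees of freedom:

```latex
s_{p}^{2} = \frac{(n_{1}-1)\,v_{1} + (n_{2}-1)\,v_{2}}{n_{1}+n_{2}-2}
          \approx \frac{n_{1} v_{1} + n_{2} v_{2}}{n_{1}+n_{2}}
          \quad \text{when } (n-1) \approx n

s_{p}^{2}\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)
          \approx \frac{n_{1} v_{1} + n_{2} v_{2}}{n_{1}+n_{2}} \times \frac{n_{1}+n_{2}}{n_{1} n_{2}}
          = \frac{v_{1}}{n_{2}} + \frac{v_{2}}{n_{1}}
```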

Where the sample size and variance are expected to be very similar, the variance of the difference between observations is about double the variance of the observations themselves.
The unequal-variance t-test
If population variances cannot be assumed equal (for example, following an F-ratio test), then we cannot use the standard t-test.
The first thing to try in this situation is a transformation of the data. If the variance is proportional to the mean, then you may well find the problem of non-equality of variances is resolved with a logarithmic transformation.
If not, then you have two options:
 Try a different type of statistical test, for example a randomisation test or a nonparametric test.
 Use the unequal-variance t-test (also known as Welch's approximate t-test).
If the variances cannot be assumed equal, then the standard error of the difference between means is taken as the square root of the sum of the individual variances, each divided by its own sample size:
Algebraically speaking 
$$t' = \frac{(\bar{x}_{1} - \bar{x}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{v_{1}}{n_{1}} + \dfrac{v_{2}}{n_{2}}}}$$
Where
 t' is the unequal-variance t-statistic for which critical values are determined as below,
 μ_{1} − μ_{2} is the difference between your population means,
 x̄_{1} − x̄_{2} is the difference between your sample means,
 v_{1} and v_{2} are the sample variances,
 n_{1} and n_{2} are the number of observations in samples 1 and 2.
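The unequal-variance statistic can be sketched in a few lines of standard-library Python. The function name `welch_t`, the `null_diff` parameter and the data are hypothetical and serve only as an illustration.

```python
import math
import statistics

def welch_t(sample1, sample2, null_diff=0.0):
    """Unequal-variance (Welch) t' statistic.

    null_diff is the hypothesised difference mu1 - mu2 (zero under a
    null hypothesis of no difference). Sample sizes may differ.
    """
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1)  # sample variance, divisor n - 1
    v2 = statistics.variance(sample2)
    # Each variance is divided by its OWN sample size
    se_diff = math.sqrt(v1 / n1 + v2 / n2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return (diff - null_diff) / se_diff

# Hypothetical observations, for illustration only
a = [5.1, 4.9, 5.6, 5.8, 5.0]
b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.5]
t_prime = welch_t(a, b)
print(f"t' = {t_prime:.3f}")
```

Unlike the large-sample equal-variance form, each variance here is paired with its own sample size.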

Having determined the t' value, we give two alternative ways to test its significance.
Corrected degrees of freedom
The estimated t' statistic can be tested against the standard t-distribution, but with reduced degrees of freedom. The appropriate number of degrees of freedom is given by the equation below:
Algebraically speaking 
$$df = \frac{\left[(v_{1}/n_{1}) + (v_{2}/n_{2})\right]^{2}}{\dfrac{(v_{1}/n_{1})^{2}}{n_{1} - 1} + \dfrac{(v_{2}/n_{2})^{2}}{n_{2} - 1}}$$
where
 df is the number of degrees of freedom for the unequal-variance t-test,
 all other variables are defined as above.
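As a sketch of the degrees-of-freedom correction, the function below takes the two sample variances and sample sizes directly. The name `welch_df` and the numeric inputs are illustrative assumptions.

```python
def welch_df(v1, n1, v2, n2):
    """Corrected degrees of freedom for the unequal-variance t-test.

    v1, v2 are the sample variances; n1, n2 are the sample sizes.
    The result is generally not a whole number.
    """
    a = v1 / n1
    b = v2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# With equal variances and equal sample sizes there is no reduction:
# the result equals n1 + n2 - 2
print(welch_df(2.0, 10, 2.0, 10))

# With very unequal variances the result falls towards the degrees of
# freedom of the sample with the larger variance-to-size ratio
print(welch_df(50.0, 10, 1.0, 10))
```

The result always lies between the smaller of (n_{1} − 1) and (n_{2} − 1) and the pooled value (n_{1} + n_{2} − 2), which is why the method is described as reducing the degrees of freedom.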

We recommend this method as it enables you to determine the precise P-value for your test, providing you have a probability calculator in your software package.
Corrected critical value
The estimated t' statistic can also be tested against a different critical value which is calculated as a weighted average of the critical values of t based on the respective degrees of freedom of the two samples. The formula below shows how this works for a 1-tailed test.
Algebraically speaking 
$$t'_{\alpha} = \frac{t_{\alpha (f1)}\,(v_{1}/n_{1}) + t_{\alpha (f2)}\,(v_{2}/n_{2})}{(v_{1}/n_{1}) + (v_{2}/n_{2})}$$
where
 t_{α} is the critical value for a type I error of α,
 For a 2-tailed test, t_{α (f1)} should be changed to t_{α/2 (f1)} when obtained from tables, and t_{1 − α/2 (f1)} otherwise; the same applies to t_{α (f2)},
 f1 and f2 are the degrees of freedom of the two samples (n_{1} − 1 and n_{2} − 1),
 all other variables are defined as above.
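A minimal sketch of the corrected critical value follows. The two critical values of t are passed in as arguments because they would normally be read from tables; the function name `corrected_critical_value` and the variances and sample sizes are hypothetical. The two t values used are the tabulated 1-tailed 5% critical values for 4 and 5 degrees of freedom.

```python
def corrected_critical_value(t_crit_f1, t_crit_f2, v1, n1, v2, n2):
    """Weighted average of two critical values of t.

    t_crit_f1 and t_crit_f2 are the critical values for n1 - 1 and
    n2 - 1 degrees of freedom; the weights are v1/n1 and v2/n2.
    """
    w1 = v1 / n1
    w2 = v2 / n2
    return (t_crit_f1 * w1 + t_crit_f2 * w2) / (w1 + w2)

# 1-tailed test, alpha = 0.05: tabulated t is 2.132 for 4 df and 2.015 for 5 df
# Variances and sample sizes below are made up for illustration
crit = corrected_critical_value(2.132, 2.015, v1=0.157, n1=5, v2=0.1, n2=6)
print(f"corrected critical value = {crit:.3f}")
```

Because it is a weighted average, the corrected value always lies between the two tabulated critical values, closer to that of the sample with the larger ratio of variance to sample size.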

Note that the unequal-variance t-test is generally (but not always) more conservative than the standard t-test. Nevertheless, some authors, such as Gans (1991), feel that it should be used for all two-sample tests instead of the equal-variance formulation. This stems from the insensitivity of the F-ratio test in detecting differences in variances when populations are normal, and its excessive liberality with skewed populations.
The weighted t-test
Use of the above formulae gives equal weight to each observation. But if your sampling or experimental unit is a cluster, then the percentages or means may be based on different sample sizes. In that situation, those based on a larger sample size should carry more weight. This is achieved by using the formulae given in Unit 7 to calculate weighted means and variances for each group.
We have repeated the formulae for unequal cluster sizes here for convenience so they can be referred to when going through the worked example below.
Algebraically speaking 
$$\text{Weighted mean } (\bar{x}_{w}) = \frac{\sum (\bar{x}_{i} \times m_{i})}{\sum m_{i}}$$
where:
 x̄_{i} indicates the ith value of the cluster means,
 m_{i} indicates the number of units (= weights) in each cluster,
 m̄ is the average cluster size (Σm_{i}/n),
 n is the number of clusters,
 x̄_{w} is the weighted mean.
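The weighted mean can be sketched as below. The function name `weighted_mean` and the cluster values are hypothetical; the weighted variance is not shown because its formula is not reproduced in this section.

```python
def weighted_mean(cluster_means, cluster_sizes):
    """Mean of cluster means, each weighted by its cluster size m_i."""
    if len(cluster_means) != len(cluster_sizes):
        raise ValueError("need one size per cluster mean")
    total_units = sum(cluster_sizes)
    weighted_sum = sum(x * m for x, m in zip(cluster_means, cluster_sizes))
    return weighted_sum / total_units

# Two hypothetical clusters: the larger cluster pulls the mean towards 20
print(weighted_mean([10.0, 20.0], [1, 3]))  # 17.5
```

With equal cluster sizes the result reduces to the ordinary unweighted mean of the cluster means.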

The weighted means and variances are then used in place of the unweighted estimates in the appropriate formula for t.
Confidence interval of the difference between means
The 95% normal approximation confidence interval for the difference between the means is readily obtained by multiplying the standard error of the difference by t, then subtracting the result from, and adding it to, the observed difference:
Algebraically speaking 
$$\text{95\% CI } (\bar{x}_{D}) = \bar{x}_{D} \pm t \times s_{D}$$
Where:
 x̄_{D} is the mean difference,
 t is the (1 − α/2) quantile of the t-distribution with n_{1} + n_{2} − 2 degrees of freedom, and α = 0.05,
 n_{1} & n_{2} are the number of observations in each sample,
 s_{D} is the standard error of the difference.
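As a sketch, the interval can be computed once the quantile of t has been looked up. The function name `ci_difference` and the difference and standard error are hypothetical; 2.306 is the tabulated 0.975 quantile of t for 8 degrees of freedom (two samples of five observations).

```python
def ci_difference(mean_diff, se_diff, t_quantile):
    """Confidence interval for a difference between two means.

    t_quantile is the (1 - alpha/2) quantile of the t-distribution with
    n1 + n2 - 2 degrees of freedom, taken from tables or software.
    """
    half_width = t_quantile * se_diff
    return mean_diff - half_width, mean_diff + half_width

# Hypothetical difference and standard error, for illustration only
lower, upper = ci_difference(mean_diff=0.78, se_diff=0.2375, t_quantile=2.306)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

If the interval excludes zero, the corresponding 2-tailed test of no difference is significant at the 5% level.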

Notice that this interval assumes that the estimates of x̄_{D} and s_{D} are unrelated; in other words, the differences are homoscedastic.
Assumptions
This test assumes 
 The means are of measurement variables
Ranked or coded categorical observations, or variables derived from such data, should not be analysed using this test. With such data you should be asking if the mean is an appropriate measure of location; often the median would be a better choice. Replicated proportions can be analysed with the t-test providing they are appropriately transformed, for example using the arcsine square root transformation.
 Sampling (or allocation) is random and observations are independent
Observations in time series data should generally not be used as replicates because they are not independent.
 The samples are drawn from normally distributed populations.
This assumption is often relaxed under certain circumstances:
 For large samples (above 300 observations) the means are close to normal, irrespective of how the observations are distributed.
 For moderate samples (30-300 observations) the means should approximate a normal distribution. However, if the distributions are skewed, it is always preferable to apply a normalizing transformation.
 For small samples (3-30 observations) distributions should be checked using Q-Q or rankit plots. Where possible a normalizing transformation should be applied, although the efficacy of such a transformation may be difficult to assess with small data sets.
 Sample variances are homogeneous (that is, they represent the same population)
Sample variances should be tested for homogeneity. Where sample variances are different, transformations should be checked in an attempt to homogenize variances. Achieving homogeneity should take precedence over achieving normality. If homogeneity cannot be achieved, use the 'approximate' unequal-variance t-test instead.
 The model is additive
This assumption is required for the above to hold.