Properties

The purpose of the two-sample t-test is to compare the means of two independent samples. These can be obtained either by random sampling from two populations (an observational design) or by random allocation to two treatment groups (an experimental design) - although the latter assumes the experimental groups are representative of a wider population, which is seldom the case in practice.

The t-statistic is estimated as the difference between the two sample means, minus the difference between the true population means, divided by the estimated standard error of the difference between the sample means. For a null hypothesis of no difference, the difference between the true population means is zero. The standard error of the difference is usually estimated from the pooled variance - that is, the sample variances weighted by their degrees of freedom.

For large sample sizes the t-distribution becomes equivalent to the 'standard' normal distribution, with a mean of zero and a standard deviation of one. If the standard normal distribution is used to obtain critical values, the test is sometimes known as the z-test, or occasionally the d-test. The t-distribution diverges from the normal distribution for small samples because it allows for the random error in estimating the variance.

There are two versions of the two-sample t-test. The standard version, which we give first, assumes that the two population variances are equal.

 

 

The equal-variance t-test

The general formula

Algebraically speaking -

t \;=\; \frac{D - \delta}{s_D} \;=\; \frac{(\bar{x}_1 - \bar{x}_2) \,-\, (\mu_1 - \mu_2)}{\sqrt{\left[\dfrac{(n_1 - 1)v_1 + (n_2 - 1)v_2}{n_1 + n_2 - 2}\right]\left[\dfrac{n_1 + n_2}{n_1 n_2}\right]}}

where:

  • t is the t-statistic; under the null hypothesis t is a random quantile of the t-distribution with (n1 + n2 − 2) degrees of freedom,
  • D is the observed difference between the sample means, x̄1 − x̄2,
  • δ is the difference between the true population means, μ1 − μ2,
  • sD is the estimated standard error of the difference between the means,
  • n1 & n2 are the number of observations in each sample,
  • v1 and v2 are the two sample variances.
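
To make the arithmetic concrete, here is a minimal sketch of the equal-variance test in Python. It is our illustration rather than part of the original formulation: the data are hypothetical, and numpy and scipy are assumed to be available.

    import numpy as np
    from scipy import stats

    x1 = np.array([4.2, 5.1, 6.3, 5.8, 4.9, 5.5])     # hypothetical sample 1
    x2 = np.array([3.9, 4.4, 5.0, 4.1, 4.7, 3.8])     # hypothetical sample 2
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)           # sample variances

    # pooled variance: each sample variance weighted by its degrees of freedom
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = np.sqrt(pooled * (n1 + n2) / (n1 * n2)) # SE of the difference

    t = (x1.mean() - x2.mean()) / se_diff             # delta = 0 under H0
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)                    # two-tailed P-value
    print(f"t = {t:.3f}, df = {df}, P = {p:.4f}")

    # cross-check against scipy's built-in equal-variance test
    print(stats.ttest_ind(x1, x2, equal_var=True))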

 

For equal sample sizes

Where both samples have the same number of observations, the variance of the difference simplifies to (v1+v2)/n. Hence:

t \;=\; \frac{(\bar{x}_1 - \bar{x}_2) \,-\, (\mu_1 - \mu_2)}{\sqrt{\dfrac{v_1 + v_2}{n}}}
where:
  • t is the estimated t-statistic; under the null hypothesis it is a random quantile of the t-distribution with 2(n − 1) degrees of freedom,
  • v1 and v2 are the two sample variances,
  • n is the number of observations in each sample,
  • all other variables are as above.
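
As a quick check on this simplification, a short sketch with hypothetical summary statistics (our illustration, not part of the original text):

    import numpy as np

    n, v1, v2 = 6, 0.55, 0.21    # hypothetical equal-n summary statistics
    pooled = ((n - 1) * v1 + (n - 1) * v2) / (2 * n - 2)
    se_general = np.sqrt(pooled * (2 * n) / (n * n))  # general pooled formula
    se_simple = np.sqrt((v1 + v2) / n)                # equal-n simplification
    print(np.isclose(se_general, se_simple))          # True - they are identical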

 

For large sample sizes

If the sample sizes are unequal, but large enough that (n − 1) ≅ n, the variance of the difference simplifies to (v1/n2) + (v2/n1). Hence:

t \;=\; \frac{(\bar{x}_1 - \bar{x}_2) \,-\, (\mu_1 - \mu_2)}{\sqrt{\dfrac{v_1}{n_2} + \dfrac{v_2}{n_1}}}
where:
  • t is the estimated t-statistic; under the null hypothesis it is a random quantile of the t-distribution with (n1 + n2 − 2) degrees of freedom,
  • v1 and v2 are the two sample variances,
  • all other variables are as above.

Where the sample sizes and variances are expected to be very similar, the variance of the difference between observations is about double the variance of the observations themselves.
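
To see how close this approximation gets, a brief sketch with hypothetical summary statistics (again our illustration, not part of the original text):

    import numpy as np

    n1, n2, v1, v2 = 150, 200, 4.0, 5.5   # hypothetical large, unequal samples

    # exact pooled standard error versus the (n - 1) = n approximation
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_exact = np.sqrt(pooled * (n1 + n2) / (n1 * n2))
    se_approx = np.sqrt(v1 / n2 + v2 / n1)
    print(se_exact, se_approx)            # nearly identical for large n1, n2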

 

 

The unequal-variance t-test

If the population variances cannot be assumed equal (following, for example, an F-ratio test), then we cannot use the standard t-test.

The first thing to try in this situation is a transformation of the data. If the variance is proportional to the mean, then you may well find the problem of non-equality of variances is resolved with a logarithmic transformation.

If not, then you have two options:

  • Try a different type of statistical test, for example a randomisation test or a non-parametric test.
  • Use the unequal variance t-test (also known as Welch's approximate t-test).

If the variances cannot be assumed equal, then the standard error of the difference between means is taken as the square root of the sum of the individual variances, each divided by its own sample size:

Algebraically speaking -

t' \;=\; \frac{(\bar{x}_1 - \bar{x}_2) \,-\, (\mu_1 - \mu_2)}{\sqrt{\dfrac{v_1}{n_1} + \dfrac{v_2}{n_2}}}
where:
  • t' is the unequal-variance t-statistic, for which critical values are determined as below,
  • μ1 − μ2 is the difference between the population means,
  • x̄1 − x̄2 is the difference between the sample means,
  • v1 and v2 are the sample variances,
  • n1 and n2 are the number of observations in samples 1 and 2.
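
As before, a minimal Python sketch of the t' statistic (our illustration, with hypothetical data and δ = 0 under the null hypothesis):

    import numpy as np

    x1 = np.array([12.1, 14.3, 11.8, 15.0, 13.2, 12.7])  # hypothetical sample 1
    x2 = np.array([9.8, 16.2, 8.1, 17.5, 10.3])          # hypothetical sample 2
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)

    se = np.sqrt(v1 / n1 + v2 / n2)     # variances kept separate, not pooled
    t_prime = (x1.mean() - x2.mean()) / se
    print(f"t' = {t_prime:.3f}")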

Having determined the t' value, we give two alternative ways to test its significance.

  1. Corrected degrees of freedom

    The estimated t' statistic can be tested against the standard t-distribution, but with reduced degrees of freedom. The appropriate number of degrees of freedom is given by the equation below:

    Algebraically speaking -

    df \;=\; \frac{\left[(v_1/n_1) + (v_2/n_2)\right]^2}{\dfrac{(v_1/n_1)^2}{n_1 - 1} + \dfrac{(v_2/n_2)^2}{n_2 - 1}}
    where
    • df is the number of degrees of freedom for the unequal-variance t-test,
    • all other variables are defined as above.

    We recommend this method, as it enables you to determine the precise P-value for your test, provided your software package includes a probability calculator. (A sketch illustrating both methods is given after this list.)

     

  2. Corrected critical value

    The estimated t' statistic can also be tested against a different critical value which is calculated as a weighted average of the critical values of t based on the respective degrees of freedom of the two samples. The formula below shows how this works for a 1-tailed test.

    Algebraically speaking -

    t'_{\alpha} \;=\; \frac{t_{\alpha}(f_1)\,(v_1/n_1) \;+\; t_{\alpha}(f_2)\,(v_2/n_2)}{(v_1/n_1) \;+\; (v_2/n_2)}
    where
    • tα is the critical value for a type I error of α,
    • f1 and f2 are the degrees of freedom of the two samples (n1 − 1 and n2 − 1),
    • for a 2-tailed test, tα(f1) should be changed to tα/2(f1) when obtained from tables, or to t1−α/2(f1) when obtained from a probability calculator,
    • all other variables are defined as above.

    Note that the unequal-variance t-test is generally (but not always) more conservative than the standard t-test. Nevertheless, some authors, such as Gans (1991), feel it should be used for all two-sample tests instead of the equal-variance formulation. This stems from the insensitivity of the F-ratio test in detecting differences between variances when populations are normal, and from its excessive liberality with skewed populations.
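
Here is the promised sketch of both corrections in Python, continuing the hypothetical data used above (our illustration; scipy is assumed):

    import numpy as np
    from scipy import stats

    x1 = np.array([12.1, 14.3, 11.8, 15.0, 13.2, 12.7])  # hypothetical data
    x2 = np.array([9.8, 16.2, 8.1, 17.5, 10.3])
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
    w1, w2 = v1 / n1, v2 / n2
    t_prime = (x1.mean() - x2.mean()) / np.sqrt(w1 + w2)

    # method 1: corrected degrees of freedom, giving a precise P-value
    df = (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t_prime), df)
    print(f"t' = {t_prime:.3f}, df = {df:.2f}, P = {p:.4f}")

    # method 2: corrected critical value for a 2-tailed test at alpha = 0.05
    alpha = 0.05
    c1 = stats.t.ppf(1 - alpha / 2, n1 - 1)   # critical value with f1 df
    c2 = stats.t.ppf(1 - alpha / 2, n2 - 1)   # critical value with f2 df
    t_crit = (c1 * w1 + c2 * w2) / (w1 + w2)
    print(f"critical |t'| = {t_crit:.3f}; reject H0: {abs(t_prime) > t_crit}")

    # cross-check method 1 against scipy's built-in Welch test
    print(stats.ttest_ind(x1, x2, equal_var=False))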

 

 

The weighted t-test

Use of the above formulae gives equal weight to each observation. But if your sampling or experimental unit is a cluster, then the percentages or means may be based on different sample sizes. In that situation, those based on a larger sample size should carry more weight. This is achieved by using the formulae given in Unit 7 to calculate weighted means and variances for each group.

We have repeated the formulae for unequal cluster sizes here for convenience so they can be referred to when going through the worked example below.

Algebraically speaking -

Weighted mean -

\bar{x}_w \;=\; \frac{\sum (\bar{x}_i m_i)}{\sum m_i}

Weighted variance -

s_w^2 \;=\; \frac{\sum (m_i \bar{x}_i^2)/\bar{m} \;-\; n\,\bar{x}_w^2}{n - 1}
where:

  • x̄i indicates the ith cluster mean,
  • mi indicates the number of units (= the weight) in each cluster,
  • m̄ is the average cluster size (Σmi/n),
  • n is the number of clusters,
  • x̄w is the weighted mean.

The weighted means and variances are then used in place of the unweighted estimates in the appropriate formula for t.
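
A minimal sketch of these weighted estimates in Python (the cluster means and sizes below are hypothetical; our illustration, not part of the original text):

    import numpy as np

    xbar = np.array([5.2, 4.8, 6.1, 5.5])   # hypothetical cluster means
    m = np.array([10, 25, 8, 17])            # cluster sizes (= weights)
    n = len(xbar)                            # number of clusters
    m_bar = m.sum() / n                      # average cluster size

    xbar_w = (m * xbar).sum() / m.sum()      # weighted mean
    s2_w = ((m * xbar**2).sum() / m_bar - n * xbar_w**2) / (n - 1)
    print(f"weighted mean = {xbar_w:.3f}, weighted variance = {s2_w:.4f}")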

 

 

Confidence interval of the difference between means

The 95% normal-approximation confidence interval for the difference between the means is readily obtained by multiplying the standard error of the difference by the appropriate quantile of t, then adding and subtracting the result from the observed difference:

Algebraically speaking -

95\% \;\mathrm{CI}(D) \;=\; D \;\pm\; t\,s_D

where:
  • D is the mean difference,
  • t is the (1 − α/2) quantile of the t-distribution with n1+n2 − 2 degrees of freedom, and α = 0.05,
  • n1 & n2 are the number of observations in each sample,
  • sD is the standard error of the difference.

Notice that this interval assumes the estimates of D and sD are unrelated - in other words, that the differences are homoscedastic.
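
A short Python sketch of this interval, reusing the hypothetical equal-variance data from the first example (scipy assumed; our illustration):

    import numpy as np
    from scipy import stats

    x1 = np.array([4.2, 5.1, 6.3, 5.8, 4.9, 5.5])   # hypothetical data
    x2 = np.array([3.9, 4.4, 5.0, 4.1, 4.7, 3.8])
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)

    D = x1.mean() - x2.mean()
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_D = np.sqrt(pooled * (n1 + n2) / (n1 * n2))
    t_crit = stats.t.ppf(0.975, n1 + n2 - 2)   # (1 - alpha/2) quantile, alpha = 0.05
    print(f"95% CI: {D - t_crit * se_D:.3f} to {D + t_crit * se_D:.3f}")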

 

 

Assumptions

This test assumes -

  1. The means are of measurement variables
      Ranked or coded categorical observations, or variables derived from such data, should not be analysed using this test. With such data you should be asking if the mean is an appropriate measure of location - often the median would be a better choice. Replicated proportions can be analysed with the t-test providing they are appropriately transformed, for example using the arcsin square root transformation.
  2. Sampling (or allocation) is random and observations are independent
      Observations in a time series should generally not be used as replicates, because such observations are not independent.
  3. The samples are drawn from normally distributed populations.
      This assumption is often relaxed under certain circumstances:
    • For large samples (above 300 observations) the means are close to normal, irrespective of how the observations are distributed.
    • For moderate samples (30-300 observations) the means should approximate a normal distribution. However, if the distributions are skewed, it is always preferable to apply a normalizing transformation.
    • For small samples (3-30 observations) distributions should be checked using qq or rankit plots. Where possible a normalizing transformation should be applied, although the efficacy of such a transformation may be difficult to assess with small data sets.
  4. Sample variances are homogeneous (that is, they estimate the same population variance)
      Sample variances should be tested for homogeneity. Where they differ, transformations should be tried in an attempt to homogenize them. Achieving homogeneity should take precedence over achieving normality. If homogeneity cannot be achieved, use the 'approximate' unequal-variance t-test instead. (A sketch of some of these checks follows this list.)
  5. The model is additive
      This assumption is required for the preceding assumptions to hold.
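
As promised above, a brief sketch of how assumptions 3 and 4 might be checked in Python. The qq-plot check uses scipy's probplot; the data are hypothetical and the numerical summaries are our additions, not part of the original text.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x1 = rng.normal(5.0, 1.0, 12)    # hypothetical small samples
    x2 = rng.normal(4.2, 1.1, 12)

    # qq (probability) plot per sample; r near 1 is consistent with normality
    for label, x in (("sample 1", x1), ("sample 2", x2)):
        (osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
        print(f"{label}: qq-plot correlation r = {r:.3f}")

    # rough look at homogeneity; formal tests are covered under
    # 'Equality of variance tests' below
    print("variance ratio:", x1.var(ddof=1) / x2.var(ddof=1))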

Related topics:

  • Equality of variance tests
  • Sample size