InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

The paired t-test: Use & misuse
(before-after studies, confounding factors, pseudoreplication, convenience sampling, normality of the mean difference)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume, and how their results mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...
Use and Misuse

The paired t-test is widely used in all disciplines. We include a few examples of its correct use, and rather more of its misuse. An example of the latter is to use the paired t-test to test for equality between two measurement techniques. The test may tell you whether the means are significantly different, but it cannot test for equality! Another common misuse is to set up a parallel-group randomized trial, yet test for a treatment effect using paired t-tests on a before-treatment/after-treatment basis. This converts a (strong inference) randomized trial into a (weak inference) observational study. It is much better to correct for differences in baseline values between treatment groups in other ways. Use of the paired t-test for observational before-after studies has other pitfalls, including confounding variables and regression to the mean because of measurement error.

Inadequate sample size and consequent low power are common problems. This is often 'resolved' by pseudoreplication: for example, using corresponding quarters of the pre- and post-intervention years rather than a single measure for each village; using the same participants to do multiple before-and-after studies; and using repeated measures over time as paired replicates. Perhaps a more justifiable error is to use paired samples when the pairing is unjustified. If the pairing accounts for too little variation, one simply loses power as a result (the paired test has fewer degrees of freedom than the unpaired test), especially with small sample sizes. The other downside of pairing is the risk of contamination between units.

We come last to the specific requirements for the paired t-test. The first is that observations are obtained either by probability sampling or random allocation. Comparison of two groups of convenience-sampled units is meaningless, yet still widely done. Fortunately, random allocation is becoming more common in experimental work.
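The effect of pseudoreplication described above can be made concrete with a small sketch. It assumes nothing beyond the textbook definition of the paired t statistic (a one-sample t-test on the within-pair differences); the village data and the function name `paired_t` are hypothetical, invented purely for illustration. To make the point obvious, the 'quarterly' pairs are just each village's single pair repeated four times, an extreme case of non-independent replicates.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired t-test, computed as a
    one-sample t-test on the within-pair differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_d / se_d, n - 1


# Hypothetical data: one measure per village, before and after an intervention.
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 11.0, 11.0, 12.0]

# Correct unit of replication: the village (5 pairs, 4 degrees of freedom).
t, df = paired_t(before, after)

# Pseudoreplicated analysis: the same 5 villages counted as 20 'independent' pairs.
# (Each pair is simply repeated four times; an assumption made for illustration.)
t_pseudo, df_pseudo = paired_t(before * 4, after * 4)

print(f"villages as replicates:  t = {t:.2f}, df = {df}")
print(f"quarters as replicates:  t = {t_pseudo:.2f}, df = {df_pseudo}")
```

No new information has been added in the second analysis, yet the t statistic roughly doubles in magnitude and the degrees of freedom rise from 4 to 19. The apparent gain in significance comes entirely from miscounting the number of independent units.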
As for normality of the mean difference, it is true that the t-test is fairly robust on this except when there are large numbers of zeros. But the 'robustness' of the t-test is often pushed beyond all reasonable limits. If cluster sampling or cluster randomization is being used, it is important to use the correct weighted standard error in the t-test if the number of units varies between clusters. Lastly there is the practice of carrying out multiple comparisons using t-tests. The t-test should only be used for pairwise comparisons, with other approaches used for multiple comparisons.

What the statisticians say

Armitage & Berry (2002) cover the paired t-test in Chapter 4. Bart et al. (1998) provide a useful account of the analysis of paired data and partially paired data in Chapter 3, along with an assessment of how large a sample size is needed for skewed distributions to be normalized. Zar (1999) covers paired designs in Chapter 9. Use of the paired t-test is discussed, along with a test for difference between variances from two correlated populations (although note that the paired t-test does not assume equality of variances). Underwood (1997) covers paired comparisons in Chapter 6 and emphasizes the observational nature of before-after studies.

Wright (2006) examines why the paired t-test and ANCOVA can produce different results when comparing groups in a before-after design, the so-called Lord's paradox. Tu et al. (2008) and Wainer (1991) give more on Lord's paradox. Menke & Martinez (2004) describe how a permutation test, rather than Student's t-distribution, can be used to provide an exact test for the two-sample paired situation. Zimmerman (2005) notes that power in paired-samples designs can be improved by correcting the two-sample test for correlation rather than using a paired t-test. Zimmerman (1997, 2004) stresses the importance of taking non-independence of samples into account even when the correlation is small.
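The idea of an exact permutation test for paired data, mentioned above, can be sketched as follows. This is a generic sign-flip randomization test and is not claimed to be the specific procedure given by Menke & Martinez (2004): under the null hypothesis each within-pair difference is assumed equally likely to be positive or negative, so every pattern of sign flips is enumerated. The function name `paired_permutation_p` and the differences are hypothetical.

```python
import itertools


def paired_permutation_p(diffs):
    """Exact two-sided p-value for paired differences, obtained by
    enumerating all 2**n sign flips (feasible only for small n)."""
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical within-pair differences (after minus before) for five pairs.
diffs = [-2.0, -1.0, 0.0, -3.0, -1.0]
p_value = paired_permutation_p(diffs)
print(f"exact two-sided p = {p_value}")  # 4 of the 32 sign patterns are as extreme: 0.125
```

With only five pairs there are just 32 possible sign patterns, so the smallest two-sided p-value this test could ever return is 2/32. This is another way of seeing why very small paired samples have so little power.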
Box (1987) provides a fascinating account of the development of Student's t-test by W.S. Gosset working in the Guinness brewery.

Bennett et al. (2002) review the use of both the paired and unpaired t-test in cluster-randomized designs with reference to previous papers on the topic. These include Klar & Donner (1997), who advocate the use of stratified designs with more than two clusters in each stratum, and Diehr et al. (1995), who advocate performing an unpaired analysis on paired data. Other contributions on the topic include Donner (1987) and Donner & Donald (1982).

Diaz-Uriarte (2002) examines the incorrect use of the paired t-test to analyze crossover trials in animal behaviour research. Burridge & Robins (2000) compare paired designs (analysed with the paired t-test) with the Latin square design for assessing the performance of bycatch reduction devices in fisheries research. Arthur et al. (1996) look at use of the paired t-test for assessing habitat selection when availability changes, whilst Horton (1995) reviews use of the paired t-test for analyzing paired-choice assays.

Wikipedia has sections on the paired difference test and Student's t-test. The NIST/SEMATECH e-Handbook of Statistical Methods covers analysis of paired observations. GraphPad have a useful section on interpreting the paired t-test.
