"It has long been an axiom of mine that the little things are infinitely the most important"
"Paired samples" arise when observations are made on pairs of units that are similar in some respect. Usually one treatment is applied to one member of each pair and not to the other, which serves as the control. Pairing (or matching, as it is sometimes called) can be done on the basis of age, sex, behaviour or any other factor that might be expected to have an effect on the response variable. The purpose of pairing is to reduce the variability in the response variable that you are measuring. The more similar the two individuals are, the more effective the pairing.
The most effective form of pairing is self-pairing. Here a single individual or plot is measured on two occasions, one before and one after a particular treatment is applied. This is probably the most widely used type of pairing. Sometimes different treatments can be applied to two parts of the same individual, for example topical applications to skin problems or eye diseases.
The measurements that are analysed are not the individual readings for each individual, but the differences between the members of each pair. The differences are then tested against the null hypothesis of a mean difference of zero, using the t-distribution:
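The equation itself is not reproduced in this text. For reference, the standard form of the paired t statistic can be written as follows; the notation is an assumption made here (d_i is the difference for pair i, n is the number of pairs), and the statistic is referred to the t-distribution with n - 1 degrees of freedom.

```latex
\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2}
\]
```

Here s_d divided by the square root of n is the standard error of the mean difference.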
Confidence interval of the mean difference
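The 95% normal approximation confidence interval for the mean difference is readily obtained by multiplying the standard error of the mean difference by the appropriate value of t, then adding the product to, and subtracting it from, the mean difference: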
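A minimal sketch of this calculation in Python, using only the standard library. The function name `paired_t_summary` and the data are hypothetical, chosen purely for illustration. The critical value of t is passed in by the caller (from tables or statistical software) rather than computed.

```python
import math
import statistics


def paired_t_summary(before, after, t_crit):
    """Summarise paired data: mean difference, its standard error,
    the t statistic, and a confidence interval for the mean difference.

    t_crit is the two-sided critical value of t for n - 1 degrees of
    freedom, supplied by the caller.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")

    # Analyse the within-pair differences, not the individual readings.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)

    mean_d = statistics.mean(diffs)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    se_d = statistics.stdev(diffs) / math.sqrt(n)

    # Test statistic against the null hypothesis of a mean difference of zero.
    t_stat = mean_d / se_d

    half_width = t_crit * se_d
    return mean_d, se_d, t_stat, (mean_d - half_width, mean_d + half_width)


# Hypothetical before/after readings on five self-paired individuals.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 18, 12]

# 2.776 is the tabulated two-sided 5% critical value of t for 4 degrees of freedom.
mean_d, se_d, t_stat, ci = paired_t_summary(before, after, t_crit=2.776)
print(mean_d, se_d, t_stat, ci)
```

If the interval excludes zero, the corresponding two-sided test at the 5% level rejects the null hypothesis of no mean difference.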
This test assumes -
Bart et al. (1998) used simulation to assess how large a sample is required for a paired t-test. They found that for moderately skewed or bimodal populations the sample size should exceed 10, whilst for highly skewed populations it should exceed 20. However, these guidelines should be treated with caution, especially if you have extreme outliers or large numbers of zeros in the data.
Rather than relying on the central limit theorem, it is generally advisable to carry out a transformation on the raw data (not the differences) if you have reasonable grounds for believing that a particular transformation would be effective. For example, many biological data are right skewed and can be approximately normalized with a log transformation. But remember that, following a log transformation, when we detransform the mean difference we no longer get the difference between the arithmetic means but the ratio of the two geometric means.
Important assumptions are also made concerning any missing data points. Sometimes data are not available for one member of a pair, making it impossible to calculate the difference. The easiest way to deal with this problem is just to omit such pairs from the analysis. This is fine provided the missing differences would not have been systematically larger or smaller than the others. But let us take the example of patients who are being treated with a drug that reduces blood pressure. Some patients may have to be taken off the drug before the second reading can be taken, because of side effects related to high or low blood pressure. Omitting such data would clearly produce a biased result in terms of the overall effect on patients.
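The point about detransformation can be checked numerically. The sketch below uses hypothetical values, chosen only so that the arithmetic is easy to follow, and shows that back-transforming the mean of the log differences gives the ratio of the geometric means, not the difference between the arithmetic means.

```python
import math
import statistics

# Hypothetical paired readings (illustrative values only).
before = [2.0, 4.0, 8.0]
after = [4.0, 16.0, 8.0]

# Log-transform the raw data first, then take within-pair differences.
log_diffs = [math.log(a) - math.log(b) for b, a in zip(before, after)]
mean_log_diff = statistics.mean(log_diffs)

# Detransforming the mean log difference gives a ratio, not a difference.
ratio_from_diffs = math.exp(mean_log_diff)

# The same ratio computed directly from the two geometric means.
geometric_mean_ratio = (
    statistics.geometric_mean(after) / statistics.geometric_mean(before)
)

# For comparison: the difference between the arithmetic means.
arithmetic_mean_difference = statistics.mean(after) - statistics.mean(before)

print(ratio_from_diffs, geometric_mean_ratio, arithmetic_mean_difference)
```

The same back-transformation applies to the confidence limits: exponentiating the limits for the mean log difference gives limits for the ratio of geometric means.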
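As a minimal sketch of the omission approach, the hypothetical helper below drops any pair with a missing reading (marked here by `None`, an assumption of this example) and reports how many pairs were dropped, so that the number of omissions is visible rather than silent. It does nothing to correct the bias described above.

```python
def complete_pairs(before, after):
    """Keep only pairs in which both readings are present.

    None marks a missing reading. Returns the retained (before, after)
    pairs and the number of pairs that were dropped.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")

    kept = [
        (b, a)
        for b, a in zip(before, after)
        if b is not None and a is not None
    ]
    n_dropped = len(before) - len(kept)
    return kept, n_dropped


# Hypothetical blood pressure readings; the third patient has no second reading.
before = [150, 162, 171, 148]
after = [141, 150, None, 140]

kept, n_dropped = complete_pairs(before, after)
diffs = [a - b for b, a in kept]
print(kept, n_dropped, diffs)
```

Whether the dropped pairs can safely be ignored depends on why the readings are missing, which the code cannot determine.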