z-test for independent proportions
The purpose of the z-test for independent proportions is to compare two independent proportions. It is also known as the t-test for independent proportions, and as the critical ratio test. In medical research the difference between proportions is commonly referred to as the risk difference.
The test statistic is the standardized normal deviate (z) as defined below. Its precise distribution depends on the sampling model. The critical values used to assess the large-sample formula assume two independent binomial samples, not a single multinomial one, and so does its standard error formula. If the data came from a single multinomial sample, the formula would require only one sample size (n1+n2) instead of two (n1 and n2). It is possible to construct exact models to estimate the distribution of a difference between proportions (or the standardized difference) for some other error distributions (including the multinomial), but it would be unwise to assume their behaviour converges to the same distribution, even asymptotically, and certainly not for smallish samples!
We give examples of these different designs in the More Information page for Pearson's chi square test.
The standard test uses the common pooled proportion to estimate the variance of the difference between two proportions. It is identical to the chi square test, except that we estimate the standard normal deviate (z). The square of the test statistic (z²) is identical to Pearson's chi square statistic (X²).
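The equivalence between the squared z statistic and Pearson's chi square statistic can be checked numerically. The following is a minimal sketch, not the page's own code: the function names and the counts (30 of 100 versus 45 of 100) are hypothetical, and the chi square statistic is computed without a continuity correction.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """z statistic using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def pearson_chi_square(x1, n1, x2, n2):
    """Pearson's chi square for the 2 x 2 table, no continuity correction."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 30/100 in sample 1, 45/100 in sample 2.
z = pooled_z(30, 100, 45, 100)
x2 = pearson_chi_square(30, 100, 45, 100)
print(z ** 2, x2)  # the two values agree up to floating-point rounding
```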
It is sometimes preferred to the chi square test if the interest is in the size of the difference between the two proportions. A confidence interval can be attached to that difference using either the normal approximation or a variety of exact or small sample methods.
The test statistic is obtained by dividing the difference between the proportions by the standard error of the difference. The most commonly used version of the test given here uses the best available estimate for the variance of the difference under the null hypothesis - that is the variance of the proportion derived from the combined samples:
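The formula itself did not survive on this page. The standard pooled form consistent with the description above is sketched here; the symbols x1 and x2 (the number of observations in the category of interest in each sample) and the hats on the estimated proportions are notation assumed for illustration.

```latex
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
\hat{q} = 1 - \hat{p}

z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```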
Correction for continuity
For small sample sizes, many
Confidence interval for difference between two proportions
If the null hypothesis has been rejected, then we can no longer use the estimate of the standard error of the difference based on the combined samples. Instead the standard error of the difference is obtained by taking the square root of the sum of the individual variances.
The normal approximation confidence interval of the difference is then obtained by multiplying the standard error by the standard normal deviate:
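As a rough illustration of the normal approximation interval just described, the sketch below uses the unpooled standard error (the square root of the sum of the individual variances) and the standard normal quantile from Python's standard library. The function name, the default 95% level, and the counts are assumptions made for the example.

```python
import math
from statistics import NormalDist

def wald_interval(x1, n1, x2, n2, level=0.95):
    """Normal approximation interval for p1 - p2 using the unpooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    # Sum of the individual binomial variances of the two proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 30/100 in sample 1, 45/100 in sample 2.
lower, upper = wald_interval(30, 100, 45, 100)
print(lower, upper)
```

With these counts both limits are negative, so the interval excludes zero.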
Much as is the case with the normal approximation interval for a proportion, the normal approximation interval for the difference between proportions performs poorly.
As a result there are numerous suggestions for improved intervals, many of which we note in the 'Useful References' section. Disagreement on which is best is so widespread that it is difficult to know which one to recommend. R appears to use the Wilson score
Note: tests for the difference between paired proportions and the confidence interval of the difference are covered in the More Information page on the Binomial and related tests under McNemar's
An alternate critical ratio test
Because different estimates of the variance are used, it is possible that the results of the test may not be consistent with the confidence interval. In other words, the confidence interval of the difference may include zero (indicating no significant difference), yet the test indicates a significant difference. As a result an alternative critical ratio test was devised that gives identical results to the confidence interval. This estimates the standard error of the difference as the square root of the sum of the individual variances:
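The possible disagreement can be seen by computing both statistics on the same data. This is a sketch under assumed inputs: the function names are hypothetical, and the counts (10 of 20 versus 56 of 200) were chosen so that the two versions fall on opposite sides of the two-sided 5% critical value of 1.96.

```python
import math

def z_pooled(x1, n1, x2, n2):
    """Standard test: SE from the pooled proportion (null hypothesis)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def z_unpooled(x1, n1, x2, n2):
    """Alternate test: SE from the sum of the individual variances."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical counts: a small sample near 0.5 and a large sample near 0.28.
zp = z_pooled(10, 20, 56, 200)
zu = z_unpooled(10, 20, 56, 200)
print(zp, zu)

# For these counts the pooled statistic exceeds 1.96 while the unpooled one
# does not, so the standard test rejects although the normal approximation
# interval (built on the unpooled SE) includes zero.
print(abs(zp) > 1.96, abs(zu) > 1.96)
```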
Fleiss et al.
Sampling or allocation is random
The test is primarily intended for cases where a random sample is taken from each of two populations and each observation is classified into one of two different categories for a characteristic. An equivalent situation is if individuals in a group are allocated to receive one of two treatments using restricted randomization to equalize group sizes, and are then classified into one of two different categories.
It can also be used if a single random sample is taken and each observation is classified into one of two different categories for each of two characteristics. An equivalent situation is if individuals in a group are (completely) randomly allocated to receive one of two treatments, and are then classified into one of two different categories.
Observations are independent
This is inherent in the first assumption. This assumption is not met if samples are obtained from clusters, or cluster randomization is used, and the test is then used to analyze results at an individual level. Nor is the test appropriate for comparing proportions derived from pooled
Errors are normally distributed
Both models assume errors are normally distributed. Provided the cell frequencies are reasonably large, cell values in a 2 × 2 table will be distributed normally about their expected values. If any expected frequency is less than 5, or if pqn is less than 5, then, provided you want a conventional P-value, the continuity correction should be applied. Omission of the continuity correction will give you a mid-P-value. For more conservative criteria
A given case may fall only in one class.
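The expected-frequency rule given under the normality assumption (any expected frequency below 5) can be checked directly from the 2 × 2 table. The sketch below is illustrative only: the function names and the counts are hypothetical, and it covers only the expected-frequency criterion, not the pqn criterion.

```python
def expected_frequencies(x1, n1, x2, n2):
    """Expected cell counts for the 2 x 2 table under the null hypothesis."""
    total = n1 + n2
    successes = x1 + x2
    failures = total - successes
    return [
        [n1 * successes / total, n1 * failures / total],
        [n2 * successes / total, n2 * failures / total],
    ]

def any_expected_below(x1, n1, x2, n2, threshold=5):
    """True if any expected frequency falls below the threshold."""
    cells = expected_frequencies(x1, n1, x2, n2)
    return any(e < threshold for row in cells for e in row)

# Hypothetical large-sample table: all expected counts are well above 5.
print(any_expected_below(30, 100, 45, 100))  # False

# Hypothetical small-sample table: the smallest expected count is about 3.1.
print(any_expected_below(2, 12, 5, 15))      # True
```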