InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

The purpose of the z-test for independent proportions is to compare two independent proportions. It is also known as the t-test for independent proportions, and as the critical ratio test. In medical research the difference between proportions is commonly referred to as the risk difference.

The test statistic is the standardized normal deviate (z) as defined below. Its precise distribution depends on the sampling model. The critical values used to assess the large-sample statistic assume two independent binomial samples, not a single multinomial one, as does the standard error formula. If the multinomial model applied, the formula would need only one sample size (n1 + n2) instead of two (n1 and n2). It is possible to construct exact models to estimate the distribution of a difference between proportions (or the standardised difference) for some other error distributions (including the multinomial), but it would be unwise to assume their behaviour converges to the same distribution, even asymptotically, and certainly not for smallish samples!

• Under the multinomial model only the total number of observations is fixed. In observational studies this is the case for an analytical survey where a single random cross sectional sample is taken and sampling units are classified according to two characteristics. In experimental studies the model is appropriate when a group of experimental units is allocated to treatments using simple randomization.
• Under the independent binomial model, either row or column totals are fixed but the other marginal totals are free to vary. Several observational designs use this model including the comparative area observational design, and cohort and case-control designs. In experimental studies the model is appropriate where restricted randomization is used to equalize group sizes for each treatment.

We give examples of these different designs in the More Information page for Pearson's chi square test.

The standard test uses the common pooled proportion to estimate the variance of the difference between two proportions. It is equivalent to Pearson's chi square test, except that we compute the standard normal deviate (z) rather than X². The square of the test statistic (z²) is identical to Pearson's chi square statistic (X²).
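The identity between z² and X² can be checked numerically. The following is a minimal Python sketch, not code from the original page: the counts (30 of 100 versus 45 of 100) and the function names are invented for illustration, and the z statistic uses the pooled standard error described in this paragraph.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """z statistic using the pooled proportion (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def pearson_chi_square(x1, n1, x2, n2):
    """Uncorrected Pearson X^2 for the 2 x 2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical data: 30/100 successes in sample 1, 45/100 in sample 2.
z = pooled_z(30, 100, 45, 100)
x2_stat = pearson_chi_square(30, 100, 45, 100)
print(z ** 2, x2_stat)  # the two values agree up to rounding error
```

The agreement holds only for the uncorrected statistics; applying a continuity correction to one but not the other breaks it.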

It is sometimes preferred to the chi square test if the interest is in the size of the difference between the two proportions. A confidence interval can be attached to that difference using either the normal approximation or a variety of exact or small sample methods.

Important point: In Unit 8 we analysed proportions in the situation where we had taken replicated samples from a population, calculated percentages from each sample, and then transformed the percentages so they could be handled as a normally distributed continuous variable. That is the correct approach for handling replicated proportions. We are dealing here with a quite different situation - namely where proportions are calculated either from a single random sample or from two independent samples. Under these circumstances, variability cannot be measured, but can only be estimated using the binomial distribution. The z-test should not be used for analysing replicated proportions.

### The formulae

The test statistic is obtained by dividing the difference between the proportions by the standard error of the difference. The most commonly used version of the test given here uses the best available estimate for the variance of the difference under the null hypothesis - that is the variance of the proportion derived from the combined samples:

#### Algebraically speaking -

$$ z = \frac{p_1 - p_2}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

Where:
• z is the z-statistic which is compared to the standard normal deviate,
• p1 and p2 are the two sample proportions,
• $\bar{p}$ is the estimated true proportion under the null hypothesis, equal to (n1p1 + n2p2)/(n1 + n2),
• $\bar{q}$ is (1 − $\bar{p}$),
• n1 and n2 are the number of observations in your two samples.
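As a rough illustration of the pooled formula, here is a short Python sketch. The function name `pooled_z_test`, the use of raw success counts x1 and x2 (so that p1 = x1/n1 and the pooled proportion is (x1 + x2)/(n1 + n2)), and the example data are all assumptions made for this example, not part of the original page. The two-sided P-value is taken from the standard normal distribution via `math.erfc`.

```python
import math

def pooled_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P) for two independent proportions.

    x1, x2: numbers of 'successes'; n1, n2: sample sizes.
    Uses the pooled proportion as the variance estimate under H0.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)      # same as (n1*p1 + n2*p2) / (n1 + n2)
    q_bar = 1 - p_bar
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 30/100 versus 45/100.
z, p_value = pooled_z_test(30, 100, 45, 100)
print(f"z = {z:.3f}, two-sided P = {p_value:.4f}")
```

The sketch does no input checking: if the pooled proportion is exactly 0 or 1 the standard error is zero and the division fails.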

#### Correction for continuity

For small sample sizes, many statisticians feel that a correction for continuity should be applied. This is because a continuous distribution (the normal distribution) is being used to represent the discrete distribution of sample frequencies. The Yates correction to either formula is achieved by subtracting (1/n1 + 1/n2)/2 from the modulus of the difference between the proportions. Hence for the usual form of the test:

#### Algebraically speaking -

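$$ z_{corr} = \frac{|p_1 - p_2| - \dfrac{1}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$
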
Where:
• zcorr is the Yates-corrected estimate of the z-statistic which is compared to the standard normal deviate,
• |p1  −  p2| is the modulus of the difference between p1 and p2,
• all other symbols are as above.
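The corrected statistic can be sketched in Python along the same lines. The function name and the example counts are hypothetical. One detail is an assumption of this sketch rather than something stated in the text: the numerator is floored at zero so that the correction never reverses the sign of a very small difference.

```python
import math

def yates_corrected_z(x1, n1, x2, n2):
    """Continuity-corrected z (always non-negative) with pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    correction = 0.5 * (1 / n1 + 1 / n2)
    # Assumption: do not let the correction overshoot past zero.
    numerator = max(abs(p1 - p2) - correction, 0.0)
    return numerator / se

# Hypothetical data: 30/100 versus 45/100.
z_corr = yates_corrected_z(30, 100, 45, 100)
print(f"corrected z = {z_corr:.3f}")
```

Because the modulus is taken, the sign of the difference is lost; the direction has to be read from p1 and p2 themselves.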

#### Confidence interval for difference between two proportions

If the null hypothesis has been rejected, then we can no longer use the estimate of the standard error of the difference based on the combined samples. Instead the standard error of the difference is obtained by taking the square root of the sum of the individual variances.

The normal approximation confidence interval of the difference is then obtained by multiplying the standard error by the standard normal deviate, and adding the result to (and subtracting it from) the observed difference:

#### Algebraically speaking -

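$$ SE(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} $$

$$ 95\%\ \text{CI}(p_1 - p_2) = (p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2) $$

Where:
• q1 is equal to 1 − p1 and q2 is 1 − p2,
• all other symbols are as above.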

The continuity correction can be applied to this interval by subtracting 1/2 (1/n1 + 1/n2) from the lower limit and adding the same quantity to the upper limit.
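A small Python sketch of this interval, with the optional continuity correction, is given below. The function name `diff_ci`, its parameters, and the example counts are illustrative assumptions. The critical value defaults to 1.96 for a 95% interval.

```python
import math

def diff_ci(x1, n1, x2, n2, z_crit=1.96, continuity=False):
    """Normal-approximation CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    lower = diff - z_crit * se
    upper = diff + z_crit * se
    if continuity:
        # Widen the interval by (1/n1 + 1/n2)/2 at each end.
        cc = 0.5 * (1 / n1 + 1 / n2)
        lower -= cc
        upper += cc
    return lower, upper

# Hypothetical data: 30/100 versus 45/100.
print(diff_ci(30, 100, 45, 100))
print(diff_ci(30, 100, 45, 100, continuity=True))
```

The limits are not clamped to the range −1 to +1, so with small samples this simple interval can extend beyond the possible values of a difference between proportions.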

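Much as is the case with the normal approximation interval for a proportion, the normal approximation interval for the difference between proportions performs poorly: its actual coverage can fall well below the nominal level, particularly with small samples or proportions close to 0 or 1.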

As a result there are numerous suggestions for improved intervals, many of which we note in the 'Useful References' section. Disagreement on which is best is so widespread that it is difficult to know which one to recommend. R appears to use the Wilson score interval, since its documentation gives references for this method and to Newcombe's paper recommending its use.

Note: tests for the difference between paired proportions and the confidence interval of the difference are covered in the More Information page on the Binomial and related tests under McNemar's test.

#### An alternate critical ratio test

Because different estimates of the variance are used, it is possible that the results of the test may not be consistent with the confidence interval. In other words, the confidence interval of the difference may include zero (indicating no significant difference), yet the test indicates a significant difference. As a result an alternative critical ratio test was devised that gives identical results to the confidence interval. This estimates the standard error of the difference as the square root of the sum of the individual variances:

#### Algebraically speaking -

$$ z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} $$

Where:
• z is the z-statistic which is compared to the standard normal deviate,
• p1 and p2 are the two sample proportions,
• q1 is equal to 1 − p1 and q2 is 1 − p2,
• n1 and n2 are the number of observations in your two samples.
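The following Python sketch illustrates why this version always agrees with the normal-approximation confidence interval: both use the same unpooled standard error, so |z| exceeds 1.96 exactly when the 95% interval excludes zero. Function names and data are hypothetical.

```python
import math

def unpooled_se(x1, n1, x2, n2):
    """Square root of the sum of the two individual binomial variances."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def unpooled_z(x1, n1, x2, n2):
    """Alternative critical ratio: difference divided by the unpooled SE."""
    return (x1 / n1 - x2 / n2) / unpooled_se(x1, n1, x2, n2)

def ci_excludes_zero(x1, n1, x2, n2, z_crit=1.96):
    """True if the (uncorrected) normal-approximation CI does not contain 0."""
    diff = x1 / n1 - x2 / n2
    half_width = z_crit * unpooled_se(x1, n1, x2, n2)
    return diff - half_width > 0 or diff + half_width < 0

# Hypothetical data: 30/100 versus 45/100.
z = unpooled_z(30, 100, 45, 100)
print(f"z = {z:.3f}, significant at 5%: {abs(z) > 1.96}, "
      f"CI excludes zero: {ci_excludes_zero(30, 100, 45, 100)}")
```

If both sample proportions are exactly 0 or exactly 1 the unpooled standard error is zero and the ratio is undefined, a case this sketch does not handle.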

Fleiss et al. (2003) provide a discussion of the relative merits of this version of the test compared to the more usual version. When n1 = n2 this version is apparently more powerful than the traditional form, although its type I error rate tends to be higher than the nominal 5%. When n1 ≠ n2, neither test is consistently more powerful. Fleiss concluded that there is no overwhelming reason for using the newer test rather than the traditional one. This issue is still debated (sometimes rather vituperatively) on the web.

### Assumptions

These are the same as those for Pearson's chi square test:

#### Sampling or allocation is random

The test is primarily intended for when a random sample is taken from each of two populations and each observation is classified into one of two different categories for a characteristic. An equivalent situation is if individuals in a group are allocated to receive one of two treatments using restricted randomization to equalize group sizes, and are then classified into one of two different categories.

It can also be used if a single random sample is taken and each observation is classified into one of two different categories for each of two characteristics. An equivalent situation is if individuals in a group are (completely) randomly allocated to receive one of two treatments, and are then classified into one of two different categories.

#### Observations are independent

This is inherent in the first assumption. This assumption is not met if samples are obtained from clusters, or cluster randomization is used, and the test is then used to analyze results at an individual level. Nor is the test appropriate for comparing proportions derived from pooled samples.

#### Errors are normally distributed

Both models assume errors are normally distributed. Providing the cell frequencies are reasonably large, cell values in a 2 × 2 table will be distributed normally about their expected values. If any expected frequency is less than 5, or if pqn is less than 5, then providing you want a conventional P-value, the continuity correction should be applied. Omission of the continuity correction will give you a mid-P-value. For more conservative criteria see

#### Mutual exclusivity

A given case may fall only in one class.