




Multiple 2×2 tables & Mantel-Haenszel methods

The chi square test, and other tests for comparing proportions covered within this unit, are frequently misapplied by analysing pooled 2×2 tables as if they were derived from one study. Pooling data where the prevalence of the characteristic varies between replicates can quite simply give the wrong answer - a phenomenon known as Simpson's paradox. It is best understood by looking at some hypothetical data sets.

For example:

Example 1

Note the fall rates for the two treatments A and B were very similar in each centre, but differed between centres. In the first centre the fall rate was 81-83%, whilst in the second centre it was only 33-36%.


So what happens if we just pool the data from the two trials?

Now we find that the control has a significantly higher fall rate than the treatment (69% compared to 60%), with a risk ratio for treatment versus control of 0.87. Clearly this is a very misleading result. It results from pooling data with unequal proportions (overall fall rates for each centre are 0.82 versus 0.34) and unequal ratios of sample sizes (120:210 and 105:75). We have also lost the valuable information that the fall rates of both treatments are highly dependent on which centre is involved (we may have a major problem with patient care in centre 1!).

To analyze the data properly we need to return to the full information given in the two tables.

Multi-centre trial
Centre   Treatment    Falls       No falls    Risk ratio
1        Treatment    100 (a1)    20 (b1)     1.03
         Control      170 (c1)    40 (d1)
2        Treatment    35 (a2)     70 (b2)     0.93
         Control      27 (c2)     48 (d2)
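
The effect of pooling can be checked directly from the cell counts in the table. The following is a minimal Python sketch, not part of the original analysis; the names `centres`, `risk_ratio` and `pooled_cells` are illustrative choices.

```python
# Cell counts from the multi-centre table, one tuple per centre:
# (a, b, c, d) = (treatment falls, treatment no falls,
#                 control falls, control no falls)
centres = {
    1: (100, 20, 170, 40),
    2: (35, 70, 27, 48),
}


def risk_ratio(a, b, c, d):
    """Risk in the treatment row divided by risk in the control row."""
    return (a / (a + b)) / (c / (c + d))


# Risk ratio within each centre
stratum_rr = {k: risk_ratio(*cells) for k, cells in centres.items()}

# Naive pooling: add the two tables cell by cell, then take one risk ratio
pooled_cells = tuple(sum(col) for col in zip(*centres.values()))
pooled_rr = risk_ratio(*pooled_cells)

for k, rr in stratum_rr.items():
    print(f"centre {k}: risk ratio = {rr:.2f}")
print(f"pooled cells {pooled_cells}: risk ratio = {pooled_rr:.2f}")
```

Both centre-specific ratios sit close to 1, yet the pooled ratio falls well below either of them, which is the signature of Simpson's paradox.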

Example 2

Again we have a binary outcome variable, but here we believe that the outcome of the study has been affected by a confounding factor such as age. Hence we stratify the results to investigate that factor and adjust for its effects.

Our example here is from a case-control study on risk factors for a cattle disease. The data originate from 75 cases of all ages compared with 75 randomly drawn controls. We stratify the results so we can examine results for adults and calves separately. We use odds ratios to assess the importance of the risk factor in each stratum. If we pooled the data we would get a crude odds ratio of 2.98 - not as obviously misleading as in our first example, but we might justifiably question its relevance given the apparent difference between the groups.

Risk factors for cattle disease
Age      Risk factor   Affected: Yes   Affected: No   Odds ratio
Adults   +             25              12             2.50
         -             15              18
Calves   +             30              24             5.25
         -             5               21
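
The stratum-specific and crude odds ratios quoted for this example can be reproduced from the table. This is a small illustrative sketch in Python; the variable names are assumptions made for the example only.

```python
# Cell counts per age group: (a, b, c, d) =
# (exposed affected, exposed unaffected,
#  unexposed affected, unexposed unaffected)
strata = {
    "adults": (25, 12, 15, 18),
    "calves": (30, 24, 5, 21),
}


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc of a 2x2 table."""
    return (a * d) / (b * c)


# Odds ratio within each age group
stratum_or = {age: odds_ratio(*cells) for age, cells in strata.items()}

# Crude odds ratio: ignore age and add the tables cell by cell
crude_cells = tuple(sum(col) for col in zip(*strata.values()))
crude_or = odds_ratio(*crude_cells)

for age, value in stratum_or.items():
    print(f"{age}: odds ratio = {value:.2f}")
print(f"crude: odds ratio = {crude_or:.2f}")
```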

The best way to approach such data is first to calculate a common effect estimate (either risk ratio or odds ratio as appropriate) with its confidence interval. The data are then tested for homogeneity. If the data are homogeneous, the common effect estimate can be tested for significance. If not, analysis reverts to considering each stratum separately.

There are several approaches to carrying out this sort of analysis. The most popular approach is to use what are called Mantel-Haenszel methods and we will concentrate on this approach here. An alternative approach is to combine the logarithms of the odds ratios. This method works satisfactorily when there are only a few strata and the sample sizes within each are large. There is also a maximum likelihood method known as the Cornfield-Gart method. These procedures are described by Gart (1970) and are summarized in Fleiss (2003).


Common risk ratio and odds ratio

The Mantel-Haenszel common risk ratio is obtained by simply weighting the contribution of each individual risk ratio by a measure of its precision. This is done by taking the numerator and denominator of the risk ratio for each square separately and dividing each by the number of observations in that square. The components from each square are then summed, and the numerator is divided by the denominator to obtain the common risk ratio:

Algebraically speaking -

$$\lambda_{MH} = \frac{\sum_i a_i (c_i + d_i) / n_i}{\sum_i c_i (a_i + b_i) / n_i}$$

  • $\lambda_{MH}$ is the Mantel-Haenszel common risk ratio;
  • $a_i$, $b_i$, $c_i$, and $d_i$ are the observed frequencies in each cell as shown in the examples above;
  • $n_i$ is the total number of observations in each table.

The value will be weighted towards the risk ratio of the squares containing most observations. Hence using the data from our first example above, we get a common risk ratio of 1.01, rather larger than the arithmetic mean of 0.98.
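
As a check on the arithmetic, the common risk ratio formula above can be written out in a few lines of Python. This is only a sketch; `mh_risk_ratio` is a name invented for the example.

```python
def mh_risk_ratio(tables):
    """Mantel-Haenszel common risk ratio for a list of (a, b, c, d) tables."""
    numerator = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Multi-centre trial: (a, b, c, d) for centre 1 and centre 2
centres = [(100, 20, 170, 40), (35, 70, 27, 48)]

common_rr = mh_risk_ratio(centres)

# Unweighted arithmetic mean of the two centre risk ratios, for comparison
stratum_rrs = [(a / (a + b)) / (c / (c + d)) for a, b, c, d in centres]
mean_rr = sum(stratum_rrs) / len(stratum_rrs)

print(f"MH common risk ratio = {common_rr:.2f}")
print(f"arithmetic mean      = {mean_rr:.2f}")
```

Centre 1 contributes 330 of the 510 observations, so the common estimate is pulled towards its risk ratio of 1.03.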

As before the asymptotic confidence interval (1.96 times the standard error) is worked out for the logarithm of the relative risk, and then detransformed to obtain the interval for the relative risk itself. The standard error is given by Greenland & Robins (1985).
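
The interval calculation can be sketched as follows. The variance expression used here is the form commonly quoted for the Greenland & Robins estimator; it is not given on this page, so treat it as an assumption to be checked against the original paper. The function name and the default multiplier `z=1.96` are likewise illustrative.

```python
import math


def mh_risk_ratio_ci(tables, z=1.96):
    """MH common risk ratio with an asymptotic interval on the log scale.

    Assumes the commonly quoted Greenland-Robins variance of log(RR_MH):
    sum[((a+b)(c+d)(a+c) - a*c*n) / n^2] / (numerator * denominator).
    """
    numerator = denominator = variance_top = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
        variance_top += ((a + b) * (c + d) * (a + c) - a * c * n) / n**2
    rr = numerator / denominator
    se_log = math.sqrt(variance_top / (numerator * denominator))
    # Interval is built for log(RR) and then detransformed
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


centres = [(100, 20, 170, 40), (35, 70, 27, 48)]
rr, lower, upper = mh_risk_ratio_ci(centres)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval for the multi-centre trial comfortably includes 1.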

A similar approach is followed to get the common odds ratio. Again the contribution of each square to the common odds ratio is weighted by the number of observations in that square:

Algebraically speaking -

$$\omega_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$$

  • $\omega_{MH}$ is the Mantel-Haenszel common odds ratio;
  • $a_i$, $b_i$, $c_i$, and $d_i$ are the observed frequencies in each cell as shown in the examples above;
  • $n_i$ is the total number of observations in each table.

Using the data from our second example above, we get a common odds ratio of 3.51. The asymptotic confidence interval is worked out for the logarithm of the odds ratio, and then detransformed to obtain the interval for the odds ratio itself. The standard error is given by Robins (1986). Exact confidence intervals are preferable when sample sizes are small.
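
A short Python sketch of the common odds ratio and its asymptotic interval is given below. The variance expression is the one commonly attributed to Robins, Breslow & Greenland; it does not appear on this page and is included as an assumption, as are the function and variable names.

```python
import math


def mh_odds_ratio_ci(tables, z=1.96):
    """MH common odds ratio with an asymptotic interval on the log scale.

    Assumes the commonly quoted Robins-Breslow-Greenland variance of
    log(OR_MH); check against the original paper before relying on it.
    """
    R = S = PR = QS = PS_QR = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n          # per-table numerator and denominator terms
        p, q = (a + d) / n, (b + c) / n      # proportions on each diagonal
        R += r
        S += s
        PR += p * r
        QS += q * s
        PS_QR += p * s + q * r
    odds_ratio = R / S
    var_log = PR / (2 * R**2) + PS_QR / (2 * R * S) + QS / (2 * S**2)
    half_width = z * math.sqrt(var_log)
    lower = math.exp(math.log(odds_ratio) - half_width)
    upper = math.exp(math.log(odds_ratio) + half_width)
    return odds_ratio, lower, upper


# Cattle disease data: (a, b, c, d) for adults and calves
cattle = [(25, 12, 15, 18), (30, 24, 5, 21)]
or_mh, or_lower, or_upper = mh_odds_ratio_ci(cattle)
print(f"OR = {or_mh:.2f}, 95% CI {or_lower:.2f} to {or_upper:.2f}")
```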


Testing for homogeneity / interaction

In our first example we obtained centre risk ratios of 1.03 and 0.93, with a common risk ratio of 1.01. In this situation the common risk ratio does seem to be an appropriate summary effect measure for our data. But in the second example, is the common odds ratio of 3.51 really appropriate to describe odds ratios of 2.50 for adults and 5.25 for calves?

It would appear in this latter case we might have an interaction between the confounding factor (age) and the risk factor. In other words the effect of the risk factor is dependent on the level of the confounding factor. Putting it another way, our different 2×2 tables may not be homogeneous. How do we assess the importance of this interaction or heterogeneity?

Essentially we compare the observed values with the expected values assuming a common risk or odds ratio. In a 2×2 table, if row and column totals are known, knowledge of one cell fixes the other three cells. Hence we base the test of homogeneity on just one value in each 2×2 square, usually the top left hand cell. The only difficulty is in working out what the expected values should be - this is straightforward but rather tedious!

For calculating the Mantel-Haenszel interaction chi square statistic we go back to the basic form of Pearson's chi square statistic - namely that X² is equal to the square of the deviations divided by the parametric variance under the null hypothesis:

Algebraically speaking -

$$X^2_{MH\ \text{interaction}} = \sum_i \frac{(a_i - \hat{a}_i)^2}{s^2_{a_i}}$$

  • $X^2_{MH\ \text{interaction}}$ is the Mantel-Haenszel interaction chi square statistic;
  • $a_i$ is the observed frequency in the top left hand cell for the ith table;
  • $\hat{a}_i$ is the expected frequency in the top left hand cell for the ith table assuming a common risk or odds ratio (the way $\hat{a}_i$ is estimated differs for the risk ratio and the odds ratio);
  • $s^2_{a_i}$ is the variance of the expected frequencies. This is given by:

$$s^2_{a_i} = \frac{1}{\dfrac{1}{\hat{a}_i} + \dfrac{1}{\hat{b}_i} + \dfrac{1}{\hat{c}_i} + \dfrac{1}{\hat{d}_i}}$$


So how do our examples work out in the test for interaction?

Multi-centre trial (observed values, with expected values in parentheses)
Centre   Treatment    Falls          No falls     s2a      X2MH
1        Treatment    100 (98.8)     20 (21.2)    11.25    0.299 (P = 0.585)
         Control      170 (171.2)    40 (38.8)
2        Treatment    35 (36.3)      70 (68.7)    9.87
         Control      27 (25.7)      48 (49.3)
If we consider the first example, we have a common risk ratio of 1.01. The observed values for a1 and a2 are 100 and 35. The expected values (shown beside the observed values in the table), assuming no interaction, work out to 98.8 for â1 and 36.3 for â2. From these values we can estimate the other expected values in the tables.

The closeness of observed and expected values supports our suspicion that there is little evidence for any heterogeneity here. The variances for each square are then estimated and used to give an X² value. The P-value provides no evidence for any heterogeneity in the data, so we accept the risk ratio of 1.01 as a reasonable estimate of the common risk ratio.
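
The calculation for the first example can be sketched in Python as below. One step is an assumption on our part: the expected top left cell is obtained by holding the margins of each table fixed and solving for the value that gives the common risk ratio, which reproduces the expected values quoted above. The common risk ratio is passed in rounded to 1.01, and the P-value uses one degree of freedom (number of strata minus one) via `math.erfc`.

```python
import math


def expected_a_common_rr(a, b, c, d, rr):
    """Expected top left cell with fixed margins and risk ratio rr (assumed method)."""
    row1, row2, col1 = a + b, c + d, a + c
    # Solve (x / row1) / ((col1 - x) / row2) = rr for x
    return rr * row1 * col1 / (row2 + rr * row1)


def interaction_chi2_rr(tables, rr):
    """Sum over tables of (observed a - expected a)^2 / variance."""
    total = 0.0
    for a, b, c, d in tables:
        ea = expected_a_common_rr(a, b, c, d, rr)
        eb = (a + b) - ea            # remaining cells follow from the margins
        ec = (a + c) - ea
        ed = (c + d) - ec
        variance = 1 / (1 / ea + 1 / eb + 1 / ec + 1 / ed)
        total += (a - ea) ** 2 / variance
    return total


centres = [(100, 20, 170, 40), (35, 70, 27, 48)]
expected = [expected_a_common_rr(*t, rr=1.01) for t in centres]
x2 = interaction_chi2_rr(centres, rr=1.01)
p_value = math.erfc(math.sqrt(x2 / 2))   # upper tail of chi square with 1 df
print([round(e, 1) for e in expected], round(x2, 3), round(p_value, 3))
```

The result should land close to the tabulated statistic; small differences arise because the table works from expected values rounded to one decimal place.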


Risk factors for cattle disease
Age      Risk factor   Affected: Yes   s2a      X2MH
Adults   +             25              3.894    0.976 (P = 0.323)
         -             15
Calves   +             30
         -             5
For the second example we have a common odds ratio of 3.51. The observed values for a1 and a2 are 25 and 30, compared with expected values assuming no interaction of 26.35 and 28.66.

Perhaps surprisingly the observed and expected values seem fairly similar, and the test provides no clear evidence for any heterogeneity. This is less surprising when we take into account the small sample sizes involved, namely 70 adults and 80 calves. In this case we should accept homogeneity for this study, but the apparent differences in the odds ratios for adults and calves suggest that a further study is needed with sufficient power to show up any difference between the two groups.
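
For the odds ratio the expected top left cell has to be found by solving a quadratic. The sketch below assumes, as for the risk ratio, that the margins of each table are held fixed while the common odds ratio is imposed; that method and all names in the code are our own illustration. The common odds ratio is passed in as 3.51, and the P-value again uses one degree of freedom.

```python
import math


def expected_a_common_or(a, b, c, d, odds_ratio):
    """Expected top left cell with fixed margins and the given odds ratio."""
    row1, col1, n = a + b, a + c, a + b + c + d
    if odds_ratio == 1:
        return row1 * col1 / n
    # x(n - row1 - col1 + x) = OR (row1 - x)(col1 - x), rearranged to
    # qa*x^2 + qb*x + qc = 0
    qa = 1 - odds_ratio
    qb = n - row1 - col1 + odds_ratio * (row1 + col1)
    qc = -odds_ratio * row1 * col1
    root = math.sqrt(qb**2 - 4 * qa * qc)
    candidates = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    low, high = max(0, row1 + col1 - n), min(row1, col1)
    # Keep the root that leaves all four cells positive
    return next(x for x in candidates if low < x < high)


def interaction_chi2_or(tables, odds_ratio):
    total = 0.0
    for a, b, c, d in tables:
        ea = expected_a_common_or(a, b, c, d, odds_ratio)
        eb, ec = (a + b) - ea, (a + c) - ea
        ed = (c + d) - ec
        variance = 1 / (1 / ea + 1 / eb + 1 / ec + 1 / ed)
        total += (a - ea) ** 2 / variance
    return total


cattle = [(25, 12, 15, 18), (30, 24, 5, 21)]
expected_or = [expected_a_common_or(*t, odds_ratio=3.51) for t in cattle]
x2_or = interaction_chi2_or(cattle, odds_ratio=3.51)
p_or = math.erfc(math.sqrt(x2_or / 2))   # upper tail of chi square with 1 df
print([round(e, 2) for e in expected_or], round(x2_or, 3), round(p_or, 3))
```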


Mantel Haenszel association test

All that remains is to assess the significance or otherwise of the common risk or odds ratio, assuming that we have demonstrated homogeneity above. As before we base the test just on the observed and expected values in cell a of each table. Now however expected values are estimated on the basis of no association, rather than on the basis of a common risk or odds ratio. If applied to just one square the formula is algebraically identical to Pearson's chi square, except that it is multiplied by the factor (ni − 1)/ni. This is close to 1 except for small sample sizes.

Algebraically speaking -

$$X^2_{MH\ \text{association}} = \frac{\left(\sum_i a_i - \sum_i \hat{a}_i\right)^2}{\sum_i s^2_{a_i}}$$

  • $X^2_{MH\ \text{association}}$ is the Mantel-Haenszel chi square statistic for significance of the common risk or odds ratio;
  • $a_i$ and $\hat{a}_i$ are the observed and expected frequencies in cell 'a' in square 'i' assuming no association;
  • $s^2_{a_i}$ is the variance of the expected frequencies for square i, which is given by the product of the four margin totals $(a_i+b_i)(c_i+d_i)(a_i+c_i)(b_i+d_i)$ divided by $n_i^2(n_i-1)$.
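
The association statistic defined above can be sketched in Python as follows, applied here to the cattle disease data. The sketch uses the statistic without a continuity correction, takes the expected value under no association as the usual row total times column total over n, and obtains the P-value for one degree of freedom from `math.erfc`; software that applies a correction will return somewhat different values. All names are illustrative.

```python
import math


def mh_association_chi2(tables):
    """Mantel-Haenszel association chi square (no continuity correction)."""
    sum_observed = sum_expected = sum_variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        sum_observed += a
        # Expected top left cell if exposure and outcome are unrelated
        sum_expected += (a + b) * (a + c) / n
        # Product of the four margin totals over n^2 (n - 1)
        sum_variance += ((a + b) * (c + d) * (a + c) * (b + d)
                         / (n**2 * (n - 1)))
    return (sum_observed - sum_expected) ** 2 / sum_variance


# Cattle disease data: (a, b, c, d) for adults and calves
cattle = [(25, 12, 15, 18), (30, 24, 5, 21)]
x2_assoc = mh_association_chi2(cattle)
p_assoc = math.erfc(math.sqrt(x2_assoc / 2))   # chi square upper tail, 1 df
print(f"X2 = {x2_assoc:.2f}, P = {p_assoc:.4f}")
```

Passing a list containing a single table gives Pearson's chi square for that table multiplied by (n − 1)/n, as noted above.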

Important point-

Improperly pooled data from 2×2 tables can produce misleading (= wrong) conclusions from the data. It can create apparent treatment effects where none exist, and similarly conceal important treatment effects.

If we apply this test to the first data set on a multicentre trial with a common MH risk ratio of 1.01 (shown in red on the figure), we obtain a very low Mantel-Haenszel chi square value of only 0.0005 (P = 0.982).

{Fig. 1}

From this we clearly have no evidence of association between treatment and outcome.

But just think back to the result of the Pearson's chi square test on the crude risk ratio (shown in green) - that gave a P value of 0.03! Using this approach would have led us to wrongly conclude that treatment was effective in reducing the incidence of falls. Improper pooling of data is one of the commoner reasons for incorrect statistical analysis of data in the literature.
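
The misleading crude result is easy to reproduce. The sketch below applies Pearson's chi square, without continuity correction, to the pooled table obtained by adding the two centres cell by cell; the P-value for one degree of freedom is computed with `math.erfc`, and the names are illustrative.

```python
import math


def pearson_chi2(a, b, c, d):
    """Pearson's chi square for a single 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Pooled multi-centre data: treatment (falls, no falls), control (falls, no falls)
pooled = (100 + 35, 20 + 70, 170 + 27, 40 + 48)

chi2_pooled = pearson_chi2(*pooled)
p_pooled = math.erfc(math.sqrt(chi2_pooled / 2))   # chi square upper tail, 1 df
print(f"X2 = {chi2_pooled:.2f}, P = {p_pooled:.3f}")
```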

Moving to the second example with a common MH odds ratio of 3.51 (again shown in red on the figure), we get a Mantel-Haenszel chi square value for association of 12.03 (P = 0.0005).

{Fig. 2}

We can therefore be confident that the risk factor is associated with the occurrence of the disease.

Note that, since this was a 'traditional' case-control study, our odds ratio of 3.51 can only be equated to relative risk if the disease is 'rare'. In this case adjusting for the confounding factor of age has increased the crude odds ratio from 2.98 to 3.51.