InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Ratios for summarising relationships

We introduced the use of risk ratios, odds ratios and incidence rate ratios as measures of association for binary variables in Unit 1. To recap briefly:
Explanatory          Response variable ⇓
variable ⇓           +          -          Totals
+                    a          b          a + b
-                    c          d          c + d
Totals               a + c      b + d      a + b + c + d = n

The simplified notation for a 2 by 2 table is given here where

  • a, b, c and d are the numbers of individuals in each cell,
  • n is the total number of individuals.

  • A risk ratio is the ratio of two proportions - for example [a/(a+b)] / [c/(c+d)]. In epidemiological parlance it is the proportion infected among those exposed to a risk factor divided by the proportion infected among those not exposed to that risk factor. There are two types of risk ratio, depending on whether the study is cross-sectional or prospective: the ratio of two prevalences is called the prevalence risk ratio, and the ratio of two cumulative incidences is called the cumulative incidence risk ratio.

  • An odds ratio is the ratio of two odds - for example (a/b) / (c/d), which simplifies to ad/bc. In epidemiological parlance it is the odds of infection for those exposed to a risk factor, divided by the odds of infection for those not exposed to that risk factor. The odds ratio is commonly used as the effect measure for cross-sectional analytical surveys, although the risk ratio is more appropriate for this type of study as it is more readily interpretable. The odds ratio is also used as the effect measure in case-control studies, where the risk factors to which the cases in the population are exposed are compared with those to which a randomly-selected group of controls are exposed. For this type of study the odds ratio is the only appropriate measure. It is estimated indirectly by dividing the odds that the cases have been exposed to a particular risk factor by the odds that the controls have been exposed. Depending on the precise type of case-control study and the level of prevalence, the odds ratio will to a greater or lesser extent approximate to the risk ratio.

  • An incidence rate ratio is the ratio of two rates - for example (e1/N1) / (e2/N2), where e1 & e2 are the numbers of events in each population and N1 & N2 are the sizes of the two groups, midway through the time period. In epidemiological parlance it is the ratio of the incidence rates in exposed and unexposed individuals. Incidence rate can be estimated as the number of cases divided by the sum of time at risk - or (as above) as the number of cases divided by the average size of the group over the period. Note that a rate, unlike a proportion, can exceed 1. Rate ratios can only be estimated from cohort studies because we need to know the number of cases over a defined period of time.
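The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are hypothetical and are not taken from any particular package or data set.

```python
def risk_ratio(a, b, c, d):
    """Risk in the first row, a/(a+b), divided by risk in the second row, c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds in the first row, a/b, divided by odds in the second row, c/d."""
    return (a / b) / (c / d)


def incidence_rate_ratio(e1, n1, e2, n2):
    """Rate in population 1, e1/N1, divided by rate in population 2, e2/N2."""
    return (e1 / n1) / (e2 / n2)


# Hypothetical counts, for illustration only.
rr = risk_ratio(20, 80, 10, 90)                   # 0.20 / 0.10
or_ = odds_ratio(20, 80, 10, 90)                  # 0.25 / 0.111...
irr = incidence_rate_ratio(30, 1000, 15, 1200)    # 0.03 / 0.0125
print(rr, or_, irr)
```

With a risk of 0.2 in the exposed row the odds ratio is noticeably further from 1 than the risk ratio, which illustrates why the two measures only approximate each other when the outcome is rare.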

The confidence interval of a ratio provides a measure of the reliability of the estimate of the ratio. It is common practice to also use the confidence interval as a surrogate statistical test. This is unwise - a significance test (such as Pearson's chi square test or Fisher's exact test) and a confidence interval around a ratio should instead be considered as complementary. The test is to formally assess a null hypothesis and the interval gives an indication of reliability of the estimate.

As with the confidence intervals we have met before, large sample normal approximation intervals are those most commonly used. We give below computational details for calculating these intervals. But such intervals are only valid for large samples and, even then, may be misleading if some proportions or odds are very small.

 

 

Confidence interval of risk ratio

Large sample normal approximation

A transformation is required for risk ratios to be approximately normal. The most appropriate transformation is the natural logarithm of the risk ratio. An approximate estimate of the standard error of the log risk ratio (lnRR) is given by:

Algebraically speaking -

$$\mathrm{SE}(\ln RR) = \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}$$
where a, b, c, and d are the frequencies in a 2 by 2 contingency table.

The 95% Wald confidence interval of the risk ratio is then given by:

Algebraically speaking -

95% CI (RR)  =    exp (lnRR  ±  1.96 SE)

If the frequencies are suitably large (none less than 5), and the risk ratio not too extreme, the errors can be accepted as 'approximately' normal.
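As a minimal sketch of the large sample calculation above, assuming the 2 by 2 layout given earlier (a and b in the exposed row, c and d in the unexposed row), the interval could be computed as follows. The function name and the counts are hypothetical.

```python
import math


def risk_ratio_wald_ci(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) using the normal approximation on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR): sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts, for illustration only.
rr, rr_lower, rr_upper = risk_ratio_wald_ci(20, 80, 10, 90)
print(f"RR = {rr:.2f}, 95% CI {rr_lower:.2f} to {rr_upper:.2f}")
```

Because the limits are obtained on the log scale and then back-transformed, the interval is symmetric about the risk ratio in ratio terms (lower × upper = RR²) rather than in absolute terms.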

 

Exact methods and other approximations

Exact intervals sensu stricto do not exist for the risk ratio - other than by Monte Carlo. However, Thomas & Gart (1977) suggest an "exact" type method based on fixed marginals in the 2 × 2 table. R provides a bootstrap interval which gives a somewhat wider interval than the simple formulae.

A number of approximate methods suitable for small samples are available in various software packages. One approach is to invert a single two-sided test using the Wilson score statistic - we term the resulting interval the score method interval. The methods of Koopman (1984) and Miettinen & Nurminen (1985) give equivalent results. This may be the 'small sample' method used by the 'epitools' package for R.

 

 

Confidence interval of odds ratio

Large sample normal approximation

Again a transformation is required for the odds ratio to be approximated by a normal distribution. The most appropriate transformation is the natural logarithm of the odds ratio (lnOR). An approximate estimate of the standard error of the log odds ratio is given by the square root of the sum of the reciprocals of the cell frequencies:

Algebraically speaking -

$$\mathrm{SE}(\ln OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
where a, b, c, and d are the frequencies in a 2×2 contingency table.

The 95% Wald confidence interval of the odds ratio is then given by:

Algebraically speaking -

95% CI(OR)  =    exp(lnOR  ±  1.96 SE)

If the frequencies are suitably large (none less than 5), and the odds ratio not too extreme, the errors can be accepted as 'approximately' normal.
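The same calculation for the odds ratio can be sketched as below. Again the function name and counts are hypothetical, and the sketch assumes no cell is zero (otherwise the reciprocals are undefined).

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) using the normal approximation on the log scale."""
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the summed reciprocals of the cells
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts, for illustration only.
or_hat, or_lower, or_upper = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR = {or_hat:.2f}, 95% CI {or_lower:.2f} to {or_upper:.2f}")
```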

 

Exact methods and other approximations

There is an exact confidence interval for the odds ratio based on the non-null hypergeometric model - which we term the conditional exact interval. The oddsratio function, provided by the 'epitools' package for R, gives 'exact' mid-P confidence intervals, and Fisher exact intervals. The most commonly used method (that used by StatXact for example) was suggested by Cornfield. It consists of inverting two separate one-sided tests - an approach termed the tail method. This interval is attached not to the conventional odds ratio but to the conditional maximum likelihood estimate of the odds ratio. This differs slightly from the conventional sample odds ratio. Both the interval and the maximum likelihood estimate of the odds ratio have to be obtained iteratively.

Because the conditional exact odds ratio is much more discrete than the unconditional odds ratio, Cornfield intervals can be extremely conservative. Hence there are strong theoretical reasons for applying mid-P criteria to the test-inversion when obtaining this interval. In this case the median-unbiased odds ratio is used instead of the conditional odds ratio. The 'epitools' package for R gives this interval as the mid-p exact interval.

Cornfield, and later Fisher, proposed a large-sample approximation to Cornfield's exact interval for odds ratios - which we term the Cornfield approximate interval. It is much easier to evaluate than the non-null hypergeometric probability function and provides the equivalent of the mid-p exact interval.

 

 

Confidence interval of incidence rate ratio

Large sample normal approximation

Again a transformation is required for the incidence rate ratio to be approximated by a normal distribution, and the natural logarithm of the rate ratio (lnIR) is used. An approximate standard error of the log rate ratio is given by:

Algebraically speaking -

$$\mathrm{SE}(\ln IR) = \sqrt{\frac{1}{e_1} + \frac{1}{e_2}}$$
where e1 and e2 are the numbers of events in the two populations.

The 95% confidence limits of the rate ratio are then given by:

Algebraically speaking -

95% CI (IR)  =    exp(lnIR  ±  1.96 SE)
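A minimal sketch of this interval follows, assuming e1 and e2 are event counts and N1 and N2 are the group sizes (or person-time totals) used as denominators. The function name and the figures are hypothetical.

```python
import math


def rate_ratio_wald_ci(e1, n1, e2, n2, z=1.96):
    """Return (IR, lower, upper) for the rate ratio (e1/N1) / (e2/N2)."""
    ir = (e1 / n1) / (e2 / n2)
    # Standard error of ln(IR) depends only on the two event counts
    se = math.sqrt(1 / e1 + 1 / e2)
    lower = math.exp(math.log(ir) - z * se)
    upper = math.exp(math.log(ir) + z * se)
    return ir, lower, upper


# Hypothetical counts and denominators, for illustration only.
ir, ir_lower, ir_upper = rate_ratio_wald_ci(30, 1000, 15, 1200)
print(f"IR = {ir:.2f}, 95% CI {ir_lower:.2f} to {ir_upper:.2f}")
```

Note that the denominators affect the point estimate but not the width of the interval on the log scale, since the standard error involves only e1 and e2.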

     

Exact methods and other approximations

An exact (and mid-P exact) interval for a rate ratio can be obtained by treating the total number of cases as fixed - so computation of expected values and their variance is done conditional on the observed case margin total. This is a similar approach to that used for estimating an exact confidence interval for the conditional odds ratio. The mid-P exact interval is given by the 'epitools' package for R.
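One way to sketch this conditional approach: with the total e1 + e2 fixed, the count e1 is binomial with success probability p = IR·N1 / (IR·N1 + N2), so exact limits for p can be found by inverting two one-sided binomial tests and then converted back to limits for IR. The code below assumes that formulation; the function names are hypothetical and it is not claimed to match the algorithm in 'epitools'. It is intended for modest event counts only.

```python
import math


def _binom_pmf(k, m, p):
    """Binomial probability of k successes in m trials."""
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)


def exact_rate_ratio_interval(e1, n1, e2, n2, alpha=0.05, midp=False):
    """Limits for IR = (e1/N1) / (e2/N2), conditional on the total e1 + e2."""
    m = e1 + e2
    w = 0.5 if midp else 1.0     # mid-P counts only half of P(X = e1)

    def upper_tail(p):           # P(X > e1) + w * P(X = e1), increases with p
        return sum(_binom_pmf(k, m, p) for k in range(e1 + 1, m + 1)) + w * _binom_pmf(e1, m, p)

    def lower_tail(p):           # P(X < e1) + w * P(X = e1), decreases with p
        return sum(_binom_pmf(k, m, p) for k in range(e1)) + w * _binom_pmf(e1, m, p)

    def solve(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(200):     # bisection on p
            mid = (lo + hi) / 2
            if (f(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def to_ratio(p):             # p/(1-p) = IR * N1/N2, so IR = p/(1-p) * N2/N1
        return math.inf if p >= 1.0 else (p / (1 - p)) * (n2 / n1)

    p_low = 0.0 if e1 == 0 else solve(upper_tail, True)
    p_high = 1.0 if e1 == m else solve(lower_tail, False)
    return to_ratio(p_low), to_ratio(p_high)


# Hypothetical counts and denominators, for illustration only.
print(exact_rate_ratio_interval(30, 1000, 15, 1200))
print(exact_rate_ratio_interval(30, 1000, 15, 1200, midp=True))
```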

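Another large sample approximate confidence interval of the incidence rate ratio (IR) can be calculated based on the Poisson distribution (see Woodward (2004)). First the probability that an event occurs in (say) population 2 is calculated as p = e2 / (e1 + e2). The lower and upper confidence limits for this proportion (L and U) are then obtained using the normal approximation, namely

$$p \pm 1.96\sqrt{\frac{p(1-p)}{e_1 + e_2}}$$

The lower and upper limits for IR are then given by:

Algebraically speaking -

$$\text{Lower 95\% confidence limit of IR} = \left(\frac{N_1}{N_2}\right)\left(\frac{L}{1-L}\right)$$

$$\text{Upper 95\% confidence limit of IR} = \left(\frac{N_1}{N_2}\right)\left(\frac{U}{1-U}\right)$$

where N1 and N2 are the sizes of the two groups. Because p is the share of events falling in population 2, these are limits for the rate in population 2 relative to the rate in population 1.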
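The steps of this approximation can be sketched as follows. The function name and figures are hypothetical, and the sketch assumes the upper limit for the proportion stays below 1, otherwise the back-transformation breaks down. Since the proportion is taken for population 2, the result is the rate in population 2 relative to population 1.

```python
import math


def rate_ratio_ci_via_proportion(e1, n1, e2, n2, z=1.96):
    """Return (IR, lower, upper) for the rate in population 2 relative to population 1."""
    total = e1 + e2
    p = e2 / total                                  # share of events in population 2
    half_width = z * math.sqrt(p * (1 - p) / total)
    low_p, high_p = p - half_width, p + half_width  # normal approximation limits for p
    if low_p <= 0 or high_p >= 1:
        raise ValueError("normal approximation limits fall outside (0, 1)")
    estimate = (e2 / n2) / (e1 / n1)
    lower = (n1 / n2) * low_p / (1 - low_p)
    upper = (n1 / n2) * high_p / (1 - high_p)
    return estimate, lower, upper


# Hypothetical counts and denominators, for illustration only.
est, est_lower, est_upper = rate_ratio_ci_via_proportion(30, 1000, 15, 1200)
print(f"IR = {est:.3f}, 95% CI {est_lower:.3f} to {est_upper:.3f}")
```

Unlike the log-scale interval, this one is built on the proportion scale, so its limits are not symmetric about the estimate in ratio terms.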
