r × c tables & partitioning

Larger contingency tables

We deal with testing for independence/association in larger contingency tables in exactly the same way as we tested replicated samples for homogeneity in the section above. We can also partition the G- or X²-values to investigate what is happening within a large table. We will start with tables with two columns but multiple rows (known as r × 2 tables). With only two columns this is equivalent to comparing several proportions.

Relationship between incidence of orf in farmworkers and number of dogs on farm

No. dogs    + orf    - orf    % affected    Odds ratio (95% CI)
None            1       79         1.25     1.0
1              25      135        15.63     14.63 (1.95-110.0)
2              29      141        17.06     16.25 (2.17-121.5)
3/4            19      115        14.18     13.05 (1.71-99.5)
4+             18       33        35.29     43.09 (5.52-336.1)

G (df = 4) = 32.09      P = 0.000002

Our example here is taken from a cross-sectional study of risk factors for farmworkers contracting the disease orf. We have five percentages that we wish to compare ranging from 1.25% to 35.29%. Overall significance of the association between number of dogs present on a farm and incidence of orf was assessed using Pearson's chi square - we have used G instead (see bottom row of table). The authors then compared each row with the 'control' (no dogs) using odds ratios and their associated confidence intervals. We will use a different approach and partition the G value to obtain more information from the table. However, as we will see, there are only certain ways we can do this without running into problems.
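If you wish to reproduce the overall test, the following is a minimal sketch in Python (it assumes numpy and scipy are installed; the original analysis was not done this way). Passing lambda_="log-likelihood" to scipy's chi2_contingency requests the G (likelihood-ratio) statistic rather than Pearson's X².

import numpy as np
from scipy.stats import chi2_contingency

# Rows: 0, 1, 2, 3/4 and 4+ dogs;  columns: + orf, - orf
orf_table = np.array([[ 1,  79],
                      [25, 135],
                      [29, 141],
                      [19, 115],
                      [18,  33]])

# correction=False suppresses Yates' continuity correction (only relevant to 2 x 2 tables);
# lambda_="log-likelihood" gives the G statistic rather than Pearson's X2.
G, p, df, expected = chi2_contingency(orf_table, correction=False,
                                      lambda_="log-likelihood")
print(f"G = {G:.2f} on {df} df,  P = {p:.6f}")   # should give G of about 32.09 on 4 df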
 

Partitioning the G statistic

When we make a series of comparisons within one data set, it is important that those comparisons are independent. Another term for this, which we will meet again, is orthogonal. What we mean by this is that a change in the outcome of one comparison does not affect the outcome of the other comparisons. We will discuss this further below, but first let us see how to identify a set of orthogonal comparisons for our 5 × 2 table.

The procedure is to first compute G for the 2 × 2 table comprising the first two rows. Then compute G for those two rows combined versus row three; then for rows one to three combined versus the fourth row, and so on until the last row. This process is shown diagrammatically below:
Step 1: compare 'None' with '1'
No. dogs             + orf   - orf
None                     1      79
1                       25     135
2                       29     141
3/4                     19     115
4+                      18      33

Step 2: pool 'None or 1' and compare with '2'
No. dogs             + orf   - orf
None or 1               26     214
2                       29     141
3/4                     19     115
4+                      18      33

Step 3: pool 'None, 1 or 2' and compare with '3/4'
No. dogs             + orf   - orf
None, 1 or 2            55     355
3/4                     19     115
4+                      18      33

Step 4: pool 'None, 1, 2 or 3/4' and compare with '4+'
No. dogs             + orf   - orf
None, 1, 2 or 3/4       74     470
4+                      18      33
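The same stepwise pooling can be sketched in Python (again assuming numpy and scipy; partition_g and g_test_2x2 are our own helpers, not library functions):

import numpy as np
from scipy.stats import chi2_contingency

def g_test_2x2(table):
    # G statistic (likelihood-ratio chi-square), no continuity correction
    G, p, df, _ = chi2_contingency(table, correction=False,
                                   lambda_="log-likelihood")
    return G, p

def partition_g(table):
    # Compare the rows pooled so far with each successive row in a 2 x 2 table
    results = []
    pooled = np.array(table[0], dtype=float)
    for row in table[1:]:
        results.append(g_test_2x2(np.vstack([pooled, row])))
        pooled = pooled + row        # only the column totals are carried forward
    return results

orf_table = np.array([[1, 79], [25, 135], [29, 141], [19, 115], [18, 33]])
for G, p in partition_g(orf_table):
    print(f"G = {G:5.2f}   P = {p:.3f}")

The four components should come out at roughly 15.21, 3.27, 0.05 and 13.56 (Partition I below), and they sum to the overall G of 32.09 on 4 degrees of freedom.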

 

So why are these four comparisons orthogonal?  
Square IA (hypothetical frequencies)
No. dogs     + orf   - orf
None            10      70
1               16     144
Totals          26     214
G = 0.34

Square IB (observed frequencies)
No. dogs     + orf   - orf
None             1      79
1               25     135
Totals          26     214
G = 15.21
Let us take the first square and vary the frequencies. We assume that the total number of observations and the margin totals are fixed. Despite the change in the individual cell frequencies (and consequently in the G statistic) between Square IA and Square IB, all the other comparisons above are unaffected, because only the column totals are carried over to the next comparison.
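A quick way to convince yourself of this is to confirm that the two squares share identical column totals, which is all that the subsequent comparisons ever see (a small Python check, using the hypothetical frequencies of Square IA):

import numpy as np

square_ia = np.array([[10,  70],    # hypothetical frequencies
                      [16, 144]])
square_ib = np.array([[ 1,  79],    # observed frequencies
                      [25, 135]])

print(square_ia.sum(axis=0))   # column totals: [ 26 214]
print(square_ib.sum(axis=0))   # column totals: [ 26 214]
# Only these column totals (26 and 214) are passed on to the next comparison,
# so the remaining G values in the partition are unchanged.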
 
Partition I
Comparison        df      G     P-value
0 vs 1             1   15.21    <0.001
0-1 vs 2           1    3.27     0.071
0-2 vs 3/4         1    0.05     0.823
0-3/4 vs 4+        1   13.56    <0.001
Total              4   32.09

Partition II
Comparison        df      G     P-value
0 vs 1-4+          1   21.37    <0.001
1 vs 2-4+          1    0.68     0.410
2 vs 3/4-4+        1    0.51     0.475
3/4 vs 4+          1    9.53     0.002
Total              4   32.09
Orthogonal comparisons are not all good news though. When you look at the comparisons resulting from this partitioning, you may conclude that not all of them are terribly useful!

    For example, why would you wish to compare the pooled 0/1/2 categories with the 3/4 category, other than because it happens to be part of the orthogonal 'set' of comparisons?

 

You can gain a measure of control over which comparisons you carry out by putting the rows in a different order. In this case a more informative partitioning of the table would start from the bottom and work upwards, to give Partition II. The most significant difference was between no dogs on the farm (incidence of 1.25%) and 1-4+ dogs on the farm (incidence of 17.6%). The difference between there being 3/4 dogs and 4+ dogs should also be investigated further.
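Continuing the Python sketch given earlier (this reuses the partition_g helper defined there), Partition II can be obtained simply by reversing the row order, so that pooling proceeds from the bottom of the table upwards:

import numpy as np

orf_table = np.array([[1, 79], [25, 135], [29, 141], [19, 115], [18, 33]])
for G, p in partition_g(orf_table[::-1]):      # rows reversed: 4+, 3/4, 2, 1, 0
    print(f"G = {G:5.2f}   P = {p:.3f}")
# Expected components of roughly 9.53, 0.51, 0.68 and 21.37 - the same four
# values as Partition II, listed from the bottom of the table upwards.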

Note however that you should decide which orthogonal comparisons you are going to make before looking at the data. Post-hoc selection of comparisons is always open to criticism, since we will no longer be operating at our specified probability level. In the real world, of course, one has to accept that in many studies the choice will be made after data inspection; restricting oneself to orthogonal comparisons will then, so to speak, at least limit the damage.

If orthogonal comparisons really don't meet your needs, then you may need to make non-independent comparisons. This would be the case if you wish to compare incidence with no dogs (the 'control' in this study) with the incidence at each other level. The problem then is that the more comparisons you make, the greater your probability of falsely rejecting the null hypothesis. You therefore need to adjust the significance level using a Bonferroni correction, so that it is more difficult to reject the null hypothesis. We go into the reasoning behind this when we consider multiple comparisons of means in unit 11.

Algebraically speaking -

α'   =   α / (2(r − 1))
where

  • α' is the adjusted probability level;
  • α is the desired probability level (usually 0.05);
  • r is the number of rows.

In our example we would therefore require non-orthogonal comparisons to be significant at probability levels of P=0.00625 or less before we can accept them as being different.
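A sketch of these non-orthogonal 'each level versus no dogs' comparisons, using the Bonferroni-adjusted level α' = α/(2(r − 1)), is given below (Python with numpy and scipy assumed, as before; note the original authors used odds ratios and their confidence intervals rather than this approach):

import numpy as np
from scipy.stats import chi2_contingency

orf_table = np.array([[1, 79], [25, 135], [29, 141], [19, 115], [18, 33]])
labels = ["1", "2", "3/4", "4+"]

r = orf_table.shape[0]               # number of rows (5)
alpha_adj = 0.05 / (2 * (r - 1))     # 0.00625 in this example

control = orf_table[0]               # the 'no dogs' row
for label, row in zip(labels, orf_table[1:]):
    G, p, df, _ = chi2_contingency(np.vstack([control, row]),
                                   correction=False,
                                   lambda_="log-likelihood")
    verdict = "significant" if p <= alpha_adj else "not significant"
    print(f"0 vs {label:>3}:  G = {G:6.2f},  P = {p:.5f}  ({verdict} at alpha' = {alpha_adj})")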