"It has long been an axiom of mine that the little things are infinitely the most important"
r × c tables & partitioning
Larger contingency tables
We test for independence (or association) in larger contingency tables in exactly the same way as we tested replicated samples for homogeneity in the section above. We can also partition the G or X² values to investigate what is happening within a large table. We will start with tables with two columns but multiple rows (known as 2×r tables). With only two columns this is equivalent to comparing several proportions.
We have five percentages that we wish to compare, ranging from 1.25% to 35.29%. The overall significance of the association between the number of dogs present on a farm and the incidence of orf was assessed using Pearson's chi-square test; we have used G instead (see bottom row of table). The authors then compared each row with the 'control' (no dogs) using odds ratios and their associated confidence intervals. We will use a different approach and partition the G value to obtain more information from the table. However, as we will see, there are only certain ways we can do this without running into problems.
Partitioning the G statistic
When we make a series of comparisons within one data set, it is important that those comparisons are independent. Another term for this that we will meet again is orthogonal. What we mean by this is that a change in the outcome of one comparison does not affect the outcome of the other comparisons. We will discuss this further below, but first let us see how to identify a set of orthogonal comparisons for our 5×2 table.
The procedure is to first compute G for the 2×2 table comprising the first two rows. Then compute G for those two rows combined versus row three, then for rows one to three combined versus the fourth row, and so on until the last row. This process is shown diagrammatically below:
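The sequential procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `g_statistic` and `sequential_partition` are our own, no continuity correction is applied, and the counts are made up for demonstration (they are not the orf data). With r rows the procedure yields r − 1 comparisons, each a 2×2 table with one degree of freedom.

```python
import math

def g_statistic(table):
    """Likelihood-ratio G for a two-way table of counts (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += obs * math.log(obs / expected)
    return 2.0 * g

def sequential_partition(table):
    """G for row 1 vs row 2, then rows 1-2 pooled vs row 3, and so on."""
    pooled = list(table[0])
    components = []
    for row in table[1:]:
        components.append(g_statistic([pooled, list(row)]))
        # Pool the row just compared before moving to the next comparison
        pooled = [a + b for a, b in zip(pooled, row)]
    return components

# Hypothetical counts (affected, unaffected) for five rows; NOT the orf data
counts = [[2, 78], [5, 45], [6, 34], [7, 23], [6, 14]]
components = sequential_partition(counts)
total = g_statistic(counts)
print([round(g, 3) for g in components])
print(round(sum(components), 3), round(total, 3))
```

For this kind of nested partitioning the component G values add up to the G for the whole table, which is a useful check that the partition has been carried out correctly.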
So why are these four comparisons orthogonal?
Let us take the first square and vary the frequencies. We assume that the total number of observations and the marginal totals are fixed. Despite the change in individual cell frequencies (and consequently in the G statistic) from Square IA to Square IB, all the other comparisons above are unaffected because only the column totals are carried over to the next comparison.
Orthogonal comparisons are not all good news though. When you look at the comparisons resulting from this partitioning, you may conclude that not all of them are terribly useful!
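The argument can be checked numerically. The sketch below uses two hypothetical versions of a table (made-up counts, not the orf data) that differ only in how the first two rows are filled, with the same row totals and the same pooled column totals. The helper names are our own. Only the first comparison should change; the later ones see nothing but the pooled totals.

```python
import math

def g_statistic(table):
    """Likelihood-ratio G for a two-way table of counts (no correction)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return 2.0 * sum(
        obs * math.log(obs * n / (rows[i] * cols[j]))
        for i, r in enumerate(table)
        for j, obs in enumerate(r)
        if obs > 0
    )

def sequential_partition(table):
    """G for row 1 vs row 2, then rows 1-2 pooled vs row 3, and so on."""
    pooled, out = list(table[0]), []
    for row in table[1:]:
        out.append(g_statistic([pooled, list(row)]))
        pooled = [a + b for a, b in zip(pooled, row)]
    return out

# Hypothetical counts; the lower three rows are identical in both versions
lower_rows = [[6, 34], [7, 23], [6, 14]]
version_a = [[2, 78], [5, 45]] + lower_rows
# Same row totals (80, 50) and same pooled column totals (7, 123)
version_b = [[4, 76], [3, 47]] + lower_rows

g_a = sequential_partition(version_a)
g_b = sequential_partition(version_b)
print("first comparison:", round(g_a[0], 3), "vs", round(g_b[0], 3))
print("later comparisons identical:", g_a[1:] == g_b[1:])
```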
You can gain a measure of control over which comparisons you carry out by putting the rows in a different order. In this case a more informative partitioning of this table would start from the bottom of the table and work upwards to give Partition II. The most significant difference was between no dogs on the farm (incidence of 1.25%) and 1-4+ dogs on the farm (incidence of 17.6%). The difference between there being 3-4 dogs and 4+ dogs should also be investigated further.
Note, however, that you should decide which orthogonal comparisons you are going to make before looking at the data. Post-hoc selection of comparisons is always open to criticism since we will no longer be operating at our specified probability level. In the real world, of course, one has to accept that in many studies the choice will be made after data inspection, and the choice of orthogonal comparisons will, so to speak, at least limit the damage.
If orthogonal comparisons really don't meet your needs, then you may need to make non-independent comparisons. This would be the case if you wish to compare incidence with no dogs (the 'control' in this study) with the incidence at each other