"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



Binomial and related tests: Use & misuse

(binomial test, independence, sign test, McNemar's test, Cox and Stuart test for trend)

Statistics courses, especially for biologists, assume formulae = understanding and teach how to do statistics, but largely ignore what those procedures assume and how their results can mislead when those assumptions are unreasonable. The resulting misuse is, shall we say, predictable...

Use and Misuse

The binomial test is a one-sample test used to assess whether an observed proportion derived from a single random sample differs from an expected parametric proportion. The sign test is used for paired data where quantitative measurements are not possible, but where it is possible to rank each member of a pair (or the same individual before and after a treatment) relative to the other for some characteristic. In other words, measurement is only possible on the ordinal scale. It also has an important use for testing paired data where quantitative measurements are possible, but where the distribution of differences is neither normal nor symmetrical. McNemar's test is essentially the sign test under another name, applied when the response variable is binary; it is used to compare paired proportions.
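
All three tests rest on the exact binomial distribution, which a short sketch makes concrete. This is a minimal standard-library Python illustration, not code from any of the sources cited here; the function names (binom_test, sign_test, mcnemar_exact) are hypothetical. The two-sided P-value sums the probabilities of every outcome no more likely than the one observed, the convention used by R's binom.test; when the expected proportion is 0.5 this gives the same answer as doubling the smaller tail.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_test(k, n, p0=0.5):
    """Exact two-sided binomial test of H0: proportion = p0.

    Sums the probabilities of all outcomes no more likely than the observed
    one; a small relative tolerance guards against floating-point ties.
    """
    observed = binom_pmf(k, n, p0)
    p_value = sum(binom_pmf(i, n, p0) for i in range(n + 1)
                  if binom_pmf(i, n, p0) <= observed * (1 + 1e-7))
    return min(1.0, p_value)

def sign_test(before, after):
    """Sign test for paired (ordinal or quantitative) data.

    Tied pairs are dropped; the number of increases is then compared
    with Binomial(n_untied, 0.5).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n_up = sum(d > 0 for d in diffs)
    return binom_test(n_up, len(diffs), 0.5)

def mcnemar_exact(b, c):
    """Exact McNemar test on the discordant pairs only.

    b = pairs positive on the first test only, c = positive on the second only.
    Under H0 each discordant pair is equally likely to fall either way,
    so this is a sign test on b out of b + c.
    """
    return binom_test(b, b + c, 0.5)
```

For example, 15 males in a sample of 20 gives binom_test(15, 20) of about 0.041 against an expected sex ratio of 0.5, and 2 versus 10 discordant pairs gives mcnemar_exact(2, 10) of about 0.039. Note that the concordant pairs never enter McNemar's test.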

The binomial and sign tests are used sparingly across a wide range of disciplines, mainly for testing sex ratios against expected proportions and for assessing the outcomes of contest and choice situations. McNemar's test is used more heavily in medical and veterinary research for before-after studies, comparison of diagnostic tests on the same samples, and matched case-control and cohort studies. The Cox and Stuart test for trend is rarely used, and we have found few examples of it.
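
Because the Cox and Stuart test is so rarely described, a minimal sketch of its usual textbook form may help: it is itself a sign test, applied to pairs formed by matching each observation in the first half of a sequence with the corresponding one in the second half. The function names below (binom_p_half, cox_stuart) are hypothetical, and the sketch assumes the observations are independent, which is exactly the assumption discussed in the next paragraph.

```python
from math import comb

def binom_p_half(k, n):
    """Exact two-sided binomial P-value for H0: p = 0.5 (doubles the smaller tail)."""
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def cox_stuart(series):
    """Cox and Stuart test for trend in a sequence of observations.

    Each observation in the first half is paired with the one c places later
    (c = n/2 for even n, (n + 1)/2 for odd n, so the middle value of an odd
    series is dropped). Tied pairs are discarded, and the number of increases
    is tested against Binomial(m, 0.5), exactly as in the sign test.
    Returns (number of increases, number of decreases, two-sided P-value).
    """
    n = len(series)
    c = (n + 1) // 2
    pairs = zip(series[:n - c], series[c:])
    signs = [later - earlier for earlier, later in pairs if later != earlier]
    n_up = sum(s > 0 for s in signs)
    n_down = len(signs) - n_up
    return n_up, n_down, binom_p_half(n_up, len(signs))
```

A strictly increasing series of 12 values yields 6 increases and no decreases, giving P = 2 × 0.5^6 = 0.03125; with only 6 pairs from 12 observations, the test has little power.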

The most important assumption for all of these tests is that observations (or pairs of observations) are independent. Lack of independence can arise in many ways, and we give several examples from the literature of misuse of the tests in such circumstances. In ecological field studies the selection of subjects is often not under the control of the experimenter, leading to repeated observations on the same individuals. Cluster sampling also gives rise to non-independent observations. Two further misuses of McNemar's test are common. Firstly, data should not be presented in the conventional 2 × 2 contingency table used for independent samples, but should instead show the numbers of concordant and discordant pairs. Secondly, the emphasis should be on the magnitude of the change in proportions, together with a confidence interval for the difference; quoting just the P-value for the test is uninformative and can be misleading. Sometimes the main interest is in how well the results of two tests agree, and in this situation the kappa measure of agreement is more appropriate than McNemar's test. For before and after studies, the two periods should be of similar duration.
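
A minimal Python sketch of the quantities recommended here, computed from a table of concordant and discordant pairs. The function name paired_proportions and the example counts are hypothetical. The interval is the simple Wald (normal approximation) interval for the difference between paired proportions; it can behave poorly when there are few discordant pairs, which is why Newcombe (1998) and the exact methods described by Fay are preferable in small samples. Kappa is the usual Cohen's kappa (observed agreement corrected for chance agreement).

```python
from math import sqrt

def paired_proportions(a, b, c, d, z=1.96):
    """Summarise paired binary results laid out as concordant/discordant pairs.

    a = positive on both tests, d = negative on both (concordant pairs);
    b = positive on test 1 only, c = positive on test 2 only (discordant pairs).
    Returns the difference in proportions positive (test 1 minus test 2),
    its Wald confidence interval, and Cohen's kappa for agreement.
    Assumes the pairs are independent of one another.
    """
    n = a + b + c + d
    diff = (b - c) / n
    # The standard error of a paired difference depends only on the discordant counts
    se = sqrt(b + c - (b - c) ** 2 / n) / n
    ci = (diff - z * se, diff + z * se)
    # Cohen's kappa: observed agreement versus agreement expected from the margins
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"difference": diff, "ci": ci, "kappa": kappa}
```

With 30 pairs positive on both tests, 10 and 2 discordant pairs, and 58 negative on both, the difference is 0.08 with an approximate 95% interval of 0.014 to 0.146, while kappa is about 0.74. The two summaries answer different questions: whether the tests differ in their positive rate, and how well they agree.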

Use of these tests is often associated with collapsing rank or measurement data. This is usually a bad idea, because it is nearly always better to retain the maximum amount of information in your data. Collapsing measurement or polytomous variables to a binary variable always runs the risk of bias if the boundaries are chosen arbitrarily. Sometimes one suspects that the sign test has been used when all else fails. Small sample size is a particular hazard for the two-sided binomial test: when the expected proportion is 0.5, it cannot give a result significant at P < 0.05 with fewer than 6 trials, even for the most extreme outcome.
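
The arithmetic behind the six-trial limit is simple enough to check directly. Under H0: p = 0.5 the most extreme result (all successes, or all failures) has probability 0.5^n, and a two-sided test counts both extremes, so the smallest attainable P-value is 2 × 0.5^n. The sketch assumes the conventional 5% level; the function name min_two_sided_p is hypothetical.

```python
def min_two_sided_p(n):
    """Smallest two-sided P-value the exact binomial test of H0: p = 0.5
    can produce with n trials (every trial goes the same way)."""
    return min(1.0, 2 * 0.5 ** n)

# Show, for small samples, whether even the most extreme result reaches P < 0.05
for n in range(1, 9):
    p = min_two_sided_p(n)
    print(n, p, p < 0.05)
```

Five trials give at best P = 0.0625; six trials are the minimum for P = 0.03125.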


What the statisticians say

Armitage & Berry (2002) deal with McNemar's test, and with how to estimate the confidence interval of the difference, in Chapter 4. Woodward (2004) covers McNemar's test for paired proportions under the analysis of matched studies in Chapter 6. Agresti (2002) deals with McNemar's test for paired proportions in Chapter 10. Conover (1999) covers all the tests considered here (and a few others) in Chapter 3. Fleiss et al. (2003) introduce McNemar's test in Chapter 8 before considering multiple matched samples. Sokal & Rohlf (1995) and Zar (1999) both describe the binomial test.

Durkalski et al. (2003) propose a simple adjustment to the McNemar test for the analysis of clustered matched-pair data. Kusuoka & Hoffman (2002) advise on the use of McNemar's test in circulation research. Chernick & Liu (2002) note that the power of the binomial test does not increase monotonically with sample size, but in a saw-toothed manner. Newcombe (1998) looks at exact methods for estimating the confidence interval of the difference between paired proportions. Lachenbruch & Lynch (1998) describe extensions of McNemar's test for assessing screening tests. Tango (1998) presents an equivalence test and confidence interval for the difference in proportions for the paired-sample design. Bennett & Underwood (1970) look at McNemar's test and its power function. Gart (1969) provides an exact test for comparing matched proportions in crossover designs.
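
The saw-toothed power noted by Chernick & Liu (2002) follows from the discreteness of the binomial distribution, and it can be reproduced in a few lines. This sketch is not their code: it assumes a two-sided exact test of H0: p = 0.5 at the 5% level and, purely for illustration, a true proportion of 0.7; the function names are hypothetical. Power is the probability, under the true proportion, of landing in the rejection region.

```python
from math import comb

def p_value_half(k, n):
    """Exact two-sided P-value for H0: p = 0.5 (doubles the smaller tail)."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def power(n, p_true, alpha=0.05):
    """Probability that the exact two-sided test of H0: p = 0.5 rejects
    at level alpha when the true proportion is p_true."""
    return sum(comb(n, k) * p_true**k * (1 - p_true)**(n - k)
               for k in range(n + 1) if p_value_half(k, n) <= alpha)

# Power rises overall with n, but falls whenever an extra trial
# leaves the rejection region's boundary unchanged
for n in range(5, 21):
    print(n, round(power(n, 0.7), 3))
```

For instance, power drops from about 0.118 with 6 trials to about 0.083 with 7, then jumps to about 0.196 at 9 trials once an additional outcome enters the rejection region; adding one observation can therefore make the test weaker.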

Wikipedia has sections on the binomial test, McNemar's test, and the sign test. The Handbook of Biological Statistics describes the exact binomial test. Michael P. Fay describes the (recommended) exact McNemar test and matching confidence intervals available in R. Dennis Walsh provides some worked examples for the Cox and Stuart test for trend, whilst an R implementation of this test can be found on the statistic-on-air blog (Blogspot).