Null Hypothesis Significance Testing

Inside a classical test

Null hypothesis significance testing is still the dominant approach to inference, despite being heavily criticised by statisticians. A very large number of standardized, pre-cooked tests have been developed. Below is a brief résumé of the classical textbook approach to a generic statistical test, broken up to reveal its key steps.

In many pre-cooked statistical tests, these steps are frequently merged or otherwise concealed - on the assumption that all the user need do is to select and apply the most appropriate test. As a result, most conventional textbooks concentrate upon the application phase, pay less attention to the selection part, and provide little or no guidance about interpreting test results - or checking their assumptions.

To begin redressing those omissions let us adopt the conventional approach, ignore the preliminaries, and assume you have some results you wish to test.


  1. An appropriate statistic is chosen
    The first step is to decide upon the most appropriate measure of treatment effect although, for a number of the popular tests, the statistic chosen is also the test's name.
      For example, one of the more popularly tested statistics is the standardized difference between means, known as the t-statistic, which is tested using the t-test - which we explore in Unit 8.
    The best choice of statistic depends upon a number of factors - including your experimental / sampling design, what sort of variable you used to record your data, how those data are distributed, and what you consider to be the most appropriate measure of 'treatment effect'.

    One further factor in choosing a statistic is how well it behaves. The most popular statistics are generally not only the most studied, but may also have mathematically straightforward properties - provided certain assumptions, such as normal data, are met. Notice also that, although some statistics offer more precision than others, the most efficient statistics tend to be sensitive to problems such as non-normal data.
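As a minimal sketch of what 'standardized difference between means' involves, the following computes a pooled-variance t-statistic by hand. The function name and the two samples are invented for illustration, and the pooled form assumes both groups share a common variance.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Difference between two sample means, divided by its standard error."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Pooled variance: the two sample variances weighted by their
    # degrees of freedom (assumes a common population variance).
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error


# Hypothetical measurements, made up purely for illustration.
treated = [4.1, 3.8, 4.4, 3.9, 4.0]
control = [4.6, 4.9, 4.3, 4.8, 4.7]

t_value = pooled_t_statistic(treated, control)
print(round(t_value, 2))
```

The sign of the result simply reflects which group is subtracted from which, so swapping the two arguments reverses it.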

  2. The hypotheses are established
    The next step is to draw up two contradictory hypotheses. Usually the alternative hypothesis (H1) would be drawn up first, derived from the biological model that you wish to investigate. For example, you propose that condom use reduces the chance of infection with HIV. Hence the mean incidence for condom users should be lower than the mean incidence for non-condom users:

    H1: μ_users < μ_non-users
    This can be reformulated by considering the difference between the two means:
    H1: μ_users - μ_non-users < 0
    A null hypothesis (H0) is then set up in contradiction to the alternative hypothesis. In this case it would be that the incidence of HIV in individuals who use condoms is the same as, or higher than, the incidence in those who do not use them.
    H0: μ_users - μ_non-users ≥ 0
    A test of this hypothesis would be a one-sided test, as we are only considering whether condom use reduces the chance of infection. A two-sided test would have the alternative hypothesis that the mean incidence for users and non-users is not the same, whilst the null hypothesis would be that they are the same. In other words, by implication, a two-sided test allows for the possibility that condom use may increase HIV incidence - or that smoking may reduce the risk of cancer.

  3. A test statistic is calculated, and its distribution estimated.
    This is the test statistic with which to challenge the null hypothesis. The popular test statistics have sampling distributions readily predicted from statistical theory - the parameters for which are estimated from your data, or occasionally, arise from your experimental design.
      For the condom use example, we might use the difference between the rates in users and non-users, divided by the standard error of that difference (estimated from the pooled variance) to standardise it. This would give us a t-statistic which (given certain conditions are met) will follow the t-distribution - which we meet in Unit 6.
    Alternatively, the statistic's distribution can be estimated by some form of computer simulation - which should allow for your sampling / experimental design. Whichever method you use to estimate how your statistic varies, it uses a statistical model which assumes the null hypothesis is true.
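One simple form of such a simulation is a permutation (randomization) test, sketched below. It is only an illustration under stated assumptions: the function name and data are hypothetical, the statistic is a plain difference in means, and the statistical model is that group labels are interchangeable when there is no treatment effect - which suits a simple two-group design with independent observations, but not necessarily yours.

```python
import random
import statistics


def permutation_null(sample_a, sample_b, n_permutations=5000, seed=1):
    """Simulate the difference in means when group labels carry no information."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    null_diffs = []
    for _ in range(n_permutations):
        # Reassign the observations to the two groups at random.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        null_diffs.append(diff)
    return null_diffs


# Hypothetical measurements, made up purely for illustration.
group_a = [4.1, 3.8, 4.4, 3.9, 4.0]
group_b = [4.6, 4.9, 4.3, 4.8, 4.7]

null_diffs = permutation_null(group_a, group_b)
observed = statistics.mean(group_a) - statistics.mean(group_b)
# Proportion of simulated differences at least as low as the one observed.
share_as_low = sum(d <= observed for d in null_diffs) / len(null_diffs)
print(share_as_low)
```

Because the labels are shuffled, the simulated differences centre on zero; the observed difference is then judged against that spread.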

  4. The P-value for that test statistic is determined.
    We can now compare our observed value of the statistic under test with how we expect it to be distributed under the null hypothesis. For pre-cooked tests this is obtained either from a probability calculator or from tables. The resulting P-value is the probability of getting the observed, or a more extreme, value of the test statistic if the null hypothesis is true.
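To make 'the observed, or a more extreme, value' concrete, here is a small sketch that assumes, purely for simplicity, that the test statistic follows a standard normal distribution under the null hypothesis (a z-statistic rather than the t-statistic used above). The function name and the observed value are hypothetical; `NormalDist` from Python's standard library plays the part of the probability calculator.

```python
from statistics import NormalDist


def p_value(z, alternative="less"):
    """P-value of a statistic assumed standard normal under the null hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        # 'More extreme' means further into the lower tail.
        return std_normal.cdf(z)
    if alternative == "greater":
        # 'More extreme' means further into the upper tail.
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":
        # 'More extreme' means further from zero in either direction.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


z_observed = -1.8  # hypothetical observed value of the statistic
p_one_sided = p_value(z_observed, "less")
p_two_sided = p_value(z_observed, "two-sided")
print(round(p_one_sided, 3), round(p_two_sided, 3))
```

For this made-up value the one-sided P-value is roughly 0.036 and the two-sided P-value roughly 0.072, so the same data can fall either side of 0.05 depending on which hypotheses were set up in step 2.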

  5. The inference is drawn.
    The P-value is used as a measure of the degree of consistency (or inconsistency) of your data with the null hypothesis.
    • When the P-value is small (usually less than α = 0.05), the null hypothesis is rejected, and the alternative hypothesis is accepted. Alternatively, the statistic under test is declared to be 'significant'.
        α is the rejection level and, when the null hypothesis is true, is assumed to equal the Type 1 error rate - in which case α is the probability of wrongly rejecting the test's null hypothesis.
    • When the P-value is large, we stick with the null hypothesis until further data are available - and the test statistic is said to be 'non significant'.
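The link between α and the Type 1 error rate can be checked by simulation. The sketch below is illustrative only and rests on several assumptions not taken from the text: a two-sided z-test of a single mean, normal data with a known standard deviation of 1, and a null hypothesis that is true by construction. The function name, sample size and number of trials are hypothetical.

```python
import random
from statistics import NormalDist


def type1_error_rate(alpha=0.05, n=20, n_trials=4000, seed=2):
    """Fraction of tests that reject a null hypothesis which is actually true."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_trials):
        # Generate data with the null hypothesis true: mean 0, known sd 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        # z = mean / (sd / sqrt(n)), with sd = 1.
        z = sample_mean * n ** 0.5
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1  # a wrongful rejection (Type 1 error)
    return rejections / n_trials


rate = type1_error_rate()
print(rate)
```

When the test's assumptions hold, as they do here by construction, the proportion of wrongful rejections should settle close to α. When they do not hold, the two can differ, which is why α is only assumed to equal the Type 1 error rate.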