 
Null Hypothesis Significance Testing
Inside a classical test
Null hypothesis significance testing is still the dominant approach to inference, despite heavy criticism from statisticians. A very large number of standardized, precooked tests have been developed. Below is a brief summary of the classical textbook approach to a generic statistical test, broken up to reveal its key steps.
In many precooked statistical tests these steps are merged or otherwise concealed, on the assumption that all the user need do is select and apply the most appropriate test. As a result, most conventional textbooks concentrate upon the application phase, pay less attention to selection, and provide little or no guidance on interpreting test results or checking their assumptions.
To begin redressing those omissions let us adopt the conventional approach, ignore the preliminaries, and assume you have some results you wish to test.
An appropriate statistic is chosen
The first step is to decide upon the most appropriate measure of treatment effect. For a number of the popular tests, the statistic chosen also gives the test its name.
For example, one of the most popularly tested statistics is the standardized difference between means, known as the t-statistic, which is tested using the t-test, explored in Unit 8.
The best choice of statistic depends upon a number of factors, including your experimental or sampling design, what sort of variable you used to record your data, how those data are distributed, and what you consider to be the most appropriate measure of 'treatment effect'.
One further factor in choosing a statistic is how well it behaves. The most popular statistics are generally not only the most studied, but may also have mathematically straightforward properties, provided certain assumptions, such as normality, are met. Notice also that, although some statistics offer more precision than others, the most efficient statistics tend to be sensitive to problems such as non-normal data.
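To make the choice concrete, two common candidate measures of treatment effect can be computed side by side. The sketch below uses made-up Normal samples and NumPy; the group names and values are purely illustrative, not from any real study:

```python
import numpy as np

# Hypothetical samples from two treatment groups (illustrative values only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=12.0, scale=2.0, size=30)

# Raw difference between means: expressed on the scale of the original variable
raw_diff = group_a.mean() - group_b.mean()

# Standardized difference: the raw difference divided by a pooled
# standard deviation, making it unit-free
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
std_diff = raw_diff / pooled_sd

print(raw_diff, std_diff)
```

The raw difference is easiest to interpret biologically; the standardized difference allows comparison across studies that measured on different scales.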
The hypotheses are established
The next step is to draw up two contradictory hypotheses. Usually the alternative hypothesis (H_{1}) is drawn up first, derived from the biological model that you wish to investigate. For example, you propose that condom use reduces the chance of infection with HIV. Hence the mean incidence for condom users should be lower than the mean incidence for non-condom users:
H_{1}: μ_{users} < μ_{nonusers}
This can be reformulated by considering the difference between the two means:
H_{1}: μ_{users} − μ_{nonusers} < 0
A null hypothesis (H_{0}) is then set up in contradiction to the alternative hypothesis. In this case it would be that the incidence rate of HIV in individuals who use condoms is the same as or higher than in those who do not:
H_{0}: μ_{users} − μ_{nonusers} ≥ 0
A test of this hypothesis would be a one-sided test, as we are only considering whether condom use reduces the chance of infection. A two-sided test would have the alternative hypothesis that the mean incidence for users and non-users is not the same, whilst the null hypothesis would be that they are the same. In other words, by implication, a two-sided test allows for the possibility that condom use may increase HIV incidence, or that smoking may reduce the risk of cancer.
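The one-sided / two-sided distinction can be illustrated with scipy's `ttest_ind`, which accepts an `alternative` argument (in scipy 1.6 or later). The samples below are invented for illustration only:

```python
import numpy as np
from scipy import stats

# Hypothetical incidence-style scores for the two groups (invented data)
rng = np.random.default_rng(1)
users = rng.normal(loc=1.0, scale=0.5, size=40)
nonusers = rng.normal(loc=1.3, scale=0.5, size=40)

# One-sided test: H1 is that the mean for users is LESS than for non-users
t_one, p_one = stats.ttest_ind(users, nonusers, alternative='less')

# Two-sided test: H1 is simply that the two means differ
t_two, p_two = stats.ttest_ind(users, nonusers, alternative='two-sided')

# When the observed difference lies in the hypothesised direction, the
# one-sided P-value is half the two-sided one
print(p_one, p_two)
```

Note that the one-sided test buys extra power in the hypothesised direction at the price of being blind to an effect in the opposite direction.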
 A test statistic is calculated, and its distribution estimated.
This is the test statistic with which to challenge the null hypothesis. The popular test statistics have sampling distributions readily predicted from statistical theory, the parameters for which are estimated from your data or, occasionally, arise from your experimental design.
For the condom-use example, we might use the difference between the rates in users and non-users, divided by some measure of the pooled variance to standardise it. This gives a t-statistic which (provided certain conditions are met) follows the t-distribution, which we meet in Unit 6.
Alternatively, the statistic's distribution can be estimated by some form of computer simulation, which should allow for your sampling or experimental design. Whichever method you use to estimate how your statistic varies, it relies upon a statistical model which assumes the null hypothesis is true.
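One simple simulation of this kind is a permutation test: under the null hypothesis the group labels carry no information, so repeatedly shuffling them and recomputing the statistic builds an empirical null distribution. A sketch, again with invented data:

```python
import numpy as np

# Hypothetical data for the two groups (invented for illustration)
rng = np.random.default_rng(2)
users = rng.normal(loc=1.0, scale=0.5, size=40)
nonusers = rng.normal(loc=1.3, scale=0.5, size=40)

observed = users.mean() - nonusers.mean()

# Under H0 the group labels are exchangeable, so shuffle them many
# times and recompute the statistic to estimate its null distribution
pooled = np.concatenate([users, nonusers])
n_users = len(users)
null_stats = []
for _ in range(5000):
    rng.shuffle(pooled)
    null_stats.append(pooled[:n_users].mean() - pooled[n_users:].mean())
null_stats = np.array(null_stats)

# One-sided P-value: the proportion of simulated statistics at least
# as extreme (as negative) as the observed one
p_sim = np.mean(null_stats <= observed)
print(p_sim)
```

The attraction of this approach is that it makes no assumption of normality; its model of the null hypothesis is simply that labels are interchangeable, which must still match your sampling design.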
The P-value for that test statistic is determined.
We can now compare our observed value of the statistic under test with how we expect it to be distributed under the null hypothesis. For precooked tests this is obtained either from a probability calculator or from tables. The resulting P-value is the probability of obtaining the observed, or a more extreme, value of the test statistic if the null hypothesis is true.
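For example, given a hypothetical t-statistic and its degrees of freedom, the P-value can be read off the t-distribution with scipy (the numbers here are made up for illustration):

```python
from scipy import stats

# Suppose, hypothetically, the test yielded t = -2.1 with 78 degrees of
# freedom for the one-sided test H1: mu_users - mu_nonusers < 0
t_obs, df = -2.1, 78

# One-sided P-value: probability of a t value this small or smaller
# under the null hypothesis
p_one_sided = stats.t.cdf(t_obs, df)

# Two-sided P-value: probability of a value at least this extreme
# in either direction
p_two_sided = 2 * stats.t.sf(abs(t_obs), df)

print(p_one_sided, p_two_sided)
```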
 The inference is drawn.
The Pvalue is used as a measure of the degree of consistency (or inconsistency) of your data with the null hypothesis.
When the P-value is small (usually less than α = 0.05), the null hypothesis is rejected and the alternative hypothesis is accepted. Equivalently, the statistic under test is declared to be 'significant'.
α is the rejection level and, when the null hypothesis is true, is assumed to equal the Type I error rate, in which case α is the probability of wrongly rejecting the test's null hypothesis.
When the P-value is large, we retain the null hypothesis until further data are available, and the test statistic is said to be 'non-significant'.
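The decision rule in this final step amounts to nothing more than a comparison with α. A minimal sketch (the function name and wording of the returned messages are ours, not standard):

```python
def nhst_decision(p_value, alpha=0.05):
    """Classical decision rule: reject H0 when P < alpha.
    A non-significant result is a failure to reject H0,
    not evidence that H0 is true."""
    if p_value < alpha:
        return "reject H0: result significant at the %g level" % alpha
    return "do not reject H0: result non-significant"

print(nhst_decision(0.019))
print(nhst_decision(0.21))
```

Note that the rule is all-or-nothing: P = 0.049 and P = 0.051 lead to opposite verdicts despite being almost identical degrees of evidence, which is one of the criticisms levelled at this approach.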
