"It has long been an axiom of mine that the little things are infinitely the most important"
In many ways exact tests are the easiest type of test to understand, partly because, with the advent of computer simulation models, they can often be evaluated without the use of complicated or obscure mathematical formulae - and doing so exposes their assumptions. However, because computation was difficult and expensive until quite recently, considerable effort has been invested in developing tests that require minimal calculation - by the end-user at least. Of these, the most popular are known as 'parametric tests'.
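To show how little mathematics an exact test needs once computation is cheap, here is a minimal sketch of an exact permutation test for a difference between two means. The function name `exact_permutation_p` and the data are invented for illustration. It assumes the only thing distinguishing the groups under the null hypothesis is their labels, and it is practical only for small samples, since every possible relabelling is enumerated.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # The small tolerance guards against floating-point near-ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical data: two completely separated groups of four.
p_value = exact_permutation_p([1, 2, 3, 4], [5, 6, 7, 8])
print(p_value)  # 2 of the 70 possible relabellings are this extreme
```

The only assumption visible in the code is that group labels are exchangeable. No distribution is specified anywhere.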
In principle, parametric methods assume data are normally distributed - or, less commonly, that your data represent one of the mathematically tractable frequency distributions which are closely related to the normal distribution. Under this assumption the distribution of a number of statistics can be estimated. These statistics include means, differences between means, and ratios between variances.
In order to estimate the distribution of the statistic of interest, you require estimates of the parameters of the distribution your data represent - in the case of the normal distribution, its mean and standard deviation.
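As a rough illustration of this step, the sketch below estimates a mean and standard deviation from a made-up sample, then uses them to describe the assumed sampling distribution of the mean. The measurements are hypothetical. The normal distribution is used for the mean only because it is available in Python's standard library; for a sample this small, a t distribution would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements, invented for illustration.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.0]

n = len(sample)
m = mean(sample)    # estimate of the population mean
s = stdev(sample)   # estimate of the population standard deviation (n - 1 denominator)
se = s / sqrt(n)    # standard error: estimated standard deviation of the sample mean

# Assumed sampling distribution of the mean (large-sample normal approximation).
sampling_dist = NormalDist(mu=m, sigma=se)
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)
print(f"mean = {m:.3f}, approximate 95% interval = ({lower:.3f}, {upper:.3f})")
```

Everything about the statistic's distribution here follows from just two estimated parameters, which is what makes the approach parametric.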
A wide variety of parametric methods are currently available. Provided their assumptions are fully met, parametric tests are just as powerful as exact methods.
In practice of course, no data are ever normal, and perfect parametric normal populations only exist in mathematical and simulation models. Consequently, the term approximately normal is often applied - even though there is no quantitative criterion as to what 'approximately' might mean. Testing data for normality does not help in this respect - even though it is commonly done. Failing to show data are non-normal does not mean they are normal - it only shows you did not have enough power to show they were non-normal.
Fortunately, when calculated from large samples, a number of statistics (such as the mean) converge towards a normal distribution - irrespective of how their data are distributed. As a result, the crucial question is not how the data are distributed - but how the statistics are. More generally therefore, statisticians worry about how 'errors' are distributed. Be that as it may, even if your data are perfectly normal, many statistics have distributions very different from the smooth continuous normal family. Nevertheless, these distributions are commonly approximated by a parametric distribution.
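The convergence of the mean towards normality can be checked by simulation. The sketch below draws repeated samples from a strongly skewed (exponential) population and measures the skewness of the resulting sample means, which would be zero for a normal distribution. The function names, sample sizes, number of replicates and seed are arbitrary choices made for illustration.

```python
import random


def skewness(values):
    """Third standardised moment (population form); 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skew_of_sample_means(sample_size, n_samples=5000, seed=1):
    """Skewness of the means of n_samples samples from an exponential population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        draws = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(draws) / sample_size)
    return skewness(means)


skew_small = skew_of_sample_means(sample_size=2)
skew_large = skew_of_sample_means(sample_size=50)
print(f"skewness of means, n = 2:  {skew_small:.2f}")
print(f"skewness of means, n = 50: {skew_large:.2f}")
```

The means of the larger samples should be far less skewed than the means of the smaller ones, although some skew remains, which is why 'large' has no fixed meaning here.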
Parametric models have one further advantage that is particularly important for hypothesis tests: the shape of their tails depends upon estimated means and standard deviations - which, provided your data represent a normal population, places comparatively little weight upon the more extreme observations in your sample. Unfortunately, as we note in
Parametric tests vary in their sensitivity to non-normality. If the distribution is significantly non-normal, there are two options:
What do you do if the distributions cannot be normalised, or if you have categorical data?
Non-parametric tests do not assume the data have any particular distribution, and can analyse data where no other test is applicable. Confusingly, the term non-parametric is also applied to tests that assess statistics according to rank rather than location.
Non-parametric tests generally involve much less computation than parametric tests. Some biologists prefer non-parametric tests because they do not have to consider whether data are normally distributed, and their conclusions are therefore more 'robust'.
However, because transforming continuous data to ranks wastes information, non-parametric tests are less powerful than parametric ones. Therefore, provided their assumptions are met, parametric tests can demonstrate smaller differences than non-parametric methods. Notice however that, if you should not assume your statistic has a predefined parametric distribution, this power may be an illusion.
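A small example makes the loss of information concrete. The sketch below uses a hypothetical helper, `ranks`, which assigns average ranks to ties, to show that two invented samples with very different largest values become indistinguishable once ranked.

```python
def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


# Hypothetical samples differing only in the size of their largest value.
modest = [4.8, 5.1, 5.3, 9.0]
extreme = [4.8, 5.1, 5.3, 900.0]

print(ranks(modest))
print(ranks(extreme))
print(ranks(modest) == ranks(extreme))  # True: the magnitudes are discarded
```

Any rank-based statistic therefore gives the same result for both samples. This is the source of both the robustness to outliers and the reduced power.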
Non-parametric comparisons fall into roughly three types:
Confusingly, many non-parametric statistics are assumed to approximate a parametric distribution - and are therefore assessed using critical values, rather than by relative rank. In other words, although the statistic (and its test) is described as being non-parametric, the test's model is parametric.
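The Mann-Whitney U statistic is a familiar case. The sketch below computes U for two tiny invented samples and assesses it twice: once against the normal distribution, using the usual large-sample mean and standard deviation of U without tie or continuity corrections, and once exactly, by enumerating every relabelling of the pooled data. The function names are hypothetical, and the sample sizes are deliberately too small for the approximation to be trusted.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs with a > b, ties counting one half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def normal_approx_p(u, n_a, n_b):
    """Two-sided p-value treating U as normally distributed (a parametric model)."""
    mu = n_a * n_b / 2
    sigma = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mu) / sigma
    return 2 * NormalDist().cdf(-abs(z))


def exact_p(group_a, group_b):
    """Two-sided p-value from the exact permutation distribution of U."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    centre = n_a * n_b / 2
    observed = abs(mann_whitney_u(group_a, group_b) - centre)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mann_whitney_u(a, b) - centre) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Hypothetical data: two completely separated groups of three.
group_a, group_b = [1, 2, 3], [4, 5, 6]
u = mann_whitney_u(group_a, group_b)
p_approx = normal_approx_p(u, len(group_a), len(group_b))
p_exact = exact_p(group_a, group_b)
print(f"U = {u}, normal approximation p = {p_approx:.4f}, exact p = {p_exact:.4f}")
```

With these data the exact p-value is 2/20 = 0.1, the smallest two-sided value that samples of this size can produce, while the normal approximation gives a value just under 0.05. Same statistic, different model, different conclusion.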