The Z-test

General principles of parametric tests

Parametric tests use the properties of the normal distribution to assess whether you can reject the null hypothesis or not. Hence they assume that your data are drawn from a normal distribution, or that sample sizes are sufficiently large that the test statistic follows a normal distribution.

Let us begin by considering what happens if all the observations in your samples are from just one population. In other words, these observations are all measurements (Yi) of the same variable 'Y'. We will assume that this population is normally distributed about its population mean 'μ', with a standard deviation 'σ'. In this situation the sample observations are also likely to be distributed approximately normally.

Provided these assumptions are true, 95% of the observations in your samples will be less than 1.96 standard deviations from the population mean.

{Fig. 1: The standard normal distribution (stannorm.gif)}

In other words, on average there is only a 5% probability that any one observation will deviate by more than 1.96 standard deviations from its population mean. We can use this to develop our first parametric statistical test based on the standard normal distribution - the one-sample Z-test:

 

The one-sample Z-test

Let us say you have a single observation of packed cell volume (PCV) and you wish to know whether it is likely to have been drawn from a normally distributed population with known mean and standard deviation. You can transform the observation (Yi) to a standard normal deviate (z) by subtracting the population mean (μ0) and dividing by the population standard deviation of the observations (σ). Standard normal deviates follow the standard normal distribution.

Algebraically speaking -

z  =  (Yi − μ0) / σ
Where
  • z is the difference between an observed value of 'Y' and its true mean - expressed in standard normal deviates. Hence z is normally distributed about 0 with a standard deviation of 1.
  • μ0 is the true mean of the population your observations were randomly selected from.
  • σ is the standard deviation of the observations in that population.
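
To make the arithmetic concrete, here is a minimal Python sketch of this calculation; the function name and the example numbers are hypothetical, and the 1.96 cut-off anticipates the two-tailed critical value discussed below.

```python
def z_single_observation(y, mu0, sigma):
    """Standard normal deviate for a single observation y, given the
    known population mean mu0 and population standard deviation sigma."""
    return (y - mu0) / sigma

# Hypothetical example: an observation of 52.0 from a population
# with known mean 45.0 and standard deviation 4.0
z = z_single_observation(52.0, mu0=45.0, sigma=4.0)
print(f"z = {z:.2f}; outside +/-1.96? {abs(z) > 1.96}")
```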

In order to know which critical value to compare with our z-value, we need to be clear about our hypotheses. Let us adopt the following hypotheses:

  • The null hypothesis (H0) is that the observation belongs to a population with mean μ1 equal to μ0.
  • The alternative hypothesis (H1) is that the observation belongs to a population with mean μ1 not equal to μ0.

{Fig. 2: Two-tailed rejection regions of the standard normal distribution (tails.gif)}

You can reject the null hypothesis for any result falling within the rejection region, which makes up 5% (α) of the distribution. This rejection region is split into two extreme tails, each corresponding to 2.5% of the distribution. A result therefore has to be more than 1.96 standard deviations above or below the mean in order to reject your null hypothesis. Since the normal distribution is symmetrical, the critical value is the same for each tail.

For any result falling within the remaining 95% of the distribution, you accept the null hypothesis. As you might expect, this portion of the distribution is known as the acceptance region (1 − α).

In some situations, however, you may have a different alternative hypothesis. You may know that it is only possible for your observation to come from a population with a mean lower than that of your known population. In this situation the alternative hypothesis is different:

  • The null hypothesis (H0) is that the observation belongs to a population with mean μ1 equal to μ0.
  • The alternative hypothesis (H1) is that the observation belongs to a population with mean μ1 less than μ0.

{Fig. 3: One-tailed (lower-tail) rejection region of the standard normal distribution (ltail.gif)}

This is therefore a situation where you are only interested in one tail of the distribution. For a one-tailed test the critical value is 1.645 standard deviations from the mean. For this alternative hypothesis a result has to be more than 1.645 standard deviations below the mean in order to reject the null hypothesis. For any result falling within the remaining 95% of the distribution, you accept the null hypothesis.
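
The critical values quoted above (1.96 and 1.645) are simply quantiles of the standard normal distribution, so they can be obtained from any statistical package; the short Python sketch below assumes scipy is available.

```python
from scipy.stats import norm

alpha = 0.05
two_tailed_crit = norm.ppf(1 - alpha / 2)   # ~1.960: cut-off for each of the two tails
one_tailed_crit = norm.ppf(1 - alpha)       # ~1.645: cut-off for a single tail

print(f"two-tailed critical value: +/-{two_tailed_crit:.3f}")
print(f"one-tailed critical value: {one_tailed_crit:.3f}")
```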

Worked example

A llama was observed to be suffering from head tremors and shaking. Various tests were carried out, including determination of blood copper concentration, which was found to be 11.1 μmol/litre. This value was compared with the known values for healthy llamas in the UK (μ = 8.72, σ = 1.3825 μmol/litre).

z  =  (11.1 − 8.72) / 1.3825  =  1.72

We have no reason to believe that the copper level in sick animals would deviate in one direction rather than the other. Hence we use a two-tailed test and compare the value with 1.96. Since 1.72 is less than 1.96, we accept the null hypothesis that the copper concentration is within the normally expected range.
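
The worked example is easy to reproduce; the sketch below (again assuming Python with scipy, which the original page does not use) recomputes z and adds the corresponding two-tailed P-value for comparison.

```python
from scipy.stats import norm

y = 11.1                     # blood copper concentration of the sick llama (umol/litre)
mu0, sigma = 8.72, 1.3825    # known mean and SD for healthy UK llamas

z = (y - mu0) / sigma                # standard normal deviate
p_two_tailed = 2 * norm.sf(abs(z))   # two-tailed P-value

print(f"z = {z:.2f}, two-tailed P = {p_two_tailed:.3f}")
# z = 1.72, P is about 0.085, so |z| < 1.96 and the null hypothesis is not rejected
```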

 

We have, of course, already met z values for individual observations as z-scores in Unit 3. They are most commonly used in human nutrition studies for assessing whether or not children are malnourished. Here the null population is taken to be the reference values for children's weight-for-height, height-for-age and weight-for-age produced by the international health organizations. Any child with a z-score of less than −2 is taken to be malnourished.

The same reasoning used for testing individual observations also applies to means. Thus the Z-test can be used to assess whether a sample mean (Ȳ) is drawn from a known population:

Algebraically speaking -

z  =  (Ȳ − μ0) / (σ/√n)
where
  • z is the difference between the sample mean (Ȳ) and the known population mean (μ0), expressed in standard normal deviates,
  • μ0 is the known population mean,
  • σ is the known population standard deviation of the observations,
  • n is the number of observations comprising the sample.
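
As a minimal sketch under the same assumptions (a known population mean and standard deviation), the only change from the single-observation case is the σ/√n divisor; the function name and example values below are hypothetical.

```python
from math import sqrt

def z_sample_mean(ybar, mu0, sigma, n):
    """Standard normal deviate for a sample mean ybar of n observations,
    given the known population mean mu0 and standard deviation sigma."""
    return (ybar - mu0) / (sigma / sqrt(n))

# Hypothetical example: a mean of 47.1 from 25 observations
z = z_sample_mean(ybar=47.1, mu0=45.0, sigma=4.0, n=25)
print(f"z = {z:.2f}; outside +/-1.96? {abs(z) > 1.96}")
```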

Again, for a two-tailed test you compare this value with the critical values of +1.96 and −1.96. If it is greater than 1.96 or less than −1.96, you can reject, at the 5% significance level, the null hypothesis that the sample mean came from that population.

Worked example

A small group of 4 llamas was observed to be suffering from the same head tremors and shaking. The mean blood copper concentration for the 4 animals was 9.59 μmol/litre. This mean value was compared with the known values for healthy llamas in the UK (μ = 8.72, σ = 1.3825 μmol/litre).

z  =  (9.59 − 8.72) / (1.3825/√4)  =  1.26

Since this value is less than 1.96, you can again accept the null hypothesis that this mean came from a population of healthy llamas.
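
Applying the same formula to this example in Python (assuming scipy for the P-value, which is not given in the original):

```python
from math import sqrt
from scipy.stats import norm

ybar, n = 9.59, 4            # sample mean blood copper and number of llamas
mu0, sigma = 8.72, 1.3825    # known mean and SD for healthy UK llamas

z = (ybar - mu0) / (sigma / sqrt(n))
p_two_tailed = 2 * norm.sf(abs(z))

print(f"z = {z:.2f}, two-tailed P = {p_two_tailed:.3f}")
# z = 1.26, P is about 0.21, so again the null hypothesis is not rejected
```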

Testing the significance of other statistics

The significance of various other statistics can be tested with the z-test provided they are normally distributed, are based on large samples, and an estimate of their standard error is available. For example, the distribution of the Kappa coefficient tends to normality, so we can test whether it deviates from 0 by dividing the value of the statistic by its standard error and comparing the result with the appropriate value of the standardised normal deviate at P = 0.05. In this case we would normally use a one-sided test, since our alternative hypothesis is usually that agreement is better than that predicted by chance.

Algebraically speaking -

z  =  κ / SE0(κ)

where SE0(κ) is the standard error of kappa under the null hypothesis that κ = 0.

Note that some authorities advocate use of the z-test for Kappa irrespective of sample size, whilst others instead recommend use of the t-distribution for small samples.
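
As a hedged sketch of this procedure (the kappa value and its standard error below are invented purely for illustration; the standard-error formula itself is given elsewhere on this site):

```python
from scipy.stats import norm

kappa = 0.46    # hypothetical observed kappa coefficient
se0 = 0.18      # hypothetical standard error of kappa under H0 (kappa = 0)

z = kappa / se0              # z statistic for H0: kappa = 0
p_one_sided = norm.sf(z)     # one-sided, since H1 is agreement better than chance

print(f"z = {z:.2f}, one-sided P = {p_one_sided:.4f}")
# Compare z with 1.645, the one-sided critical value at P = 0.05
```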