"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



What is a Z-score?

A Z-score (or standard score) represents how many standard deviations a given measurement deviates from the mean. In other words, it merely re-scales, or standardizes, your data. A Z-score serves to specify the precise location of each observation within a distribution. The sign of the Z-score (+ or −) indicates whether the score is above (+) or below (−) the mean.

A Z-score is calculated by subtracting the mean value from the value of the observation, and dividing by the standard deviation. Commonly a known reference population mean and standard deviation are used.

Algebraically speaking -

Z   =   (Yi − μ) / σ
  • Z is the number of standard deviations a given measurement deviates from the mean,
  • μ is the true population mean,
  • σ is the true population standard deviation.
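
As a minimal sketch of the formula above, the function below subtracts a known mean from an observation and divides by a known standard deviation. The function name `z_score` and the reference values (mean 100, standard deviation 15) are illustrative assumptions, not taken from any particular data set.

```python
def z_score(value, ref_mean, ref_sd):
    """Number of standard deviations `value` lies from `ref_mean`."""
    if ref_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - ref_mean) / ref_sd


# Hypothetical reference population: mean 100, standard deviation 15.
z_above = z_score(130, ref_mean=100, ref_sd=15)  # 2.0, two SDs above the mean
z_below = z_score(85, ref_mean=100, ref_sd=15)   # -1.0, one SD below the mean
```

The sign of the result carries the direction: positive values lie above the mean, negative values below it.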

Alternatively, Z-scores can be calculated using the sample mean and the sample standard deviation:

Algebraically speaking -

Z   =   (Yi − Ȳ) / s
  • Z is the number of standard deviations a given measurement deviates from the mean,
  • Ȳ is the mean of the sample,
  • s is the standard deviation of the observations in the sample

The latter formulation is most commonly used when all the measurements in a sample are transformed to Z-scores to give a Z-score distribution.



Characteristics of a Z-score distribution

    If your Z-score distribution is based on the sample mean and sample standard deviation, then the mean and standard deviation of the Z-score distribution will equal zero and one respectively. If your Z-score distribution is based on the population mean and population standard deviation, then the mean and the standard deviation of the Z-score distribution will only approximate to zero and one if the sample is random.
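
    The contrast described above can be checked numerically. The sketch below standardizes one made-up sample twice: once with its own mean and standard deviation, and once with a hypothetical reference mean (5.0) and standard deviation (1.0). It assumes the sample standard deviation uses the n − 1 denominator, which is what `statistics.stdev` computes.

```python
import statistics


def standardize(values, mean, sd):
    """Convert each value to a Z-score using the supplied mean and SD."""
    return [(v - mean) / sd for v in values]


sample = [4.1, 5.3, 6.0, 4.8, 5.9, 5.2]  # made-up measurements

# Based on the sample's own mean and (n - 1) standard deviation.
own = standardize(sample, statistics.mean(sample), statistics.stdev(sample))

# Based on a hypothetical reference population: mean 5.0, SD 1.0.
ref = standardize(sample, 5.0, 1.0)

own_mean, own_sd = statistics.mean(own), statistics.stdev(own)
ref_mean, ref_sd = statistics.mean(ref), statistics.stdev(ref)
```

    Here `own_mean` and `own_sd` come out as zero and one (up to rounding error), whereas `ref_mean` and `ref_sd` simply reflect how this sample differs from the assumed reference values.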

    The shape of a Z-score distribution will be identical to the original distribution of the raw measurements. If the original distribution is normal, then the Z-score distribution will be normal, and you will be dealing with a standard normal distribution. You can then make assumptions about the proportion of observations below or above specific Z-values. If however, the original distribution is skewed, then the Z-score distribution will also be skewed. In other words converting data to Z-scores does not normalize the distribution of that data!
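
    To illustrate that standardizing leaves the shape unchanged, the sketch below computes a skewness coefficient before and after conversion to Z-scores. The data are made up, and the skewness measure (the mean of the cubed standardized deviations) is one common definition chosen here for illustration.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: mean of cubed standardized deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population form, used only inside this measure
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


raw = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]  # made-up, right-skewed measurements

mean, sd = statistics.mean(raw), statistics.stdev(raw)
z = [(v - mean) / sd for v in raw]

skew_raw = skewness(raw)  # positive: long tail to the right
skew_z = skewness(z)      # same value: the Z-scores are just as skewed
```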

    In some applications (such as weight-for-age in nutritional studies), the Z-scores are based not upon the mean and standard deviation of the population being studied, but upon those of an external reference population. In this situation the Z-scores are used to identify those individuals in the sample falling below a specified Z-score. Sometimes the distribution of the whole sample is examined, in which case the Z-scores will not have a mean of zero and a standard deviation of one - what is of interest is the extent to which their distribution differs from the reference population.


    Sometimes Z-scores are themselves transformed to avoid negative values. For example, the T-score has a mean of 50 and a standard deviation of 10. You can transform Z-scores to T-scores by multiplying each Z-score by 10 and adding 50.

    Algebraically speaking -

    Transformed standard score    =    μnew + Z σnew
    • Z is the Z-score of an observation,
    • μnew is the new mean of the population,
    • σnew is the new standard deviation of the observations in that population.

    Another commonly used transformed score is the (so called) intelligence quotient (IQ) score. This has a mean of 100 and (usually) a standard deviation of 15.
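
    Both transformed scores follow the same rule, so a single helper covers them. This is a minimal sketch; the function name `rescale` and the example Z-values are illustrative assumptions.

```python
def rescale(z, new_mean, new_sd):
    """Transformed standard score: new mean plus Z times new standard deviation."""
    return new_mean + z * new_sd


z_values = [-2.0, -0.5, 0.0, 1.5]  # example Z-scores

# T-scores: mean 50, standard deviation 10.
t_scores = [rescale(z, 50, 10) for z in z_values]    # [30.0, 45.0, 50.0, 65.0]

# IQ-style scores: mean 100, standard deviation 15.
iq_scores = [rescale(z, 100, 15) for z in z_values]  # [70.0, 92.5, 100.0, 122.5]
```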



Uses of Z-scores

  1. To identify the position of observation(s) in a population distribution

    This is the commonest use of Z-scores. Converting a measurement to a Z-score indicates how far from the mean the observation lies in units of standard deviations. If the population distribution approximates to a normal distribution, then we can also estimate the proportion of the population falling above or below a particular value.
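
    Where the normal approximation is reasonable, these proportions can be read from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper names are illustrative, and the results are only as good as the assumption of normality.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_below(z):
    """Proportion of a normal population with a Z-score less than z."""
    return standard_normal.cdf(z)


def proportion_above(z):
    """Proportion of a normal population with a Z-score greater than z."""
    return 1.0 - standard_normal.cdf(z)


p_below = proportion_below(-1.0)  # about 0.159
p_above = proportion_above(1.96)  # about 0.025
```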

    This application has been most developed in studies of nutritional state of children. The basic data are age, sex, weight and height. The three preferred indices are:

    1. Weight-for-height
      This is an indicator of wasting and is associated with failure to gain weight or a loss of weight. In other words acute malnutrition.
    2. Height-for-age
      Low height for age is an indicator of stunting, which is usually associated with poor economic conditions or repeated exposure to food shortage caused by drought or war. In other words chronic malnutrition.
    3. Weight-for-age
      This reflects the effects of either wasting or stunting, or both.
    Other measures used to assess the condition of (especially) children severely affected by malnutrition are head circumference, mid-upper arm circumference, triceps skinfold and subscapular skinfold.

    Another measure which may be calculated is body mass index (BMI), which is weight (in kg) divided by the square of height (in metres). This is widely used to identify adults with a weight problem, either underweight (BMI less than 18.5), overweight (BMI in range 25-30) or obese (BMI above 30). The measurement of waist circumference provides information on the distribution of body fat and may be a more accurate predictor of health risk when used with BMI.
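
    The BMI calculation and the cutoffs quoted above can be sketched as follows. The function names, the label "not flagged" for the 18.5 to 25 range, and the treatment of a BMI of exactly 30 as overweight are assumptions made for illustration; the text gives only the three named categories.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by the square of height in metres."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify an adult BMI using the cutoffs quoted in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "not flagged"  # assumed label: the text names no category here
    if value <= 30:           # assumption: exactly 30 counts as overweight
        return "overweight"
    return "obese"


example = bmi(70.0, 1.75)  # about 22.9
```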

    Z-scores may be computed for any or all of these measures. The Z-score of a child's 'weight-for-height' (for example) is computed using the child's weight together with the population mean weight and standard deviation for that height, derived from a set of reference values. Until recently this was the 1977 National Center for Health Statistics/World Health Organization (NCHS/WHO) reference, which was itself derived from two data sets: one from the NCHS and the other from the Fels Research Institute. These values were intended to reflect the growth rates achieved by children when not constrained by lack of food or disease. They were used for a number of years but were recognised as not being ideal, since they took no account of ethnic diversity in growth rates - if such exists. In 2006 and 2007 the World Health Organization released two sets of child growth standards to replace the NCHS standard.
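
    The lookup described above can be sketched as a table keyed by height. The reference values below are invented purely for illustration and are NOT taken from the NCHS/WHO reference or the WHO growth standards; a real table would also be broken down by sex.

```python
# Hypothetical reference table: height (cm) -> (mean weight in kg, SD in kg).
# These numbers are made up for illustration only.
REFERENCE = {
    80: (10.5, 1.0),
    85: (11.7, 1.1),
    90: (12.9, 1.2),
}


def weight_for_height_z(weight_kg, height_cm, reference=REFERENCE):
    """Z-score of a child's weight relative to the reference for that height."""
    mean_weight, sd_weight = reference[height_cm]
    return (weight_kg - mean_weight) / sd_weight


# A child 85 cm tall weighing 9.5 kg lies about two SDs below the
# (hypothetical) reference mean for that height.
z = weight_for_height_z(9.5, 85)
```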

    We can only infer proportions for Z-scores if they approximate a normal distribution. Height-for-age in the reference population is approximately normal, but weight-for-age and weight-for-height are skewed to the right because of the presence of obesity. Hence the reference population is divided into two halves at the median, and the standard deviations are calculated for each half. Although this is a bit of a fudge, it does normalize any distribution of Z-values calculated from these values - provided it is similarly skewed.

    Reporting of nutrition Z-scores should include the following, broken down by sex and age and giving sample sizes: (i) mean values and standard deviations of nutritional indicators; (ii) percentage of children with indicators below different cutoff levels.

    The recommended Z-score cutoff point to classify malnutrition for each of the three indices is -2, which corresponds to the lowest 2.3% of the reference population. Any child falling below -3 is classified as suffering from severe malnutrition.
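
    The cutoffs and the percentages to be reported can be sketched as below. The Z-scores are made up, the category labels are illustrative wording, and the final line assumes a normal reference distribution when recovering the 2.3% figure.

```python
from statistics import NormalDist


def classify(z):
    """Apply the -2 and -3 cutoffs to a nutritional Z-score."""
    if z < -3:
        return "severe malnutrition"
    if z < -2:
        return "malnutrition"
    return "above cutoff"


def percent_below(z_scores, cutoff):
    """Percentage of Z-scores falling below the given cutoff."""
    below = sum(1 for z in z_scores if z < cutoff)
    return 100.0 * below / len(z_scores)


# Made-up sample of weight-for-height Z-scores.
z_scores = [-3.4, -2.6, -2.1, -1.8, -1.2, -0.7, -0.3, 0.1, 0.6, 1.4]

pct_below_2 = percent_below(z_scores, -2)  # 30.0
pct_below_3 = percent_below(z_scores, -3)  # 10.0

# Percentage expected below -2 in a normal reference population: about 2.3.
expected_in_reference = 100.0 * NormalDist().cdf(-2)
```

    Comparing `pct_below_2` with `expected_in_reference` shows how far the sample departs from the reference population.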

    The WHO has strongly advocated the use of Z-scores in nutrition studies, although other methods are still used in some countries.

  2. To standardize data for subsequent display and/or analysis.

    If you wish to compare the distributions of two variables that are measured in quite different units, then converting measurements to Z-scores enables them to be displayed on the same axes. There are also a number of analyses you will meet (for example major axis regression, described in Unit 12, and cluster analysis) where you convert measurements to Z-scores prior to analysis.

  3. To eliminate a factor in the analysis by expressing data relative to the mean

    In field studies, it is quite common to find that the response variable you are studying is affected by one (or more) explanatory variables in addition to the variable you are most interested in. One way to remove the effect of an explanatory variable is to standardize the data to the mean value of each level of that variable.
    For example, we may be interested in the growth rate of a number of fish in relation to their position in the feeding hierarchy. Under laboratory conditions the amount of food provided can be kept constant - but under natural conditions it may vary from month to month and greatly affect the rate of fish growth. If the growth rate of each fish is standardized to the mean growth rate each month, the effect of varying conditions is eliminated.
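
    A minimal sketch of the fish example, with made-up growth rates. It assumes that "standardized to the mean growth rate each month" means converting each value to a Z-score using that month's own mean and standard deviation; dividing by the monthly mean alone would be another reading.

```python
import statistics
from collections import defaultdict

# Made-up records: (fish id, month, growth rate). Food is scarcer in May.
records = [
    ("A", "May", 1.2), ("B", "May", 1.0), ("C", "May", 0.8),
    ("A", "June", 2.4), ("B", "June", 2.0), ("C", "June", 1.6),
]


def standardize_within_groups(records):
    """Convert each growth rate to a Z-score relative to its own month."""
    by_month = defaultdict(list)
    for _, month, growth in records:
        by_month[month].append(growth)
    stats = {
        month: (statistics.mean(values), statistics.stdev(values))
        for month, values in by_month.items()
    }
    return [
        (fish, month, (growth - stats[month][0]) / stats[month][1])
        for fish, month, growth in records
    ]


standardized = standardize_within_groups(records)
```

    Fish A grows twice as fast in June as in May, yet its standardized value is the same in both months, so what remains is its position relative to the other fish.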
