Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important"
What is a Z-score?
A Z-score (or standard score) represents how many standard deviations a given measurement lies from the mean. In other words, it merely rescales, or standardizes, your data. A Z-score specifies the precise location of each observation within a distribution. The sign of the Z-score (+ or −) indicates whether the score is above (+) or below (−) the mean.
A Z-score is calculated by subtracting the mean from the value of the observation (x) and dividing by the standard deviation. Commonly a known reference population mean (μ) and standard deviation (σ) are used:
Z = (x − μ) / σ
Alternatively, Z-scores can be calculated using the sample mean (x̄) and the sample standard deviation (s):
Z = (x − x̄) / s
The latter formulation is most commonly used when all the measurements in a sample are transformed to Z-scores to give a Z-score distribution.
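The two calculations described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example measurements are hypothetical, and the sample standard deviation is assumed to use the usual n − 1 denominator.

```python
from statistics import mean, stdev


def z_score(x, center, spread):
    """Return how many standard deviations x lies from center."""
    if spread <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - center) / spread


def sample_z_scores(values):
    """Standardize every value using the sample's own mean and SD."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 denominator (an assumption)
    return [z_score(v, m, s) for v in values]


# Hypothetical measurements, for illustration only.
weights = [4.0, 5.0, 6.0, 7.0, 8.0]
zs = sample_z_scores(weights)

# A single observation scored against a (hypothetical) reference
# population with mean 10 and standard deviation 2.
z_ref = z_score(12.0, 10.0, 2.0)
```

Here `z_score` covers the reference-population case when you pass in a known mean and standard deviation, while `sample_z_scores` transforms a whole sample into a Z-score distribution.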
Characteristics of a Z-score distribution
If your Z-score distribution is based on the sample mean and sample standard deviation, then the mean and standard deviation of the Z-score distribution will equal zero and one respectively. If your Z-score distribution is based on the population mean and population standard deviation, then the mean and standard deviation of the Z-score distribution will approximate zero and one only if the sample is a random sample from that population.
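The first claim follows directly from the definitions. As a short sketch, write each sample-based Z-score as z_i, with x̄ the sample mean, s the sample standard deviation (assumed here to use the n − 1 denominator) and n the sample size:

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i
        = \frac{1}{s}\left(\frac{1}{n}\sum_{i=1}^{n} x_i - \bar{x}\right) = 0

s_z^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_i - \bar{z}\right)^2
      = \frac{1}{s^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
      = \frac{s^2}{s^2} = 1
```

The same cancellation does not occur when a population mean and standard deviation are substituted, which is why the result is then only approximate.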
The shape of a Z-score distribution will be identical to the original distribution of the raw measurements. If the original distribution is normal, then the Z-score distribution will be normal, and you will be dealing with a standard normal distribution. You can then make assumptions about the proportion of observations below or above specific Z-values. If, however, the original distribution is skewed, then the Z-score distribution will also be skewed. In other words, converting data to Z-scores does not normalize the distribution of that data.
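One way to convince yourself that standardizing leaves the shape untouched is to compare a skewness measure before and after the transformation. The sketch below uses the moment coefficient of skewness (a choice made here, not something the text prescribes) on a small hypothetical right-skewed data set.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to Z-scores using their own mean and SD."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def skewness(values):
    """Moment coefficient of skewness: the mean cubed standardized deviation."""
    return mean([z ** 3 for z in standardize(values)])


# Hypothetical right-skewed measurements, for illustration only.
raw = [1, 1, 2, 2, 3, 4, 10]
z = standardize(raw)

skew_raw = skewness(raw)
skew_z = skewness(z)
```

Both `skew_raw` and `skew_z` come out positive and equal (up to rounding), since subtracting a constant and dividing by a positive constant cannot change the shape of a distribution.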
In some applications (such as weight-for-age in nutritional studies), the Z-scores are not based upon the known population mean and standard deviation, but on an external reference population. In this situation the Z-scores are used to identify those individuals in the sample falling below a specified Z-score. Sometimes the distribution of the whole sample is examined, in which case the Z-scores will not have a mean of zero and a standard deviation of one. What is of interest is the extent to which their distribution differs from that of the reference population.
Sometimes Z-scores are themselves transformed to avoid negative values. For example, the T-score has a mean of 50 and a standard deviation of 10. You can transform Z-scores to T-scores by multiplying each Z-score by 10 and adding 50.
Another commonly used transformed score is the so-called intelligence quotient (IQ) score. This has a mean of 100 and (usually) a standard deviation of 15.
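Both transformed scores are the same linear rescaling with different constants. A minimal sketch, with hypothetical function names and assuming the IQ scale uses a standard deviation of 15:

```python
def rescale_z(z, new_mean, new_sd):
    """Map a Z-score onto a scale with the given mean and SD."""
    return new_mean + new_sd * z


def t_score(z):
    """T-score: mean 50, standard deviation 10."""
    return rescale_z(z, 50, 10)


def iq_score(z):
    """IQ-style score: mean 100, standard deviation 15 (assumed)."""
    return rescale_z(z, 100, 15)


t = t_score(-1.5)   # 1.5 SD below the mean on the T scale
iq = iq_score(2)    # 2 SD above the mean on the IQ scale
```

Because the rescaling is linear, it changes only the labels on the axis and not the relative position of any observation.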
Uses of Z-scores
The commonest use of Z-scores is to locate an observation within a population distribution. Converting a measurement to a Z-score indicates how far from the mean the observation lies, in units of standard deviations. If the population distribution approximates a normal distribution, then we can also estimate the proportion of the population falling above or below a particular value.
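When the normality assumption is reasonable, these proportions come from the standard normal cumulative distribution function. The sketch below uses `NormalDist` from Python's standard library; the helper names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist()


def proportion_below(z):
    """Proportion of a normal population expected below Z-score z."""
    return STANDARD_NORMAL.cdf(z)


def proportion_above(z):
    """Proportion of a normal population expected above Z-score z."""
    return 1.0 - proportion_below(z)


below_minus_2 = proportion_below(-2.0)
above_1_96 = proportion_above(1.96)
```

These figures are only as good as the assumption of normality: for a skewed distribution the same Z-score can correspond to a quite different proportion.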
This application has been most developed in studies of the nutritional state of children. The basic data are age, sex, weight and height. The three preferred indices are weight-for-height, height-for-age and weight-for-age.
Another measure which may be calculated is body mass index (BMI), which is weight (in kg) divided by the square of height (in metres). This is widely used to identify adults with a weight problem: underweight (BMI less than 18.5), overweight (BMI in the range 25-30) or obese (BMI above 30). The measurement of waist circumference provides information on the distribution of body fat and may be a more accurate predictor of health risk when used with BMI.
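The BMI calculation and the cutoffs quoted above can be sketched as follows. The function names and the label for the 18.5 to 25 range are hypothetical, and a BMI of exactly 30 is treated as overweight because the text gives "above 30" for obese.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI using the cutoffs quoted in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "not flagged"  # label chosen for this sketch
    if value <= 30:
        return "overweight"
    return "obese"


# Hypothetical adult: 70 kg, 1.75 m tall.
example_bmi = bmi(70.0, 1.75)
example_category = bmi_category(example_bmi)
```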
Z-scores may be computed for any or all of these measures. The Z-score of a child's 'weight-for-height' (for example) is computed using the child's weight together with the population mean weight and standard deviation for that height derived from a set of reference values. Until recently this was the 1977 National Centre for Health Statistics/World Health Organization (NCHS/WHO) reference which was itself derived from two data sets: one from the NCHS and the other from the Fels Research
We can only infer proportions for Z-scores if they approximate a normal distribution. Height-for-age in the reference population is approximately normal, but weight-for-age and weight-for-height are skewed to the right because of the presence of obesity. Hence the reference population is divided into two halves at the median, and the standard deviations are calculated for each half. Although this is a bit of a fudge, it does normalize any distribution of Z-values calculated from these values, provided it is similarly skewed.
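The split-at-the-median idea can be illustrated with a rough sketch. The details are assumptions made for illustration, not the procedure used for the actual reference: deviations are measured from the median, each half's standard deviation is the root mean squared deviation within that half, and the reference values are invented.

```python
from math import sqrt
from statistics import median


def half_sds(reference):
    """Return (median, lower-half SD, upper-half SD) for reference values."""
    med = median(reference)
    lower = [v for v in reference if v < med]
    upper = [v for v in reference if v > med]
    # Assumption: each half's SD is taken about the median.
    sd_lower = sqrt(sum((v - med) ** 2 for v in lower) / len(lower))
    sd_upper = sqrt(sum((v - med) ** 2 for v in upper) / len(upper))
    return med, sd_lower, sd_upper


def split_z(x, med, sd_lower, sd_upper):
    """Z-score using the SD of whichever half of the reference x falls in."""
    sd = sd_lower if x < med else sd_upper
    return (x - med) / sd


# Hypothetical right-skewed reference weights, for illustration only.
reference = [8, 9, 10, 11, 14, 17]
med, sd_lo, sd_hi = half_sds(reference)
z_low = split_z(med - sd_lo, med, sd_lo, sd_hi)
z_high = split_z(med + sd_hi, med, sd_lo, sd_hi)
```

With a right-skewed reference the upper-half standard deviation is the larger of the two, so a value far above the median receives a smaller Z-score than it would under a single pooled standard deviation.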
Reporting of nutrition Z-scores should include the following, broken down by sex and age and giving sample sizes: (i) mean values and standard deviations of nutritional indicators. (ii) percentage of children with indicators below different cutoff levels. The recommended Z-score cutoff point to classify malnutrition for each of the three indices is
If you wish to compare the distributions of two variables that are measured in quite different units, then converting measurements to Z-scores enables them to be displayed on the same axes. There are also a number of analyses you will meet (for example major axis regression, described in
In field studies, it is quite common to find that the response variable you are studying is affected by one (or more) explanatory variables in addition to the variable you are most interested in. One way to remove the effect of such an explanatory variable is to standardize the data to the mean value of each level of that variable.
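One reading of this is to express each observation as a deviation from the mean of its own level. The sketch below takes that interpretation (dividing by each level's standard deviation as well would be a natural extension); the function name and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_within_levels(values, levels):
    """Subtract from each value the mean of the level it belongs to."""
    groups = defaultdict(list)
    for v, lev in zip(values, levels):
        groups[lev].append(v)
    level_means = {lev: mean(vs) for lev, vs in groups.items()}
    return [v - level_means[lev] for v, lev in zip(values, levels)]


# Hypothetical counts recorded at two sites, for illustration only.
counts = [10, 12, 14, 20, 22, 24]
sites = ["A", "A", "A", "B", "B", "B"]
centered = center_within_levels(counts, sites)
```

After centering, both sites have a mean of zero, so any remaining pattern in the response is no longer confounded with the overall difference between sites.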