"It has long been an axiom of mine that the little things are infinitely the most important."
Coping with missing data

On this page: Can we ignore missing data? Is it missing at random? Do you always recognize missing observations?
Can we ignore missing data?
In any research, you nearly always end up with some missing data. These are observations which are specified in your study design, but for which you are unable to obtain a reading. This may occur when you randomly select a sample for a questionnaire survey - but many of the questionnaires are not returned. Or participants in a clinical trial may drop out of the study before it is complete. Or you may be unable to collect regular monitoring data because all the roads to your field site are flooded.
The commonest response to such missing data is (effectively) to ignore them. This can be done by using complete case analysis, in which you only analyse cases for which there are complete data. Cases lacking data on any explanatory variable are deleted entirely - this is known as casewise (or listwise) deletion. An alternative is pairwise deletion, in which a case is dropped only from those analyses that use the particular variable missing for that case. If the missing observation is part of a matched group (for example, in veterinary trials animals in the different experimental groups are often matched by age), then the matched observations in the other groups are also deleted (matched deletion).
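The difference between the two deletion schemes can be sketched in a few lines of Python. The records below are hypothetical (not from any study discussed here), with None marking a missing value:

```python
# Hypothetical records; None marks a missing observation.
records = [
    {"id": 1, "age": 4,    "weight": 12.0},
    {"id": 2, "age": None, "weight": 15.5},   # age missing
    {"id": 3, "age": 6,    "weight": None},   # weight missing
    {"id": 4, "age": 5,    "weight": 14.0},
]

# Casewise (listwise) deletion: drop any case with any missing value,
# so only cases 1 and 4 remain for every analysis.
complete_cases = [r for r in records if None not in r.values()]

# Pairwise deletion: drop a case only from analyses that use the
# variable it is missing - so three cases contribute to each variable.
ages = [r["age"] for r in records if r["age"] is not None]
weights = [r["weight"] for r in records if r["weight"] is not None]
```

Note that pairwise deletion retains more data, but different analyses then rest on different subsets of cases - one reason results from the two approaches can disagree.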
However, as a rule, it is very unwise to just ignore missing observations - for the following reasons:
Is it missing at random?
If an observation is missing at random, it means that its absence is independent of the outcome being measured. If that holds, ignoring it merely reduces the sample size. Unfortunately, as the following examples show, it often does not hold.
Consider a study on the incidence of a notifiable disease. Farmers are asked to complete a questionnaire on the number of cases they have had on their farm over the past five years. Five hundred farms are selected at random for the study. But responses are received from only two hundred and twenty of the farms. If the answers of the responders were similar to the answers of the non-responders, there would be no problem (apart from the smaller sample size). But what if the reason some of the farmers did not respond was precisely because the disease had occurred on their farm - and they had not reported it to the Ministry? In this situation the absence of a response is not independent of the outcome, resulting in non-response bias.
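A small simulation makes the farm example concrete. All the numbers below (a 30% true disease rate, and response probabilities of 20% for affected farms versus 60% for unaffected ones) are hypothetical assumptions chosen only to illustrate the mechanism:

```python
import random
random.seed(1)

# Hypothetical population: 500 farms, 30% have had the disease.
farms = [random.random() < 0.30 for _ in range(500)]

# Assumed response behaviour: affected farms are much less likely
# to return the questionnaire (20% versus 60%).
def responds(diseased):
    return random.random() < (0.20 if diseased else 0.60)

responses = [d for d in farms if responds(d)]

true_rate = sum(farms) / len(farms)
observed_rate = sum(responses) / len(responses)
# observed_rate badly underestimates true_rate: non-response bias.
```

Because affected farms drop out preferentially, the rate estimated from responders alone is well below the true rate - and no increase in sample size would fix that.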
Now consider a clinical trial to compare two nasal spray preparations for asthma. During the trial a number of patients experienced side effects from the spray they were receiving, and withdrew from the study.
If we forget about those who had side effects to the spray, we get a cure rate of 73.3% for spray A compared to only 54.1% for spray B. But, because these withdrawals were not missing at random, this is an example of participation bias. It would be quite misleading to conclude that 73.3% of all patients receiving treatment A would be cured if many of them refused to take the treatment! In this case we can correct our analysis by including the withdrawals as treatment failures. If we do this, we find that there is little or no difference between treatment B and treatment A.
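The arithmetic of that correction can be shown with hypothetical counts for one treatment arm (these are illustrative numbers, not the trial's actual data):

```python
# Hypothetical counts for one treatment arm.
cured = 22
failed_completers = 8
withdrew = 14          # withdrew because of side effects

completers = cured + failed_completers

# Complete-case cure rate: ignores the withdrawals entirely.
naive_rate = cured / completers                   # 22/30, about 73%

# Counting withdrawals as treatment failures gives a more honest
# (intention-to-treat style) estimate.
corrected_rate = cured / (completers + withdrew)  # 22/44 = 50%
```

Because the withdrawals were caused by the treatment itself, only the corrected figure describes what a patient offered that treatment can actually expect.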
Lastly, consider a long-term study to assess the effect of climatic factors on mortality rates of an insect pest. Population parameters are monitored using mark-release-recapture carried out every month over a year, except during the months of April and May - when access is difficult because of flooding. If we just use the data from the remaining ten months, we have to assume that the relationship is the same in the two missing months. But the reason the data were missing was directly related to a factor that may affect the mortality rate - namely rainfall. So the absence of data may not be independent of the mortality rate we are trying to estimate.
But before we consider these methods in detail, we need to ask one more question...
Do you always recognize missing observations?
There is only one thing worse than having missing data - and that is having missing data, but not knowing about it! You might think this is very unlikely. But there are many instances where it is a serious problem.
Such cryptic missing data can give very misleading results - and should be avoided at all costs. Unnoticed, they contribute to an insidious and intractable form of variation and bias - sometimes referred to as 'measurement error'. This includes degraded, or 'partly' missing, observations - such as where no one records when foraging ants remove most of the flies from your sampling traps!