InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

Coping with missing data
On this page: Avoid it; Assess the degree of bias; Estimate missing values; More sophisticated approaches; Constant missing responses & extreme case analysis
Avoid missing data in the first place
This is the best approach to coping with missing values. The key to avoiding missing observations is good experimental technique. This requires careful experimental design, clear protocols and rigorous checking of all data at the time of collection, as well as thereafter.
Assess the degree of bias
This is commonly done for questionnaire surveys. The principle of the approach is to compare responders and nonresponders for characteristics for which you do have data. For example, you may have independent data on the size of farms, so you can assess whether this factor affects the proportion of farmers responding. If it does not, you then assume that this lack of bias on size extends to the main topic of the questionnaire. This approach is certainly better than nothing, but it is more of an 'act of faith' than a scientific method of evaluating bias.
As well as comparing responders with nonresponders, for whom you have very little data, you can also compare early responders with late responders. This has the advantage that you have full data for both of these groups. But it does assume that the factor which causes delay in response is the same as that causing nonresponse. This may be true, but again this assumption is more an act of faith than science.
Another way to detect missing observations is to see if one is getting rather odd distributions. For trap catches, an unusual number of zero observations may lead one to suspect that all is not well.
There is a specific way to check for missing studies when doing a systematic search of the literature for a meta-analysis. This is known as a funnel plot. Here the treatment effect recorded in each study is plotted against some measure of study size (usually either the total sample size or the standard error of the treatment effect). If all studies have been found, you should get a greater spread of values for the estimated effect among trials with a small sample size than among trials with a large sample size. Moreover, the distribution should be symmetrical. A nonsymmetrical distribution indicates that some studies are 'missing'.
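The responder versus nonresponder comparison described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the farm counts are invented, the names (counts, chi_square_2x2, CRITICAL_5PCT_1DF) are hypothetical, and the choice of a Pearson chi-square test without continuity correction is ours rather than anything the page prescribes.

```python
# Hypothetical counts (invented for illustration): farms grouped by size,
# split by whether the farmer returned the questionnaire.
counts = {
    "small": {"responded": 40, "no_response": 60},
    "large": {"responded": 55, "no_response": 45},
}


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    rows = list(table.values())
    a, b = rows[0]["responded"], rows[0]["no_response"]
    c, d = rows[1]["responded"], rows[1]["no_response"]
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Proportion responding in each size class.
response_rates = {
    size: g["responded"] / (g["responded"] + g["no_response"])
    for size, g in counts.items()
}

chi2 = chi_square_2x2(counts)

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CRITICAL_5PCT_1DF = 3.841
size_affects_response = chi2 > CRITICAL_5PCT_1DF

print(response_rates)
print(round(chi2, 3), size_affects_response)
```

With these invented counts the response rate differs between small and large farms, so one could not assume that response is unrelated to farm size. Even a non-significant result would only show a lack of bias on size, not on the main topic of the questionnaire.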
Estimate missing values
Sometimes the best option is to try to estimate (or impute) what the value of the missing observation(s) would have been. Critically, all of these methods make certain assumptions. How you estimate the missing value depends on the design of the study.
Worked example
The first figure below shows a data set of prevalence values with no missing values. Prevalence increases from January to reach a peak in May, and then declines to a minimum in November. If there were a missing observation at a time of increase or decrease, then linear interpolation would give a good estimate of the missing value. But if there were a missing value in May, linear interpolation would give a rather poor estimate, as can be seen in the second figure.
The only way to get a better estimate would be to fit a function to the data. The third figure shows what we get using a spline fit to the data. This predicts that the prevalence peaks in May at 30%. This is a more reasonable estimate of the missing value.
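The contrast between linear interpolation and a fitted curve can be sketched as follows. The monthly prevalence values are invented (they are not the data behind the figures), and the natural cubic spline written out here is just one possible choice of fitted function, so the estimate it gives will not match the 30% quoted above. The function and variable names are hypothetical.

```python
import bisect

# Hypothetical prevalence (%) by month number, with May (month 5) missing.
months = [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12]
prevalence = [5, 10, 17, 25, 25, 18, 12, 8, 5, 3, 4]


def linear_interpolate(x, x0, y0, x1, y1):
    """Straight-line estimate at x between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n   # second derivatives at the knots; zero at both ends
    cp = [0.0] * n  # modified super-diagonal (Thomas algorithm)
    dp = [0.0] * n  # modified right-hand side
    for i in range(1, n - 1):
        diag = 2.0 * (h[i - 1] + h[i])
        rhs = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        denom = diag - h[i - 1] * cp[i - 1]
        cp[i] = h[i] / denom
        dp[i] = (rhs - h[i - 1] * dp[i - 1]) / denom
    for i in range(n - 2, 0, -1):
        m[i] = dp[i] - cp[i] * m[i + 1]

    def evaluate(x):
        # Find the interval [xs[i], xs[i + 1]] that contains x.
        i = max(0, min(n - 2, bisect.bisect_right(xs, x) - 1))
        a = xs[i + 1] - x
        b = x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


# April and June are the observed neighbours of the missing May value.
linear_may = linear_interpolate(5, 4, 25, 6, 25)
spline = natural_cubic_spline(months, prevalence)
spline_may = spline(5)

print("linear estimate for May:", linear_may)
print("spline estimate for May:", round(spline_may, 1))
```

Because April and June have the same value in this invented series, the straight line cannot rise above them, whereas the spline follows the curvature of the surrounding months and places the May estimate above both neighbours.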
If you have more than one missing value, you must use an iterative
More sophisticated approaches
Broadly speaking there are two rather more sophisticated multivariate methods to cope with missing values. However, they still generally assume that such values are missing at random, and may require knowledge of covariate values for the missing individual. These methods are therefore not appropriate in many situations.
Note that when you use any of these methods, you have not recovered the missing information. All you have done is to make the best, or most unbiased, use of the remaining data. These methods all have one thing in common: your final analysis is only as good as your model and the remainder of your data. If too much of your data is estimated, your analysis will reflect your assumptions rather than your data.
It is sometimes best to carry out an analysis using several very different methods of coping with the missing data. If the conclusion remains the same, irrespective of the methods used, we can have considerably more confidence in it. This is known as sensitivity analysis.
Constant missing responses and extreme case analysis
These are the methods you should use when observations are not missing at random, for example in clinical trials where withdrawals are caused by side effects of the drugs. In this situation missing responses are commonly assigned a constant missing response, namely treatment failure. This reduces the chance of falsely showing a difference between treatments when there is no difference, but it is a conservative approach. In other words, you may fail to show a difference between treatments when one really exists.
An even more conservative method is extreme case analysis. It is the only way you can be certain that the missing observations are not affecting the outcome. The approach is mainly used where the response variable is measured on the binary or (rarely) the ordinal scale. Missing observations for the group faring better are classed as failures, those for the group faring worse are classed as successes, and the results are reanalysed. If the conclusions of the trial are unaffected even by extreme case analysis, we can be reasonably certain that missing values are not biasing the result.
A good worked example of extreme case analysis is given in one of the medical
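The constant missing response and extreme case rules described in this section can be sketched for a two-arm trial with a binary outcome. Everything specific here is an assumption made for illustration: the counts are invented, the names (arms, chi_square, results) are hypothetical, and the Pearson chi-square statistic is simply one convenient way of reanalysing the recoded data.

```python
# Hypothetical trial counts (invented for illustration).
arms = {
    "treatment": {"success": 70, "failure": 20, "missing": 10},
    "control": {"success": 50, "failure": 42, "missing": 8},
}


def chi_square(s1, f1, s2, f2):
    """Pearson chi-square statistic for successes/failures in two groups."""
    n = s1 + f1 + s2 + f2
    return n * (s1 * f2 - f1 * s2) ** 2 / (
        (s1 + f1) * (s2 + f2) * (s1 + s2) * (f1 + f2)
    )


def observed_rate(arm):
    """Success rate among participants with a recorded outcome."""
    return arm["success"] / (arm["success"] + arm["failure"])


# Identify which group fares better on the observed data.
better, worse = sorted(arms, key=lambda name: observed_rate(arms[name]), reverse=True)
b, w = arms[better], arms[worse]

results = {
    # Ignore missing observations altogether.
    "complete_case": chi_square(b["success"], b["failure"],
                                w["success"], w["failure"]),
    # Constant missing response: every missing observation is a failure.
    "missing_as_failure": chi_square(b["success"], b["failure"] + b["missing"],
                                     w["success"], w["failure"] + w["missing"]),
    # Extreme case: missing are failures in the better group,
    # successes in the worse group.
    "extreme_case": chi_square(b["success"], b["failure"] + b["missing"],
                               w["success"] + w["missing"], w["failure"]),
}

CRITICAL_5PCT_1DF = 3.841  # upper 5% point of chi-square with 1 df
for method, chi2 in results.items():
    print(method, round(chi2, 3), chi2 > CRITICAL_5PCT_1DF)
```

With these invented counts the difference between arms exceeds the 5% critical value when missing observations are ignored or counted as failures, but not under extreme case analysis. In such a situation the missing observations could plausibly account for the apparent treatment effect.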
