Sampling methodology
The source population has to be (at least partially) defined by specified criteria so that appropriate controls can be selected. Such criteria will often be geographic, although they need not necessarily be so. Cases must be rigorously defined to avoid misspecification bias.
Selection of cases is often straightforward, since usually all cases arising in the source population over a defined period are included. If not all cases are selected, then a random sample of cases must be taken. If cases are selected only from hospitals, there may still be bias: some people (for example those able to afford medical insurance) are more likely to be hospitalized than others with the same disease. It is preferable to use new (incident) cases, because previously diagnosed (prevalent) cases over-represent long-term survivors.
Selection of controls is probably the most important (and most difficult) part of a case-control study. In general, controls should be representative of those in the source population at risk of becoming a case. The way in which controls are obtained depends on how well defined the source population is.
 Population controls
If the source population is fully defined (in other words, all units in the population can be listed), a random sample of controls can be obtained from that population. The study can then be properly described as population-based. This improves the validity of conclusions and decreases the likelihood of selection bias. Usually a listing is only available in special circumstances, such as a case-control study nested in a cohort study. More often various pseudo-random methods are used to obtain the sample, including systematic and haphazard sampling and (for human studies) random digit telephone dialling, all of which are subject to bias. If matching is carried out (see below), controls only need to be representative (that is, randomly sampled) within strata, for example within each age group. If a density case-control design is being used, then the selection probability for each control should be proportional to the individual's person-time at risk, a technique known as density sampling. A major problem with population controls is participation bias. It is not unusual to get a participation rate of less than 50%, and non-participation may be correlated with socioeconomic status and/or education level.
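As a rough illustration of density sampling, the sketch below draws controls with probability proportional to person-time at risk. The roster, the individual IDs and the function name `density_sample` are hypothetical, and sampling is done with replacement for simplicity; a real study would need to decide how to handle repeated selection.

```python
import random

def density_sample(person_time, n_controls, seed=None):
    """Draw control IDs with probability proportional to person-time at risk.

    person_time: dict mapping a (hypothetical) individual ID to time at risk.
    Sampling is with replacement, so an individual may be drawn more than once.
    """
    rng = random.Random(seed)
    ids = list(person_time)
    weights = [person_time[i] for i in ids]
    return rng.choices(ids, weights=weights, k=n_controls)

# Hypothetical roster: years at risk contributed by each individual.
roster = {"A": 5.0, "B": 1.0, "C": 0.0, "D": 4.0}
controls = density_sample(roster, n_controls=1000, seed=1)

# Individuals with more person-time should appear more often;
# "C" contributed no time at risk and so is never selected.
for person in roster:
    print(person, controls.count(person))
```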
Neighbourhood controls
Where the source population is not fully defined, controls can be obtained from the vicinity of the place of residence of the case. They are commonly used as matched controls. Selection bias should be avoided by using an element of random or systematic sampling in the final selection, for example visiting neighbours along a street in a set order, or at predetermined distances and angles from the case's home. Ecologists looking at the reasons for choice of particular nest sites (or prey kill sites) often use neighbourhood controls selected in some predetermined manner from the vicinity of the chosen nest site. In human studies there is still a problem with low participation rates using neighbourhood controls.
Hospital controls
These are patients suffering from other diseases, selected at random in the same or neighbouring hospitals. They are easier to obtain than population controls and often more cooperative, leading to much higher participation rates. Hospital controls may either be matched to cases by some characteristic or selected at random from the hospital 'population'. There are several possible sources of bias in using hospital controls.
 The risk factor for the disease being studied may also be a risk factor for one of the diseases from which the controls are suffering. This will result in underestimation of the importance of the risk factor. For example, smoking is a risk factor for a wide range of diseases, so case-control studies using hospital controls are inappropriate for identifying smoking as a risk factor.
 Patients suffering from two diseases at the same time may be more likely to be hospitalized than patients suffering from either one alone. This leads to a spurious association between the two diseases among hospital patients, and is known as Berkson's bias.
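Berkson's bias can be illustrated with a small exact calculation. The sketch below is only one possible model: the prevalences and admission probabilities are invented for illustration, and each condition is assumed to lead to admission independently of the others. The two diseases are independent in the population by construction, yet their odds ratio among hospital patients is no longer 1.

```python
from itertools import product

def odds_ratio(n11, n10, n01, n00):
    """Odds ratio for disease B by disease A status from the four cell values."""
    return (n11 * n00) / (n10 * n01)

# Hypothetical population prevalences; A and B are independent by construction.
p_a, p_b = 0.10, 0.10
# Hypothetical chances that other causes, disease A and disease B lead to admission.
h_other, h_a, h_b = 0.05, 0.30, 0.30

population, hospital = {}, {}
for a, b in product((1, 0), repeat=2):
    # Population proportion with this combination of A and B.
    p = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
    # Admission occurs if any one of the independent causes leads to it.
    p_admit = 1 - (1 - h_other) * (1 - h_a) ** a * (1 - h_b) ** b
    population[(a, b)] = p
    hospital[(a, b)] = p * p_admit

cells = [(1, 1), (1, 0), (0, 1), (0, 0)]
or_population = odds_ratio(*(population[c] for c in cells))
or_hospital = odds_ratio(*(hospital[c] for c in cells))

print(f"Odds ratio in the population:   {or_population:.2f}")
print(f"Odds ratio among hospital patients: {or_hospital:.2f}")
```

Under these particular assumptions the hospital odds ratio falls well below 1. The direction and size of the distortion depend entirely on the admission probabilities chosen, so the numbers should not be read as typical.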
Friend, associate or relative controls
Here cases are asked to name friends, associates or relatives who could act as matched controls in the study. There is clearly a risk of selection bias here: for example, there is evidence that cases tend to identify friends who are better educated. On the other hand, the level of participation bias is much reduced.
We then have to consider the optimal number of controls. Probably the commonest approach is to just have one control per case. This is optimal if one has a sufficient number of cases. But if there are very few cases available, then it is better to increase the number of controls up to a maximum of about 4 controls per case.
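The limit of about 4 controls per case reflects diminishing returns. A commonly quoted approximation (not stated in this text, and strictly valid only near the null hypothesis) is that m controls per case give a statistical efficiency of about m/(m + 1) relative to an unlimited number of controls. The sketch below simply tabulates that ratio; the function name is illustrative.

```python
def relative_efficiency(m):
    """Approximate efficiency of m controls per case relative to unlimited controls.

    Uses the m / (m + 1) rule of thumb, which is an approximation.
    """
    return m / (m + 1)

for m in range(1, 7):
    print(f"{m} control(s) per case: relative efficiency {relative_efficiency(m):.2f}")
```

Going from 1 to 4 controls raises the ratio from 0.50 to 0.80, whereas each further control adds only a few percentage points.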
There are big advantages to using two different types of control groups, even though this increases the work involved, and hence the cost of the study. If both control groups give the same sort of answers, then the credibility of the results is strengthened. Some authors have argued against the use of two control groups on the basis that one does not know which result to ignore if one gets disparate results. This argument seems to be rather along the lines of 'ignorance is bliss'. It is far better to know if there is a possible problem of selection bias in one of the groups so it can be further investigated.
Once cases and controls have been selected, then their exposure to the suspected risk factor(s) must be assessed. This is done from past records or by questionnaire. If at all possible, the same method should be used for cases and controls.
Analytical methods
 Unmatched (and frequency matched) studies
Since we are taking separate samples of cases and controls, we cannot estimate disease frequencies (risks or prevalences) in the exposed and unexposed groups, and hence cannot directly estimate the risk ratio. However, we can estimate the odds ratio as explained in Unit 1. Exactly what this odds ratio approximates to depends on which particular variant of the design is being used. For a cumulative case-control design, the odds ratio will only approximate to the risk ratio if the condition is rare (low incidence). Otherwise it will exaggerate the risk ratio, lying further from 1. For a density case-control design, the odds ratio will approximate to the rate ratio irrespective of whether the condition is rare or not.
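The effect of rarity in a cumulative design can be checked with simple arithmetic. The sketch below compares the odds ratio and risk ratio for two invented pairs of risks, one rare and one common, each with a true risk ratio of 2; the risks and function names are hypothetical.

```python
def risk_ratio(risk_exposed, risk_unexposed):
    """Ratio of the risk in the exposed to the risk in the unexposed."""
    return risk_exposed / risk_unexposed

def odds_ratio(risk_exposed, risk_unexposed):
    """Ratio of the odds of disease in the exposed to the odds in the unexposed."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed

# Hypothetical risks (exposed, unexposed); both pairs have a risk ratio of 2.
rare = (0.02, 0.01)
common = (0.40, 0.20)

for label, risks in (("rare", rare), ("common", common)):
    print(f"{label}: risk ratio {risk_ratio(*risks):.2f}, "
          f"odds ratio {odds_ratio(*risks):.2f}")
```

With the rare condition the two measures almost coincide, while with the common condition the odds ratio is noticeably larger than the risk ratio.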
For an unmatched case-control study, continuous explanatory variables can be compared using the parametric two-sample t-test or the non-parametric Wilcoxon-Mann-Whitney test.
The significance of an association between a risk factor and case status can be tested using Pearson's chi-square test or Fisher's exact test (although the latter is not recommended), or by attaching a confidence interval to the odds ratio. The use of Mantel-Haenszel methods to deal with multiple 2×2 tables, and modelling approaches for case-control designs using logistic regression, are covered elsewhere. When cases and controls are frequency matched, Szklo & Nieto (2004) suggest that the most efficient strategy is to use ordinary logistic regression and include the matching variables in the model.
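For a single unmatched 2×2 table, the odds ratio, an approximate confidence interval and Pearson's chi-square statistic can be sketched as follows. The counts are invented, the function name is illustrative, and the interval uses the log-based (Woolf) approximation, which is one of several available methods and is unreliable when any cell count is small.

```python
import math

def analyse_2x2(a, b, c, d, z=1.96):
    """Analyse an unmatched 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    Returns the odds ratio, an approximate 95% interval (Woolf method),
    Pearson's chi-square (no continuity correction) and its P-value on 1 df.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"or": odds_ratio, "ci": (lower, upper), "chi2": chi2, "p": p_value}

# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed.
result = analyse_2x2(a=40, b=60, c=20, d=80)
print(result)
```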

 Matched studies
Matching must be taken into account at the analysis stage, since cases and controls are no longer being sampled independently. Consequently, for 1:1 matching the data are arranged in a contingency table giving the numbers of matched pairs:
                                Control exposed to risk factor
Case exposed to risk factor     Yes         No
Yes                             c_{1}       d_{1}
No                              d_{2}       c_{2}
 
where
 c_{1} and c_{2} are concordant pairs of case and control (with the same exposure, either positive or negative)
 d_{1} and d_{2} are discordant pairs of case and control (with different exposure)
The odds ratio is then given by the number of discordant pairs in which the case is exposed and the control is not exposed (d_{1}), divided by the number of discordant pairs in which the case is not exposed and the control is exposed (d_{2}).
Algebraically speaking 
Odds ratio (ω) = d_{1}/d_{2} 
where
 d_{1} and d_{2} are defined as above.

For a matched case-control study with 1:1 matching, continuous explanatory variables can be compared using a parametric paired t-test or the non-parametric Wilcoxon matched-pairs signed-ranks test. The significance of the association between a (categorical) risk factor and case status can be tested using McNemar's test, or by attaching a confidence interval to the odds ratio. Analysis for more than one control matched to each case can also be done using Mantel-Haenszel methods. Modelling for individually matched case-control designs should be done with conditional logistic regression.
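For 1:1 matched pairs, the odds ratio d_{1}/d_{2} and McNemar's test use only the discordant pairs. The sketch below uses invented counts and an illustrative function name; McNemar's statistic is computed without a continuity correction, and the confidence interval uses a large-sample approximation on the log scale, both of which are assumptions rather than choices made in the text.

```python
import math

def analyse_matched_pairs(d1, d2, z=1.96):
    """Analyse 1:1 matched pairs from the discordant counts.

    d1: pairs with case exposed and control unexposed.
    d2: pairs with case unexposed and control exposed.
    Returns the matched odds ratio, an approximate 95% interval,
    McNemar's chi-square (uncorrected) and its P-value on 1 df.
    """
    odds_ratio = d1 / d2
    se_log_or = math.sqrt(1 / d1 + 1 / d2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    chi2 = (d1 - d2) ** 2 / (d1 + d2)
    # Upper tail of chi-square with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"or": odds_ratio, "ci": (lower, upper), "chi2": chi2, "p": p_value}

# Hypothetical discordant pairs; concordant pairs do not enter the calculation.
matched = analyse_matched_pairs(d1=30, d2=10)
print(matched)
```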