InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

Random and systematic sampling

Simple random sampling

You must first be able to list your individual sampling units in some way. This applies whether your sampling unit is a person, a rodent, a tree or an insect. In most cases they will need to be tagged in some way so they can be identified. If you cannot list the individual sampling units, you cannot take a simple random sample (although there are other options as we see below).

The best way to select your sample is to select n numbers from a table of random numbers. Another way is to generate n numbers on the computer using a random number generator. If all else fails, write all the numbers of the sampling units on pieces of paper. Fold the pieces of paper so the numbers are not visible, put them in a box and shake them up. Then select the required number of units, preferably with your eyes shut. Unfortunately this last method is not truly random, but with luck it will not be unduly biased.
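The random number generator option can be sketched in a few lines of Python. This is only an illustration: the population of 100 tagged units, the sample size of 12 and the function name `simple_random_sample` are assumptions made for the example, not part of the original text.

```python
import random

def simple_random_sample(unit_ids, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)            # seeded only to make the example repeatable
    return sorted(rng.sample(list(unit_ids), n))

# Hypothetical population: sampling units tagged 1..100
population = range(1, 101)
sample = simple_random_sample(population, 12, seed=42)
print(sample)
```

Sampling is without replacement, so no unit can be selected twice. Omitting the seed gives a fresh selection on each run.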

Systematic sampling

For systematic sampling the starting point should be chosen randomly in order to avoid bias. In the diagram at right, we wanted to select n = 12 units from a population of N = 100, so k = N/n = 100/12 = 8⅓. We used random number tables to select a number between 1 and 8 (k rounded down) as our starting point. The number selected was 6, so starting there, we then selected every 8th unit, giving a total sample size of 12.

Since N is fairly small, it would have been better had we employed a selection interval of k = 8⅓ units, rounding each result to the nearest whole number. Thus, instead of (6+0)=6, (6+8)=14, (6+16)=22, (6+24)=30... we should have used (6+0)=6, (6+8⅓)=14, (6+16⅔)=23, (6+25)=31...
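The two selection schemes can be compared with a short Python sketch. It assumes the values from the example (N = 100, n = 12, random start 6). The function name `systematic_sample` is hypothetical, and exact fractions are used so that rounding is not affected by floating-point error.

```python
from fractions import Fraction

def systematic_sample(N, n, start):
    """Return n unit positions from 1..N using a fractional interval k = N/n.

    Each position start + i*k is rounded to the nearest whole unit.
    """
    k = Fraction(N, n)  # exact interval, e.g. 100/12 = 8 1/3
    return [round(start + i * k) for i in range(n)]

fractional = systematic_sample(100, 12, start=6)
fixed = [6 + 8 * i for i in range(12)]  # whole-number interval of 8

print(fractional)  # [6, 14, 23, 31, 39, 48, 56, 64, 73, 81, 89, 98]
print(fixed)       # [6, 14, 22, 30, 38, 46, 54, 62, 70, 78, 86, 94]
```

With the fractional interval the selected units run up to unit 98, whereas the fixed interval of 8 stops at unit 94 and never reaches the last six units of the list.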

The need for an initial random selection means that, even for a systematic sample, you must be able to list all units in the population - or at least locate them unambiguously. You also have to know the total number of units in order to select the sampling interval to get your desired sample size. Sometimes the first unit is haphazardly selected, although this can lead to bias - especially if you interpret haphazard to mean convenience and also select a convenient value of k.

If systematic sampling is being used to select quadrats in a field, the distance between plots can be measured by the number of paces. The distance between sampling units does not have to be measured too precisely, provided there is no risk of bias in the precise positioning of the sample. If there is, it is better not to look at the ground for the last few paces.

 

 

One stage cluster sampling

This is done in the following way.

  1. Decide on an appropriate grouping or cluster. Each cluster should ideally represent the full extent of variation present in the population.
  2. Obtain a list of clusters, and take a random sample of clusters.
  3. Sample all individuals within each selected cluster.

Equally-weighted clusters

We first take an example where there are the same number of secondary units in each cluster.

Worked Example

Let us take an example of sampling cages each of 100 laying hens. We take a random sample of 12 cages, and determine the proportion suffering from a particular nutritional disorder in each cage.

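| Cage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total number | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Number affected | 25 | 37 | 25 | 36 | 41 | 21 | 26 | 14 | 28 | 31 | 35 | 37 |
| Proportion | 0.25 | 0.37 | 0.25 | 0.36 | 0.41 | 0.21 | 0.26 | 0.14 | 0.28 | 0.31 | 0.35 | 0.37 |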

Using the formula for one stage cluster sampling with equal numbers per cluster:
$$\text{Mean proportion } \bar{p} = \frac{356}{1200} = 0.2967$$

$$SE(\bar{p}) = \frac{0.079}{\sqrt{12}} = 0.0228$$

The cluster-corrected 95% normal approximation confidence interval (using as multiplier 2.20, the t-value for 12 − 1 = 11 degrees of freedom) is then given by:

$$95\%\ CI(p) = 0.297 \pm 2.20 \times SE = 0.247 \text{ to } 0.347$$

Had we used the binomial formula assuming a simple random sample we would have got a confidence interval of 0.269 to 0.326. This interval is narrower than the cluster-corrected one, and so overstates the precision of the estimate. It emphasises the importance of ensuring that conditions really are met before using the binomial formula.
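The cluster-level calculation above can be reproduced with the following Python sketch. It treats each cage proportion as a single observation and uses the multiplier 2.20 from the worked example. The variable names are illustrative only.

```python
import math
import statistics

# Number of affected hens in each of the 12 sampled cages (100 hens per cage)
affected = [25, 37, 25, 36, 41, 21, 26, 14, 28, 31, 35, 37]
cage_size = 100

props = [a / cage_size for a in affected]   # proportion affected per cage
n = len(props)                              # number of clusters (12)

p_bar = statistics.mean(props)              # mean proportion
sd = statistics.stdev(props)                # between-cluster SD (n - 1 divisor)
se = sd / math.sqrt(n)                      # standard error of the mean proportion

t_crit = 2.20                               # multiplier used in the worked example
ci_low, ci_high = p_bar - t_crit * se, p_bar + t_crit * se

print(f"mean = {p_bar:.4f}, SD = {sd:.3f}, SE = {se:.4f}")
print(f"95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

Because the code does not round the intermediate values, its limits can differ from the hand calculation in the third decimal place.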

 

Unequally-weighted clusters

In this example there are an unequal number of secondary units in each cluster.

Let's take an example of doing a survey to determine the level of immunization coverage for children against measles in a district. You don't have a list of all the children in the district so you cannot take a simple random sample. But you do have a list of schools in the district. Hence you take a random sample of schools. You then sample all pupils from each school.

 

Worked Example

Let's take as an example sampling children of a particular age in schools for their percentage immunity to a disease. The schools are selected randomly and all children of the chosen age at the selected schools are tested for immunity. Not surprisingly the number of children sampled at each school varies widely.
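
| School | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total no. children | 345 | 254 | 124 | 546 | 78 | 345 | 98 | 125 | 256 | 342 | 211 | 322 |
| No. immune | 86 | 94 | 31 | 197 | 32 | 72 | 26 | 19 | 72 | 106 | 74 | 119 |
| Propn. immune | 0.25 | 0.37 | 0.25 | 0.36 | 0.41 | 0.21 | 0.27 | 0.15 | 0.28 | 0.31 | 0.35 | 0.37 |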

The total number of children sampled is 3046, of whom 928 are immune.
The proportion immune is therefore 0.305
The mean cluster size is 253.83.

Using the formula given above for one stage cluster sampling with unequal numbers per cluster:
$$p_w = \frac{928}{3046} = 0.3047$$

$$s_w^2 = 0.091 \times (0.00553 + 0.00427 + \dots + 0.00686) = 0.0052$$

$$SE(p_w) = \frac{\sqrt{0.0052}}{\sqrt{12}} = \frac{0.0721}{3.464} = 0.0208$$

The cluster-corrected 95% normal approximation confidence interval is then given by:

$$95\%\ CI(p) = 0.305 \pm 2.20 \times SE = 0.259 \text{ to } 0.351$$
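The unequal-cluster calculation can be sketched in Python as follows. The weighting used here, each squared deviation (p_i − p_w)² multiplied by (m_i / mean cluster size)², is an assumption inferred from the individual terms shown in the worked example (0.00553, 0.00427, ... 0.00686), since the formula itself is not reproduced on this page. Variable names are illustrative.

```python
import math

# Children tested and children immune at each of the 12 sampled schools
children = [345, 254, 124, 546, 78, 345, 98, 125, 256, 342, 211, 322]
immune = [86, 94, 31, 197, 32, 72, 26, 19, 72, 106, 74, 119]

n = len(children)                      # number of clusters
p_w = sum(immune) / sum(children)      # weighted (ratio) estimate of the proportion
m_bar = sum(children) / n              # mean cluster size

# Assumed form: s_w^2 = 1/(n-1) * sum[(m_i/m_bar)^2 * (p_i - p_w)^2]
s2_w = sum(
    ((m / m_bar) * (a / m - p_w)) ** 2
    for m, a in zip(children, immune)
) / (n - 1)

se_w = math.sqrt(s2_w / n)
t_crit = 2.20                          # multiplier used in the worked example
ci_low, ci_high = p_w - t_crit * se_w, p_w + t_crit * se_w

print(f"p_w = {p_w:.4f}, mean cluster size = {m_bar:.2f}")
print(f"s_w^2 = {s2_w:.4f}, SE = {se_w:.4f}")
print(f"95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

The hand calculation uses school proportions rounded to two decimal places, so the unrounded values from this sketch differ slightly from it (a standard error of about 0.021 rather than 0.0208).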

 

 

Two stage cluster sampling

This is done in the following way.

  1. Decide on an appropriate grouping or cluster. Each cluster should ideally represent the full extent of variation present in the population, because you are estimating the variation present in the population from the variation between clusters.
  2. Obtain a list of clusters, and either take a random sample of clusters or select clusters by probability proportional to size.
  3. Take a random or systematic sample of individuals within each selected cluster.

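| Herd No. | Herd size | Cum. Tot. | Series no. in cum. tot. |
|---|---|---|---|
| 1 | 23 | 23 | |
| 2 | 235 | 258 | 134 |
| 3 | 48 | 306 | |
| 4 | 44 | 350 | |
| 5 | 35 | 385 | |
| 6 | 435 | 820 | 432 & 730 |
| 7 | 37 | 857 | |
| 8 | 92 | 949 | |
| 9 | 39 | 988 | |
| 10 | 21 | 1009 | |
| 11 | 34 | 1043 | 1028 |
| 12 | 435 | 1478 | 1326 |
| 13 | 113 | 1591 | |
| 14 | 44 | 1635 | 1624 |
| 15 | 64 | 1699 | |
| 16 | 62 | 1761 | |
| 17 | 47 | 1808 | |
| 18 | 521 | 2329 | 1922 & 2220 |
| 19 | 21 | 2350 | |
| 20 | 34 | 2384 | |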

Selection by probability proportional to size

  1. List all clusters and their number of units
  2. Calculate the cumulative running total
  3. Determine the required sample size. In this example we will assume this is eight herds with a desired sample size of 160 animals - so we want to sample 20 animals per herd.
  4. Obtain the sampling interval by dividing the total number of animals by the number of herds we wish to sample (8); in this example this is 2384/8 = 298.
  5. Choose a random number between 1 and 298 to give the starting point. In this case we used R to obtain 134.
  6. Calculate a systematic sampling series by repeatedly adding the sampling interval to the random start viz. 134, 432, 730, 1028, 1326, 1624, 1922, 2220.
  7. Each of these numbers corresponds to an animal on the cumulative herd list; the herds selected are those whose cumulative range contains a series number. Here herds 6 and 18 each contain two series numbers.
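The seven steps above can be sketched in Python using the herd sizes from the table. The random start is fixed at 134 to match the example; in practice it would be drawn afresh, for instance with `random.randint(1, interval)`. Variable names are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

# Steps 1 and 2: herd sizes (herds 1..20) and their cumulative running total
herd_sizes = [23, 235, 48, 44, 35, 435, 37, 92, 39, 21,
              34, 435, 113, 44, 64, 62, 47, 521, 21, 34]
cum_totals = list(accumulate(herd_sizes))

# Steps 3 and 4: eight herds wanted, so the interval is 2384 / 8 = 298
n_herds = 8
interval = cum_totals[-1] // n_herds

# Step 5: random start between 1 and the interval (fixed here to match the text)
start = 134

# Step 6: systematic series through the cumulative list of animals
series = [start + i * interval for i in range(n_herds)]

# Step 7: a herd is selected if its cumulative range contains a series number;
# bisect_left finds the first herd whose cumulative total is >= that number
selected = [bisect_left(cum_totals, s) + 1 for s in series]

print(series)    # [134, 432, 730, 1028, 1326, 1624, 1922, 2220]
print(selected)  # [2, 6, 6, 11, 12, 14, 18, 18]
```

Large herds span more of the cumulative list and so are more likely to be hit, which is what makes the selection probability proportional to size. Herds 6 and 18 are each hit twice.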

 

 

Stratified random sampling

Worked Example

Let us take an example of carrying out a survey to determine the prevalence of an allergy in children within a district. The sample was stratified according to rural or urban, as it was anticipated that the rural prevalence rates may be higher. The number of children living in each stratum was known from a recently conducted census. It was decided to use proportional allocation and sample 5% of children in each stratum.

| Stratum | No. children | Weight (Wi) | No. in sample | No. with allergy | Proportion with allergy (pi) |
|---|---|---|---|---|---|
| Rural | 2460 | 0.3146 | 123 | 40 | 0.3252 |
| Urban | 5360 | 0.6854 | 268 | 76 | 0.2836 |
| Total | 7820 | 1.00 | 391 | 116 | |

Since sample sizes were determined by proportional allocation, we could have used the simplified formula, but we will use the general formulae for demonstration purposes.
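
$$p_w = (0.3146 \times 0.3252) + (0.6854 \times 0.2836) = 0.2967$$

$$\mathrm{var}(p_w) = \frac{0.3146^2 \times 0.3252 \times 0.6748}{123} + \frac{0.6854^2 \times 0.2836 \times 0.7164}{268} = 0.0005327$$

$$SE(p_w) = \sqrt{\mathrm{var}(p_w)} = 0.02308$$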

The 95% normal approximation binomial confidence interval to the weighted proportion is then given by:

$$95\%\ CI(p_w) = 0.2967 \pm 1.96 \times SE = 0.2515 \text{ to } 0.3419$$
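The stratified estimate can be checked with a short Python sketch that follows the general formulae used above: the weighted proportion is the sum of Wi × pi, and its variance is the sum of Wi² × pi(1 − pi)/ni. The dictionary layout and variable names are illustrative only.

```python
import math

# stratum: (children in population, children sampled, sampled children with allergy)
strata = {
    "Rural": (2460, 123, 40),
    "Urban": (5360, 268, 76),
}

N_total = sum(N_i for N_i, _, _ in strata.values())

p_w = 0.0     # weighted proportion
var_w = 0.0   # variance of the weighted proportion
for N_i, n_i, a_i in strata.values():
    W_i = N_i / N_total            # stratum weight
    p_i = a_i / n_i                # stratum proportion with allergy
    p_w += W_i * p_i
    var_w += W_i ** 2 * p_i * (1 - p_i) / n_i

se_w = math.sqrt(var_w)
ci_low, ci_high = p_w - 1.96 * se_w, p_w + 1.96 * se_w

print(f"p_w = {p_w:.4f}, var = {var_w:.7f}, SE = {se_w:.5f}")
print(f"95% CI = {ci_low:.4f} to {ci_high:.4f}")
```

Under proportional allocation the weighted proportion equals the overall sample proportion, 116/391. The code carries unrounded weights and proportions, so the last decimal place of the limits can differ from the hand calculation.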

 

 

Adaptive cluster sampling

Worked Example

We will take the example given in the core text where 12 initial samples were taken. Two samples were positive so adjacent units were also sampled, giving the networks shown in the figure:

  [Figure: networks formed around the initial sample units (desad3.gif)]
 
$$\text{Mean no. per unit} = \frac{4/3 + 7/5 + 0/1 + 0/1 + \dots}{12} = 0.228$$

$$SE(\text{mean}) = \sqrt{\frac{(1.333 - 0.228)^2 + (1.4 - 0.228)^2 + (0 - 0.228)^2 + \dots}{(12)(11)}} = 0.154$$

Since the initial sample constituted more than 5% of the population, we multiply this by √[1 − 12/100] to give a corrected standard error of 0.144.
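The adaptive cluster calculation can be sketched in Python as below. Each initial unit is represented by the mean count of the network it belongs to. The sketch assumes that the ten initial units not shown in full all had zero counts, which is what the stated mean of 0.228 implies. The population size of 100 units is taken from the correction factor above.

```python
import math

n = 12    # initial sample size
N = 100   # population size (from the correction factor in the text)

# Network mean for each initial unit: two positive networks (4/3 and 7/5),
# the other ten initial units assumed to have zero counts
network_means = [4 / 3, 7 / 5] + [0.0] * (n - 2)

mean = sum(network_means) / n

# Standard error of the mean of the network means, divisor n(n - 1)
ss = sum((w - mean) ** 2 for w in network_means)
se = math.sqrt(ss / (n * (n - 1)))

# Finite population correction, applied because n/N exceeds 5%
se_corrected = se * math.sqrt(1 - n / N)

print(f"mean = {mean:.3f}, SE = {se:.3f}, corrected SE = {se_corrected:.3f}")
# mean = 0.228, SE = 0.154, corrected SE = 0.144
```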