 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)  ### Random and systematic sampling

#### Simple random sampling

You must first be able to list your individual sampling units in some way. This applies whether your sampling unit is a person, a rodent, a tree or an insect. In most cases they will need to be tagged in some way so they can be identified. If you cannot list the individual sampling units, you cannot take a simple random sample (although there are other options as we see below).

The best way to select your sample is to draw n numbers from a table of random numbers, or to generate n numbers on a computer using a random number generator. If all else fails, write the numbers of all the sampling units on pieces of paper. Fold the pieces of paper so the numbers are not visible, put them in a box and shake them up. Then draw the required number of units, preferably with your eyes shut. Unfortunately this procedure is not truly random, but with luck it will not be unduly biased.
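As an illustration of the computer-generated option, here is a minimal Python sketch. It assumes a hypothetical population of 100 tagged units numbered 1 to 100; the function name `simple_random_sample` and the `seed` argument are illustrative choices, not part of the original method.

```python
import random


def simple_random_sample(unit_ids, n, seed=None):
    """Draw n units without replacement, each subset of size n being equally likely."""
    rng = random.Random(seed)  # a seed is optional; it only makes the draw repeatable
    return sorted(rng.sample(list(unit_ids), n))


# Hypothetical list of sampling units: tags 1..100 (an assumption for this sketch).
population = range(1, 101)
sample = simple_random_sample(population, 12, seed=1)
print(sample)
```

The key requirement is the same as in the text: the units must be listed before anything can be drawn from the list.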

#### Systematic sampling

For systematic sampling the starting point should be chosen randomly in order to avoid bias. In the diagram, we wanted to select (n=) 12 units from a population of (N=) 100, so k = N/n = 100/12 = 8⅓. We used random number tables to select a number between 1 and (k=) 8 as our starting point. The number selected was 6, so starting there, we then selected every 8th unit, giving a total sample size of 12. Since N is fairly small, it would have been better had we employed a selection interval of k = 8⅓ units, rounding each result to the nearest whole number. Thus, instead of (6+0)=6, (6+8)=14, (6+16)=22, (6+24)=30... we should have used (6+0)=6, (6+8⅓)=14, (6+16⅔)=23, (6+25)=31...
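The fractional-interval selection can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `systematic_sample` is hypothetical, units are assumed to be numbered 1 to N, and the start is fixed at 6 to match the example rather than drawn at random.

```python
def systematic_sample(N, n, start):
    """Select n unit numbers from 1..N using the fractional interval k = N / n.

    Each position start + i*k is rounded to the nearest whole number
    (adding 0.5 and truncating, i.e. halves round up).
    """
    k = N / n
    return [int(start + i * k + 0.5) for i in range(n)]


# In practice the start would be random, e.g. random.randint(1, N // n).
units = systematic_sample(N=100, n=12, start=6)
print(units)  # [6, 14, 23, 31, 39, 48, 56, 64, 73, 81, 89, 98]
```

With a whole-number interval of 8 the same start gives 6, 14, 22, 30, ... and the last unit selected is 94, so the top of the list is never reached; the fractional interval spreads the sample over the full range.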

The need for an initial random selection means that, even for a systematic sample, you must be able to list all units in the population - or at least locate them unambiguously. You also have to know the total number of units in order to select the sampling interval to get your desired sample size. Sometimes the first unit is haphazardly selected, although this can lead to bias - especially if you interpret haphazard to mean convenience and also select a convenient value of k.

If systematic sampling is being used to select quadrats in a field, the distance between plots can be measured by the number of paces. The distance between sampling units does not have to be measured too precisely, provided there is no risk of bias in the precise positioning of the sample. If there is, it is better not to look at the ground for the last few paces.

### One stage cluster sampling

This is done in the following way.

1. Decide on an appropriate grouping or cluster. Each cluster should ideally represent the full extent of variation present in the population.
2. Obtain a list of clusters, and take a random sample of clusters.
3. Sample all individuals within each selected cluster.

#### Equally-weighted clusters

We first take an example where there are the same number of secondary units in each cluster.

#### Worked Example

Let us take an example of sampling cages that each hold 100 laying hens. We take a random sample of 12 cages, and determine the proportion of hens in each cage suffering from a particular nutritional disorder.

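| Cage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total number | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Number affected | 25 | 37 | 25 | 36 | 41 | 21 | 26 | 14 | 28 | 31 | 35 | 37 |
| Proportion | 0.25 | 0.37 | 0.25 | 0.36 | 0.41 | 0.21 | 0.26 | 0.14 | 0.28 | 0.31 | 0.35 | 0.37 |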

Using the formula for one stage cluster sampling with equal numbers per cluster:

$$\bar{p} = \frac{356}{1200} = 0.2967$$

$$SE(\bar{p}) = \frac{0.079}{\sqrt{12}} = 0.0228$$

The cluster-corrected 95% normal approximation confidence interval, using the t multiplier of 2.20 for 12 − 1 = 11 degrees of freedom, is then given by:

 95% CI(p)  = 0.297 ± 2.20 × SE = 0.247 to 0.347

Had we used the binomial formula assuming a simple random sample we would have got a confidence interval of 0.269 to 0.326. This interval is narrower than the correct, cluster-corrected one, so it overstates the precision of the estimate. It emphasises the importance of ensuring that the conditions for using the binomial formula really are met.
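The calculation for the hen-cage example can be reproduced with a short Python sketch. It assumes the standard deviation of the 12 cage proportions uses the n − 1 divisor (which reproduces the 0.079 above) and takes 2.201 as the t multiplier for 11 degrees of freedom; the variable names are illustrative only.

```python
import math

# Number of affected hens in each of the 12 sampled cages (100 hens per cage).
affected = [25, 37, 25, 36, 41, 21, 26, 14, 28, 31, 35, 37]
cage_size = 100
k = len(affected)

props = [a / cage_size for a in affected]
p_bar = sum(props) / k

# Between-cluster standard deviation of the proportions (n - 1 divisor).
s = math.sqrt(sum((p - p_bar) ** 2 for p in props) / (k - 1))
se_cluster = s / math.sqrt(k)

t_crit = 2.201  # assumed: two-sided 95% t value for 11 degrees of freedom
ci_cluster = (p_bar - t_crit * se_cluster, p_bar + t_crit * se_cluster)

# Standard error if the 1200 hens are (wrongly) treated as a simple random sample.
se_binomial = math.sqrt(p_bar * (1 - p_bar) / (k * cage_size))

print(round(p_bar, 4), round(se_cluster, 4), round(se_binomial, 4))
print(tuple(round(x, 3) for x in ci_cluster))
```

The binomial standard error comes out clearly smaller than the cluster-based one, which is why the binomial interval is too narrow here. The interval limits agree with the text to within rounding, since the text rounds the mean and standard error before multiplying.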

#### Unequally-weighted clusters

In this example there are an unequal number of secondary units in each cluster.

Let's take an example of doing a survey to determine the level of immunization coverage for children against measles in a district. You don't have a list of all the children in the district so you cannot take a simple random sample. But you do have a list of schools in the district. Hence you take a random sample of schools. You then sample all pupils from each school.

#### Worked Example

Let's take as an example sampling children of a particular age in schools for their percentage immunity to a disease. The schools are selected randomly and all children of the chosen age at the selected schools are tested for immunity. Not surprisingly the number of children sampled at each school varies widely.

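| School | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total no. children | 345 | 254 | 124 | 546 | 78 | 345 | 98 | 125 | 256 | 342 | 211 | 322 |
| No. immune | 86 | 94 | 31 | 197 | 32 | 72 | 26 | 19 | 72 | 106 | 74 | 119 |
| Propn. immune | 0.25 | 0.37 | 0.25 | 0.36 | 0.41 | 0.21 | 0.27 | 0.15 | 0.28 | 0.31 | 0.35 | 0.37 |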

The total number of children sampled is 3046, of whom 928 are immune.
The proportion immune is therefore 0.305.
The mean cluster size is 253.83.

Using the formula given above for one stage cluster sampling with unequal numbers per cluster:

$$p_w = \frac{928}{3046} = 0.3047$$

$$s_w^2 = 0.091 \times (0.00553 + 0.00427 + \dots + 0.00686) = 0.0052$$

$$SE(p_w) = \frac{0.0721}{3.464} = 0.0208$$
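The weighted calculation can be checked with the Python sketch below. The formula used for the weighted variance, the sum of (mᵢ/m̄)² × (pᵢ − p_w)² divided by (number of clusters − 1), is an assumption inferred from the worked numbers (it reproduces the terms 0.00553, 0.00427 and 0.00686); variable names are illustrative.

```python
import math

# Children tested and children immune in each of the 12 sampled schools.
children = [345, 254, 124, 546, 78, 345, 98, 125, 256, 342, 211, 322]
immune = [86, 94, 31, 197, 32, 72, 26, 19, 72, 106, 74, 119]

k = len(children)
p_w = sum(immune) / sum(children)   # overall (weighted) proportion immune
m_bar = sum(children) / k           # mean cluster size

# Assumed formula: each squared deviation is weighted by (cluster size / mean size)^2.
s2_w = sum(
    (m / m_bar) ** 2 * (x / m - p_w) ** 2
    for m, x in zip(children, immune)
) / (k - 1)

se_w = math.sqrt(s2_w) / math.sqrt(k)

print(round(p_w, 4), round(m_bar, 2), round(s2_w, 4), round(se_w, 4))
```

Because the sketch uses unrounded school proportions, the standard error differs from the hand calculation in the fourth decimal place; the hand calculation works from proportions rounded to two decimals.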

The cluster-corrected 95% normal approximation confidence interval is then given by:

 95% CI (p)  = 0.305 ± 2.20 × SE = 0.259 to 0.351

### Two stage cluster sampling

This is done in the following way.

1. Decide on an appropriate grouping or cluster. Each cluster should ideally represent the full extent of variation present in the population, because you are estimating the variation present in the population from the variation between clusters.
2. Obtain a list of clusters, and either take a random sample of clusters or select clusters by probability proportional to size.
3. Take a random or systematic sample of individuals within each selected cluster.

#### Selection by probability proportional to size

1. List all clusters and their number of units.
2. Calculate the cumulative running total.
3. Determine the required sample size. In this example we will assume this is eight herds with a desired sample size of 160 animals - so we want to sample 20 animals per herd.
4. Obtain the sampling interval by dividing the total number of animals by the number of herds we wish to sample (8); in this example this is 2384/8 = 298.
5. Choose a random number between 1 and 298 to give the starting point. In this case we used R to obtain 134.
6. Calculate a systematic sampling series by repeatedly adding the sampling interval to the random start viz. 134, 432, 730, 1028, 1326, 1624, 1922, 2220.
7. Each of these numbers corresponds to an animal on the herd list; the herds selected are those whose range of cumulative totals contains a series number.
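The steps above can be sketched in Python as follows. The herd sizes are those listed in the table for this example; the function name `pps_systematic` is hypothetical, and the random start is fixed at 134 to match the text rather than drawn afresh.

```python
from bisect import bisect_left
from itertools import accumulate

# Sizes of the 20 herds in the example (total 2384 animals).
herd_sizes = [23, 235, 48, 44, 35, 435, 37, 92, 39, 21,
              34, 435, 113, 44, 64, 62, 47, 521, 21, 34]


def pps_systematic(sizes, n_clusters, start):
    """Select clusters with probability proportional to size.

    Returns the sampling interval, the systematic series, and the
    1-based number of the herd containing each series number.
    """
    cum = list(accumulate(sizes))        # cumulative running total
    interval = cum[-1] // n_clusters     # 2384 // 8 = 298 in the example
    series = [start + i * interval for i in range(n_clusters)]
    # A series number falls in the first herd whose cumulative total reaches it.
    herds = [bisect_left(cum, s) + 1 for s in series]
    return interval, series, herds


# In practice: start = random.randint(1, interval). Fixed here to match the text.
interval, series, herds = pps_systematic(herd_sizes, n_clusters=8, start=134)
print(interval)  # 298
print(series)    # [134, 432, 730, 1028, 1326, 1624, 1922, 2220]
print(herds)     # [2, 6, 6, 11, 12, 14, 18, 18]
```

Large herds can contain more than one series number, as herds 6 and 18 do here.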

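| Herd no. | Herd size | Cum. total | Series no. in cum. total |
|---|---|---|---|
| 1 | 23 | 23 | |
| 2 | 235 | 258 | 134 |
| 3 | 48 | 306 | |
| 4 | 44 | 350 | |
| 5 | 35 | 385 | |
| 6 | 435 | 820 | 432 & 730 |
| 7 | 37 | 857 | |
| 8 | 92 | 949 | |
| 9 | 39 | 988 | |
| 10 | 21 | 1009 | |
| 11 | 34 | 1043 | 1028 |
| 12 | 435 | 1478 | 1326 |
| 13 | 113 | 1591 | |
| 14 | 44 | 1635 | 1624 |
| 15 | 64 | 1699 | |
| 16 | 62 | 1761 | |
| 17 | 47 | 1808 | |
| 18 | 521 | 2329 | 1922 & 2220 |
| 19 | 21 | 2350 | |
| 20 | 34 | 2384 | |

### Stratified random sampling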

#### Worked Example

Let us take an example of carrying out a survey to determine the prevalence of an allergy in children within a district. The sample was stratified according to rural or urban, as it was anticipated that the rural prevalence rates may be higher. The number of children living in each stratum was known from a recently conducted census. It was decided to use proportional allocation and sample 5% of children in each stratum.

| Stratum | No. children | Weight (Wi) | No. in sample | No. with allergy | Proportion with allergy (pi) |
|---|---|---|---|---|---|
| Rural | 2460 | 0.3146 | 123 | 40 | 0.3252 |
| Urban | 5360 | 0.6854 | 268 | 76 | 0.2836 |
| Total | 7820 | 1.00 | 391 | 116 | |

Since sample sizes were determined by proportional allocation, we could have used the simplified formula, but we will use the general formulae for demonstration purposes.

$$p_w = (0.3146 \times 0.3252) + (0.6854 \times 0.2836) = 0.2967$$

$$var(p_w) = \frac{0.3146^2 \times 0.3252 \times 0.6748}{123} + \frac{0.6854^2 \times 0.2836 \times 0.7164}{268} = 0.0005327$$

$$SE(p_w) = \sqrt{var(p_w)} = 0.02308$$
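The same arithmetic can be reproduced with a short Python sketch, working from the counts in the table rather than the rounded weights and proportions. The dictionary layout and variable names are illustrative assumptions.

```python
import math

# Stratum: (children in stratum, children sampled, sampled children with allergy)
strata = {
    "Rural": (2460, 123, 40),
    "Urban": (5360, 268, 76),
}

N_total = sum(N_i for N_i, _, _ in strata.values())

p_w = 0.0    # weighted proportion
var_w = 0.0  # variance of the weighted proportion
for N_i, n_i, a_i in strata.values():
    W_i = N_i / N_total  # stratum weight
    p_i = a_i / n_i      # stratum sample proportion
    p_w += W_i * p_i
    var_w += W_i ** 2 * p_i * (1 - p_i) / n_i

se_w = math.sqrt(var_w)
print(round(p_w, 4), round(var_w, 7), round(se_w, 5))
```

Under proportional allocation the weighted proportion equals the pooled sample proportion, 116/391, which gives a quick check on the weights.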

The 95% normal approximation binomial confidence interval to the weighted proportion is then given by:

$$95\%\ CI(p_w) = 0.2967 \pm 1.96 \times SE = 0.2515 \text{ to } 0.3419$$

$$\text{Mean no. per unit} = \frac{4/3 + 7/5 + 0/1 + 0/1 + \dots}{12} = 0.228$$

$$SE(\text{mean no. per unit}) = \sqrt{\frac{(1.333 - 0.228)^2 + (1.4 - 0.228)^2 + (0 - 0.228)^2 + \dots}{(12)(11)}} = 0.154$$