InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Beginners statistics introduction

## Samples, populations and replicates


### Example, with R

To obtain a sample of one student from a population of six students, let each of them choose one straw from a group of six straws which appear identical, although one is shorter. To replicate that sample, repeat the procedure.
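The straw draw can be mimicked in R. What follows is only a minimal sketch: the student labels, the seed and the number of repeats are assumptions made for illustration, and `sample()` stands in for the straws.

```r
# Six students draw straws; exactly one straw is short.
set.seed(1)                                   # assumed seed, so the sketch is repeatable
students <- c("A", "B", "C", "D", "E", "F")   # hypothetical labels

# One sample of one student: whoever draws the short straw.
chosen <- sample(students, size = 1)
print(chosen)

# Replicate that sample: repeat the whole procedure many times.
draws <- replicate(6000, sample(students, size = 1))

# Proportion of draws in which each student was chosen.
freq <- table(factor(draws, levels = students)) / length(draws)
print(round(freq, 2))
```

If the draw is fair, each student should be chosen in roughly one sixth of the repeats.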

To randomly select an item with

To understand statistical samples, populations and replicates, consider this example:

Imagine that you need to estimate how many people in your town (or what proportion thereof) own the house they live in. The population in this case is all the people living in that town.

If that seems trivial, imagine you need to know how many people are infected with tuberculosis, or HIV, or have undiagnosed cancer, or who 'want to expunge the unbelievers'.

1. One way would be to ask everyone - in other words to conduct a 'census' - which is a horribly expensive and troublesome undertaking.
Moreover, in practice, every census will miss some people - either because they cannot be found, or do not want to be found, or because the forms got lost...

2. Or you could just ask yourself - after all, you are 'normal', so your situation will represent everyone else's.
If you believe that is true you are either a great political leader, or delusional, or intellectually challenged, or all three.

3. Or you could simply ask someone else - say a friend - or do what journalists do, ask a 'man in the street'.
• One obvious risk to this approach is bias.

The people most conveniently available (friends, or shoppers) may be wholly unrepresentative of everyone who is not so easily available.

The best way to overcome this problem is to randomly select the person whom you ask - and the selection must be genuinely random. This will mean listing all the people in the town and (for example) using random number tables to make that selection.

• A second difficulty is that your result is based on very little information - one person's answer, instead of many thousands. In other words, to get a reasonable idea of how everyone else would answer you need to ask a number of randomly-selected individuals - and to ensure their answers are independent. This is known as taking a random sample.

If you do not see why their answers need to be independent, imagine you put the same question to people in a bus queue - a moment's thought reveals their answers are liable to be more similar to each other than if they were asked singly, in private.

• Put another way, if you use random independent selection, the individual results are replicated estimates of the same thing - in this case how many people own their own house.
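The house-ownership example can be sketched in R. Everything numerical here is an assumption made for illustration (a town of 20000 people, 60% of whom own their house, samples of 200, and the seed); `sample()` plays the part of the random number tables.

```r
set.seed(42)                       # assumed seed, so the sketch is repeatable

# Hypothetical population: 20000 townspeople, 12000 of whom own their house.
n_town    <- 20000
owns_home <- rep(c(TRUE, FALSE), times = c(12000, 8000))
true_prop <- mean(owns_home)       # known here only because the town is invented

# Asking one randomly selected person: the 'estimate' can only be 0 or 1.
one_person <- mean(owns_home[sample(seq_len(n_town), size = 1)])

# A random sample: list everyone, then select 200 of them at random.
ids      <- sample(seq_len(n_town), size = 200)
estimate <- mean(owns_home[ids])

# Replicated samples: each one is another estimate of the same proportion.
estimates <- replicate(1000, mean(owns_home[sample(seq_len(n_town), size = 200)]))

print(c(true = true_prop, one_person = one_person, sample_of_200 = estimate))
print(summary(estimates))
```

The replicated estimates should scatter around the true proportion, whereas a single person's answer tells you very little.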

### Definition and Use

• A statistical sample is a (sub-)set of items which is used to represent the population from which they are drawn. The population is that group about which you wish to make inferences and is assumed to be fixed. In other words, the observations in any one sample represent a single (unchanging) population.
• To avoid bias, items within a sample must be independently and randomly obtained - in which case, items in the sample may be described as replicated observations.
• A further refinement of this simple random sampling approach is one-stage cluster sampling. Here clusters of individuals (say families or herds of cattle) are chosen by simple random sampling, and all the individuals in each cluster are questioned or examined.
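One-stage cluster sampling might be sketched in R as follows. The population is invented (500 families of one to six people), as are the number of families chosen and the seed.

```r
set.seed(7)                        # assumed seed, so the sketch is repeatable

# Hypothetical population: 500 families (the clusters) of 1 to 6 people each.
n_families  <- 500
family_size <- sample(1:6, size = n_families, replace = TRUE)
people <- data.frame(
  family = rep(seq_len(n_families), times = family_size),
  person = seq_len(sum(family_size))
)

# Stage one (the only stage): choose 25 families by simple random sampling.
chosen_families <- sample(seq_len(n_families), size = 25)

# Every individual in each chosen family is then questioned or examined.
cluster_sample <- people[people$family %in% chosen_families, ]

print(nrow(cluster_sample))        # number of individuals in the sample
print(head(cluster_sample))
```

Note that the families, not the individuals, are what is randomly selected, so people within the same family are not independent of each other.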

### Tips and Notes

Confusingly, the word 'census' is often used to describe a sample (for example no bird 'census' has ever counted every bird in an area).

Very, very few real samples are even remotely random, and very few results are genuine replicates. Nevertheless the statistical analyses which are applied to them still assume they are random and independent. Hence there is a huge potential for bias. Survey data and monitoring data are most problematical in this respect; experiments may be less so - within certain obvious limits.
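To see how much a non-random sample can mislead, here is a small R sketch. The town is entirely hypothetical: 10% of its people are assumed to be easy to find (shoppers, say) and 90% of those own their house, whereas only 50% of everyone else does.

```r
set.seed(3)                        # assumed seed, so the sketch is repeatable

# Hypothetical town of 20000 people; the first 2000 are 'easy to find'.
n_town <- 20000
easy   <- rep(c(TRUE, FALSE), times = c(2000, 18000))

# Ownership: 1800 of the 2000 easy-to-find people, 9000 of the other 18000.
owns_home <- c(rep(c(TRUE, FALSE), times = c(1800, 200)),
               rep(c(TRUE, FALSE), times = c(9000, 9000)))
true_prop <- mean(owns_home)

# Convenience sample: 200 people drawn only from those who are easy to find.
convenience <- mean(owns_home[sample(which(easy), size = 200)])

# Random sample: 200 people drawn from the whole town.
random <- mean(owns_home[sample(seq_len(n_town), size = 200)])

print(c(true = true_prop, convenience = convenience, random = random))
```

An analysis that treated the convenience sample as random would report a confident, but badly biased, estimate.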

No experimenter has ever performed his experiments upon a sample of rats randomly obtained from the rat population as a whole. No clinical trials are performed upon randomly selected people - most use healthy young volunteers, or seriously ill and desperate ones. This is why the effects of many drugs upon the very young and very old are still poorly understood. Remember, the teratogenic effects of thalidomide were missed because no one thought to test the drug upon pregnant rats.

Next time you watch a journalist interviewing people 'on the street' think about how those interviews were selected, and what got left out...

### Test yourself

How well do these results reflect every colour it is possible for you to see?

Hint:
how many colours are in the spectrum, in a painting, or on TV?

### Useful references

Australian Bureau of Statistics (2012). Census and sample.
We do not often quote government websites, but this one provides a remarkably concise overview of the issues.

Freedman, D.A. (2004). Sampling. Encyclopedia of Social Science Research Methods, 3, 986-990. Sage Publications.
An excellent review which focuses on the many different sources of bias.

Wikipedia: Sample (statistics).
Points out that the best way to get an unbiased sample is to take a random sample. In fact, the ONLY way to get an unbiased sample is to take a random sample!