    In an experimental study, the experimenter manipulates or controls the level of the explanatory variable(s). This is in contrast to observational studies where the level of the explanatory variable is either self-selected by the unit concerned or has been imposed haphazardly. In an experiment, the two (or more) levels of the explanatory variable(s) are randomly allocated as treatments, usually to a number of independent, replicated experimental units. This is again in contrast to an observational study where there is no random allocation of treatments to the (sampling) units. In medical research such experiments are termed randomized controlled trials. Strictly speaking, the term 'controlled' refers to the allocation of treatment to individuals being under the control of the experimenter, although it can also be taken to refer to the presence of a control group for comparison with the treated group. Most randomized experiments are parallel trials - in other words treatments are allocated to two (or more) parallel groups of experimental units, and treatment remains the same throughout the course of the experiment.

    The term experiment is sometimes used where there is manipulation, but no random allocation. However, the element of random allocation is extremely important in biological experiments because (providing treatment group sizes are sufficiently large) it ensures that groups are balanced as regards potentially confounding variables. It also helps to eliminate bias (whether conscious or unconscious) in which treatment is allocated to which unit, and serves to validate the subsequent statistical comparison of the treatment groups. This is why we can have more confidence that a strong relationship demonstrated in an experiment indicates a causal link - in other words, there is a fairly strong inference for causality. We therefore reserve the term experiment for where we have manipulation, randomization and replication. The term quasi-experiment can be used for where there is manipulation, but no random allocation.

    If only a small number of experimental units are available (for example, plots of land in agricultural trials), then it can no longer be assumed that treatment groups will be balanced as regards potentially confounding variables. Most plots allocated to one treatment level may (by chance) fall in a more fertile area, whilst plots allocated to another treatment level fall in less fertile areas. The same applies if experimental units are cattle which vary in age - treated cattle may by chance be the older cattle, whilst untreated are younger cattle. In this situation it is important that there is adequate interspersion of treatments through the process of stratification - in other words that treatments are assigned randomly within particular strata or blocks of relatively homogeneous units.


    We have noted above the need for replication of each level of the explanatory variable. This is certainly the case if the explanatory variable is a nominal variable. But there is an exception to this general rule if the explanatory variable is an ordinal or (especially) a measurement variable. In this situation, a regression design may be appropriate in which (many) different levels of the explanatory variable are randomly allocated to individual experimental units. Such an experiment would then be best analyzed using a regression model rather than the more commonly used analysis of variance. In this case only one replicate would be required for each level, although in practice it is more common to use a replicated regression design with n replicates for each level. Choice of appropriate levels for a regression design depends on the form of the relationship between the response and explanatory variables. If the response is multiplicative rather than linear, values of the explanatory variable should be evenly spaced on a logarithmic rather than arithmetic scale.
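    The choice of logarithmically spaced levels can be sketched in a few lines of Python (the function name and the example endpoints are ours, purely for illustration; the source prescribes only the principle of even spacing on a log scale):

```python
import math

def log_spaced_levels(lo, hi, n):
    """Return n levels of the explanatory variable evenly spaced on a
    logarithmic scale between lo and hi (both included) - appropriate for
    a regression design when the response is multiplicative."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]

# Five dose levels between 1 and 100: 1, ~3.16, 10, ~31.6, 100
levels = log_spaced_levels(1, 100, 5)
```

    Each successive level is a constant multiple of the previous one, rather than a constant increment, which is what "evenly spaced on a logarithmic scale" amounts to.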

    We also noted above that the replicates in an experiment should be independent. The principle of independent replication is extremely important and applies to both observational designs and randomized experiments. The issue is also controversial because it can be very difficult to obtain independent replication. Hence we devote a separate More Information page to the topic of pseudoreplication, and concentrate here on the issues of randomization and stratification. Other important issues such as allocation concealment and blinding are dealt with in related topics.



Global randomization (completely randomized designs)

    In global randomization each unit has an equal probability of receiving any treatment, at least at the start of the randomization process. In other words, treatments are allocated to units irrespective of any differences between them. This applies whether those differences be in age or sex (in clinical trials) or in position in a field (in agricultural trials).

    If you are to use a completely randomized design, you must assume either:

    1. the treatment factor is the only factor that has a significant effect on the variable you are measuring, or
    2. you have a sufficient number of experimental units, such that the treatment groups will be effectively balanced for potentially confounding factors by the process of randomization.

    In laboratory experiments it is sometimes possible to control all potentially confounding factors, so one can use global randomization even with only moderate numbers in each treatment group. However, outside the laboratory, large (or very large) numbers of experimental units must be used in each treatment group to balance confounding factors. Hence the use of large numbers of participants in most clinical trials.


    A further complication is that there are two types of global randomization.

    1. In simple global randomization, no restriction is placed on treatment group sizes which may vary anywhere from zero to N. As a result, the probability of receiving a particular treatment is independent of how many treatments have already been allocated and remains the same until randomization is complete.
    2. In restricted global randomization, the probability that each unit has of receiving any treatment is the same at the start of randomization, but that probability changes as units are allocated, because the number in each treatment group is (one way or another) constrained. Restricted global randomization retains the advantages of simple random allocation, as regards balancing confounding factors, but ensures similar (or identical) numbers of units in each treatment group.
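    The two types can be sketched as follows (a Python illustration; the function names are ours, not standard terminology):

```python
import random

def simple_randomization(n_units, treatments, seed=None):
    """Simple global randomization: every unit is assigned a treatment with
    equal probability throughout, so group sizes are unconstrained and may
    vary anywhere from zero to n_units."""
    rng = random.Random(seed)
    return [rng.choice(treatments) for _ in range(n_units)]

def restricted_randomization(n_units, treatments, seed=None):
    """Restricted global randomization: group sizes are fixed in advance
    (as equal as possible), so the allocation probability for a treatment
    changes as its group fills up."""
    rng = random.Random(seed)
    # Each treatment appears (nearly) equally often; only the order is random.
    alloc = [treatments[i % len(treatments)] for i in range(n_units)]
    rng.shuffle(alloc)
    return alloc
```

    With the restricted version the final group sizes are guaranteed equal (to within one unit); with the simple version they are only equal in expectation.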

    Variation in treatment group size does not matter too much if your groups are very large. But for small or medium group sizes, variation in group sizes can seriously reduce the power of a study. Hence the widespread use of restricted randomization. For many types of experiment the total number of units is known in advance, so it is straightforward to assign treatments to identical numbers of units. In clinical trials (where participants are recruited over a period of time) other methods are used. One such method is random permuted blocks, where randomizations are blocked over time. As well as tending to minimize size discrepancies between groups, this procedure also protects against unknown time trends in the characteristics of arriving patients. Another method is the biased coin method, where at each allocation the probability of assigning to the smaller group is made greater than 0.5.
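    Both methods can be sketched in Python (a minimal illustration; the bias probability of 2/3 in the biased-coin sketch is an assumption of ours, not a value prescribed by the source):

```python
import random

def permuted_blocks(n_units, treatments, block_size, seed=None):
    """Random permuted blocks: randomize within successive blocks of
    allocations, so group sizes never drift far apart over time."""
    rng = random.Random(seed)
    assert block_size % len(treatments) == 0
    per_block = block_size // len(treatments)
    alloc = []
    while len(alloc) < n_units:
        block = [t for t in treatments for _ in range(per_block)]
        rng.shuffle(block)
        alloc.extend(block)
    return alloc[:n_units]

def biased_coin(n_units, p_bias=2 / 3, seed=None):
    """Biased-coin allocation for two groups: whenever one group is
    smaller, assign the next unit to it with probability p_bias > 0.5."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    alloc = []
    for _ in range(n_units):
        if counts["A"] == counts["B"]:
            t = rng.choice(["A", "B"])
        else:
            smaller = min(counts, key=counts.get)
            other = "B" if smaller == "A" else "A"
            t = smaller if rng.random() < p_bias else other
        counts[t] += 1
        alloc.append(t)
    return alloc
```

    Within each complete permuted block the treatment counts are exactly equal; the biased coin only tends towards balance, but never makes an allocation fully predictable.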

    Do not confuse the various methods used to make treatment group sizes similar (such as random permuted blocks), with blocking in (for example) agricultural trials which is equivalent to stratification. The former is not usually taken account of in the analysis whilst the latter is.

    Strengths and weaknesses of global randomization

    • Simple global randomization is always valid and makes no assumptions other than that allocation is truly random. However, if you have only a few experimental units, the number of units in each treatment group may differ considerably, which will weaken the power of statistical tests. Restricted global randomization will give similar group sizes but some of the methods can distort the significance level of subsequent statistical tests.
    • Balancing of possible confounding factors can only be relied upon if there are many experimental units. If the number of units is small, treatment groups may differ considerably in such factors, for example age composition in clinical trials or soil types in agricultural trials.
    • One alternative to having a very large sample size is to attempt to make all units as similar as possible by using strict exclusion criteria - for example only using animals of the same age, sex and weight. The disadvantage of this is that your results will only be applicable to animals of that particular age, sex and weight.



Nested designs

    Up till now we have assumed that one observation is made on each experimental unit - in other words the evaluation unit is the same as the experimental unit. But sometimes multiple evaluation units may be used to assess each experimental unit. When multiple observations are made on a single experimental unit, it is termed a nested design. The key feature of a nested factor is that levels of the nested factor are unique to each level of the higher level factor. So if treatments are randomly allocated to herds, but observations are made on individual cows, each cow is unique to one herd and each herd is unique to one treatment.

    Nearly all experiments (other than regression designs) have one level of nesting (replicates are nested in treatment), but by convention it is only termed a nested design if there are at least two levels of nesting. Nested designs may result in pseudoreplication if the evaluation units are wrongly treated as experimental units in the analysis. Similar errors of analysis can also occur in partially nested designs which we consider below after looking at blocked and factorial designs.



Stratified randomization (blocked designs)

    In stratified randomization the randomization process is restricted by grouping the experimental units into more or less homogenous strata before the process of random allocation. This balances the contribution of such factors to each treatment group, and is often essential if there are only a small number of experimental units available. Designs using stratified randomization include the randomized block and Latin square designs. We deal with clinical trials separately below because the terminology used varies from that in other disciplines.

Stratified clinical trials

    Strata are formed of patients with similar characteristics. For clinical trials age and gender are commonly used factors for stratification. If one has two categories for each, one would then have a total of four strata. The number of patients within each stratum may vary widely. A separate randomization list is drawn up for each stratum of patients, thus balancing the composition of such characteristics in each treatment group.

    Stratification in clinical trials should only be used if the researcher is reasonably certain that the factor will affect the value of the response variable. In fact there is still some dispute in medical journals about the need for, or desirability of, stratification. Certainly if the trial is large and random allocation is properly carried out, stratification may be an unnecessary complication. The stratification should always be explicitly recognised in the analysis.

    Strengths and weaknesses of stratification in clinical trials

    • Providing there is significant heterogeneity, stratification produces treatment groups balanced for known confounding factors and will increase precision in estimating the treatment effect.
    • This benefit diminishes as sample size increases, and with very large sample sizes (say over 300 participants), the completely randomized design is more efficient.
    • There is a limit to the number of factors that one can stratify for - for three factors each with three levels one is already up to 3 × 3 × 3 = 27 strata. If the number of patients is small, some of the strata may have no patients.
    • Stratification is counterproductive if one is not sure about the identity of confounding factors, or if one does not have the organizational capacity for stratified randomization.
    • Analysis is more complex than for the randomized complete block design because there are variable numbers of patients within each stratum. Having empty strata causes especial problems in the analysis.

    The disadvantages of stratified randomization - especially in relation to dealing with large numbers of factors with small numbers of experimental units - can be overcome by using a different approach known as minimization. Treatment allocation is only done randomly for the first unit; after that it depends on the characteristics of those units already allocated. This equalizes the category totals between treatment groups rather than doing it for each individual stratum. The process ensures balance between groups for several factors, even in small samples - but is still not very popular amongst statisticians despite its strong advocacy by some. It is important to take account of the factors used for minimizing in the analysis - otherwise P-values will be invalid.
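    A minimal sketch of minimization in Python (the scoring rule below, summing running factor-level totals, is one common variant; the details and names are ours, not a specification from the source):

```python
import random

def minimization(patients, factors, treatments=("A", "B"), seed=None):
    """Minimization sketch: the first allocation (and any tie) is random;
    thereafter each patient joins the treatment whose running totals for
    that patient's factor levels are smallest, balancing category totals
    between groups rather than within each individual stratum."""
    rng = random.Random(seed)
    # counts[treatment][(factor, level)] = units already allocated
    counts = {t: {} for t in treatments}
    assignment = []
    for patient in patients:
        levels = [(f, patient[f]) for f in factors]
        # Imbalance score for adding this patient to each treatment group.
        scores = {t: sum(counts[t].get(lv, 0) for lv in levels)
                  for t in treatments}
        best = min(scores.values())
        choice = rng.choice([t for t in treatments if scores[t] == best])
        for lv in levels:
            counts[choice][lv] = counts[choice].get(lv, 0) + 1
        assignment.append(choice)
    return assignment

patients = [{"age": a, "sex": s}
            for a in ("young", "old") for s in ("M", "F")] * 3
alloc = minimization(patients, ["age", "sex"], seed=2)
```

    Note that the factor totals (not the strata) are balanced, which is why the method copes with several factors even in small samples.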

The randomized block design

    Stratification has always been used extensively in the agricultural context where the experimental units are plots of land, and the strata are described as blocks. In the classical randomized complete block design, plots that are similar are grouped together into blocks each with the same number of plots as the number of treatment levels. Treatments are then assigned at random within blocks with the restriction that each treatment occurs only once within each block. Because there is only one replicate per block, one has to assume that there is no interaction between the blocking factor and the treatment. Use of multiple replicates per block allows one to test whether there is any interaction between treatment and blocking factor - and is a much stronger design.
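    The allocation for a classical randomized complete block design is simply a fresh random ordering of the treatments within each block, as in this Python sketch (function name ours):

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomized complete block design: within each block every treatment
    occurs exactly once, in an independently randomized order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout

layout = randomized_block(["block1", "block2", "block3"],
                          ["T1", "T2", "T3"], seed=0)
```

    Each block must contain as many plots as there are treatment levels, which is why block size grows with the number of treatments.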

    The simplest form of the randomized block design is the matched pairs design. Pairs of units are matched for some characteristic (so the pairs are equivalent to blocks), and one or other treatment level is randomly allocated to each member of the pair. Because there are only two units in each block, the design can only deal with two levels of the treatment factor.

    Matched pairs are commonly used in cluster randomized trials where groups of individuals (for example schools or villages) are randomized to receive different public health interventions. Clusters are often matched by geographic area or size of the cluster. Providing the number of clusters is moderate to large and there is a high correlation between the matching factor and the response variable, matching will increase the power of the study. But for low correlation or very small numbers of clusters, matching can actually reduce power because of the loss of degrees of freedom. The moral of the story is that matching may be appropriate, but this should not just be assumed.

    Strengths and weaknesses of the randomized block design

    • If there is significant heterogeneity amongst units, blocking will increase precision in estimating the treatment effect by removing one source of variation from experimental error. If there is no significant heterogeneity the completely randomized design is more efficient.
    • For matched pairs the number of treatment levels is limited to two. Otherwise there is no limit on the number of treatment or the number of blocks, although the efficiency of the design decreases as the number of treatment levels (and hence size of the blocks) increases.
    • One replicate per block should only be used if there is no interaction between the treatment and blocking factor. If there is interaction, there should be several replicates of each treatment in each block.
    • Estimation of missing observations can be problematic if there is more than one such observation.


The Latin square design

    Whilst the randomized block design can only deal with one systematic source of variation, the Latin square design can deal with two such sources. It was first used widely in agricultural experiments where, for example, the study site may have a fertility gradient running East-West, and a moisture gradient running North-South. In a Latin square design we divide up the field into 'rows' and 'columns'. Each row and each column then contains one replicate of each treatment level. Hence the number of rows, the number of columns, and the number of treatment levels are all the same.
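    A randomized Latin square can be sketched by starting from the cyclic square and shuffling its rows and columns (a Python illustration; this samples only a subset of all possible Latin squares, but every square it produces is valid):

```python
import random

def latin_square(treatments, seed=None):
    """Randomized Latin square sketch: take the cyclic square, then
    permute rows and columns at random. Each treatment still appears
    exactly once in every row and every column."""
    rng = random.Random(seed)
    n = len(treatments)
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    rng.shuffle(square)                 # permute the rows
    cols = list(range(n))
    rng.shuffle(cols)                   # permute the columns
    return [[row[c] for c in cols] for row in square]

sq = latin_square(["A", "B", "C", "D"], seed=0)
```

    Permuting rows and columns preserves the defining property, since each operation merely relabels positions within the square.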

    The Latin square design is what is known as a confounded design because the main effects of the treatment and blocking factors are confounded with the interactions between factors. Hence in order to analyse a single Latin square one has to assume that there are no interactions between any of the factors.

    Strengths and weaknesses of the Latin square design

    • The design has the advantage that it allows one to control two sources of variation
    • It is a very efficient design providing a great deal of information for the minimum time and effort - providing the assumptions are met.
    • It is a restrictive design in that the number of treatments, number of columns and number of rows must all be the same. In practice this limits the number of treatments to a maximum of about 10.
    • The design is not appropriate if there is any interaction between the various factors.
    • The design is not resilient to errors or missing data points. Although a single missing point can be estimated, more than one missing point will make it very difficult to demonstrate any significant differences.




Factorial designs

    Up till now any interaction between treatment and blocking factors has been considered a problem - and we have often just assumed that such interaction is not present. But of course in reality any treatment will be applied along with all other factors. Hence it makes sense to investigate how different factors interact.

    In a complete factorial design combinations of the various levels of two or more treatment factors are randomly allocated to each unit. With two treatment factors each at two levels there will be 4 combinations; with two treatment factors each at three levels there will be 9 combinations. Since each level of one factor is present with each level of the other factor, we can refer to factor A being fully crossed with (or orthogonal to) factor B. Each combination must be replicated at least twice if one is to assess the interaction between the two factors. The combinations are then used in one of the designs already considered, such as the completely randomized or randomized block design.
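    The full set of treatment combinations is just the Cartesian product of the factor levels, as this Python sketch shows:

```python
from itertools import product

def factorial_combinations(*factor_levels):
    """All treatment combinations of a complete factorial design: the
    Cartesian product of the levels of each factor."""
    return list(product(*factor_levels))

two_by_two = factorial_combinations(["low", "high"], ["low", "high"])
three_by_three = factorial_combinations([0, 1, 2], [0, 1, 2])
```

    The number of combinations is the product of the numbers of levels, which is why factorial designs become unwieldy so quickly as factors are added.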

    Strengths and weaknesses of factorial designs

    • The main advantage is that all interactions between factors can be investigated and quantified with equal precision. If interaction is present, the main effects of the factors in isolation are not relevant since their effect will depend on the level of the other factor.
    • If there is no significant interaction, then each main effect is estimated with the same precision as if the whole experiment had been devoted to that factor alone. Two single factor experiments would together require twice the number of units to attain the same precision as the factorial. In other words, the design is very efficient.
    • The design rapidly becomes unwieldy with a large number of factors. With six factors at two levels there are 64 combinations in the full set of factorials.



Partially nested designs

Split-plot designs

    It is often not possible to randomly allocate experimental units to treatment combinations. For example in agricultural experiments some treatments, such as irrigation or ploughing, can only feasibly be done over a large area. One way to alleviate this problem is to use a split-plot design where one has blocks (originally termed mainplots), which are then divided into plots (originally termed split-plots). Different levels of treatment factor A (for example irrigated or not) are first randomly allocated to the blocks. Different levels of treatment factor B (for example insecticide application) are then randomly allocated within each block to the different plots.

    In this design the blocks must be regarded as a factor which is nested in factor A. However, factor A is still crossed with factor B - hence it is a partially nested design. The split-plot design is a good example of where we have randomization at two levels: first to the main plots, then to the subplots.
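    The two levels of randomization can be sketched as follows (a Python illustration; the example factor levels follow the text above, the function name is ours):

```python
import random

def split_plot(blocks, factor_a, factor_b, seed=None):
    """Split-plot randomization at two levels: levels of factor A are
    randomized to whole blocks (main plots); levels of factor B are then
    randomized to the plots within each block."""
    rng = random.Random(seed)
    assert len(blocks) % len(factor_a) == 0
    # First randomization: factor A to blocks (restricted, equal numbers).
    a_alloc = [factor_a[i % len(factor_a)] for i in range(len(blocks))]
    rng.shuffle(a_alloc)
    design = {}
    for block, a_level in zip(blocks, a_alloc):
        # Second randomization: factor B within the block.
        b_order = list(factor_b)
        rng.shuffle(b_order)
        design[block] = (a_level, b_order)
    return design

design = split_plot(["blk1", "blk2", "blk3", "blk4"],
                    ["irrigated", "unirrigated"], ["sprayed", "unsprayed"],
                    seed=0)
```

    Note that factor A is replicated only at the block level, which is the source of the low power for testing its effect mentioned above.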

Strengths and weaknesses of split-plot designs

  • One advantage of the design is that it enables the use of treatments that can only feasibly be done over a large area.
  • It is also invaluable where there is a shortage of experimental units for one of the factors, for example controlled environment rooms. If such units are made the blocks, then far fewer rooms are required than would be the case for a fully replicated factorial design.
  • A further strength is that treatment factors can be added to an experiment already underway by splitting existing units into smaller units and randomly allocating levels of the new factor to the new units.
  • The advantages are bought at the cost of assumptions about the absence of interactions, specifically between the block and factor B. There is also often a problem of lack of power for testing the effect of factor A - although this simply relates to the small number of blocks, which may not be under the control of the researcher.


Repeated measures designs

    A repeated measures design is one where repeated measurements are made on the same experimental unit (either an individual or a plot) over time. This design can be very useful to eliminate the variability between different units. But it can have major limitations depending on the type of repeated measures design.

    In one type of repeated measures design, units are randomly allocated different treatment levels at the start of the experiment, and then continue to receive the same treatment level throughout the time period. Time itself is a factor in the design and the order of this factor obviously cannot be randomized. This is sometimes termed a subjects trial design. Since the observations over time are all on the same unit, they cannot be treated as independent replicates. We can describe the subjects as being nested within treatments. This is analogous to the split-plot design where the blocks (= subjects) are nested in treatments. There are of course differences between the split-plot and subjects trial designs, but they are nevertheless analyzed using the same analysis of variance as we will see in Unit 13.


    The other type of repeated measures design is very different in that treatments are randomly allocated within units rather than between units. If only two treatments are to be compared, then it is termed a crossover design. Experimental units are randomly assigned to one or other of two sequence groups. Units in sequence group I receive treatment A1 followed by A2. Units in sequence group II receive treatment A2 followed by A1. If treatments are repeatedly alternated, it may be described as a multiple crossover design.
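    Assignment to sequence groups can be sketched in Python as a restricted randomization over the two sequences (function name ours):

```python
import random

def crossover(units, seed=None):
    """Two-period, two-treatment crossover: units are randomized in
    (near-)equal numbers to sequence AB (A1 then A2) or BA (A2 then A1)."""
    rng = random.Random(seed)
    sequences = [("A1", "A2"), ("A2", "A1")]
    alloc = [sequences[i % 2] for i in range(len(units))]
    rng.shuffle(alloc)
    return dict(zip(units, alloc))

assignment = crossover(["u1", "u2", "u3", "u4"], seed=0)
```

    Randomizing the sequence (rather than the treatment itself) is what distinguishes the crossover from a parallel trial: every unit eventually receives both treatments.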

    A crossover design comprising three or more treatments is best done using a multiperiod Latin square design (also known as a round-robin design). In recent years it has been used widely for testing of trap designs and odour attractants for insects. The main sources of variation are trap site, day and trap/odour type. Within a square, one must have the same number of sites, days and trap types. The different traps or odour attractants are rotated round to their next designated positions at the end of each day.

    Strengths and weaknesses of within-unit repeated measures designs

    • The crossover and multiperiod Latin square designs deal effectively with the situation where you have two or several treatment levels and two blocking factors which can neither be standardized nor measured (for example days and sites if you are comparing trap catches).
    • The squares should always be replicated spatially both in order to have sufficient power, and in order to avoid pseudoreplication.
    • Because there is only one replicate of each combination of treatment, row and column in each square, the assumption is made that there is no interaction between treatments, rows and columns.
    • You must assume there are no 'after-effects' of the previous time period's treatment on the subsequent response.

Related topics :

Systematic allocation

Unequal randomization

Allocation concealment