Example, with R
Scatterplots are among the simplest sorts of graph (other than rugplots). For example:
This can be done using a pencil and ruler, or with R.
The pairing of values in variables x and y is assumed to be important. In other words, each value of x is usually 'paired' with a value of y for some good reason.
- For instance, those 5 pairs of x,y values could be the result of examining 5 different farms, with each pair of values representing one farm.
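Note, with R: plot(x,y) puts x on the horizontal axis and y on the vertical axis (a plot of y on x), whereas plot(y,x) swaps the axes and gives a plot of x on y.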
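A minimal R sketch of the two calls follows. The five x,y values are hypothetical, invented only to stand in for the five farms; they are not the data of the original example.

```r
# Hypothetical measurements for 5 farms (invented for illustration):
# each row is one farm, so x[i] and y[i] belong together.
farms <- data.frame(
  farm = 1:5,
  x    = c(2, 4, 5, 7, 9),
  y    = c(3, 5, 4, 8, 9)
)

# y on x: x along the horizontal axis, y up the vertical axis
plot(farms$x, farms$y, xlab = "x", ylab = "y", main = "plot(x, y)")

# x on y: the same points with the axes swapped
plot(farms$y, farms$x, xlab = "y", ylab = "x", main = "plot(y, x)")
```

Keeping x and y as columns of one data frame keeps each farm's pair of values together; sorting or subsetting one column on its own would break the pairing.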
Definition and Use
- A scatterplot (also called a scattergram or scattergraph) is the graph that results from plotting one variable (Y) against another (X). Each point represents one unit (for example, one farm) and is placed where that unit's value of X on the horizontal axis meets its value of Y on the vertical axis.
- The pattern of the points indicates the strength and direction of the association or correlation between the two variables.
- If the points cluster along a band from the lower left to the upper right, this suggests a positive association.
- If the points cluster along a band from the upper left to the lower right, this suggests a negative association.
- If the points show no tendency to cluster along a band, there is no evidence of any association between the two variables.
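The three patterns described above can be reproduced with simulated data. This is a sketch under stated assumptions: the data are randomly generated (not from any real study), and Pearson correlation, the default method of R's cor(), is used as a single-number summary of the strength and direction of a linear association.

```r
set.seed(1)  # make the simulated data reproducible

x <- 1:20
y_pos  <-  2 * x + rnorm(20, sd = 3)    # band from lower left to upper right
y_neg  <- -2 * x + rnorm(20, sd = 3)    # band from upper left to lower right
y_none <- rnorm(20, mean = 20, sd = 3)  # no trend built in

op <- par(mfrow = c(1, 3))  # three panels side by side
plot(x, y_pos,  main = "Positive")
plot(x, y_neg,  main = "Negative")
plot(x, y_none, main = "No association")
par(op)  # restore the previous layout

# Pearson correlation (the default for cor()): sign gives direction,
# size (between -1 and 1) gives strength of the linear association
r <- c(positive = cor(x, y_pos),
       negative = cor(x, y_neg),
       none     = cor(x, y_none))
print(round(r, 2))
```

Even with no trend built in, the sample correlation for y_none will rarely be exactly zero, because chance alone produces small nonzero values; judge the number alongside the plot.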
Tips and Notes
- Association between two variables can never prove that one variable CAUSES the other. It can provide supporting evidence for such a relationship, but ONLY if various other criteria for causality are also met.
These include:
- The association must be strong and confirmed in different places and at different times
- Cause must occur before effect
- There should be a dose response relationship
- The relationship must be biologically plausible
- There should be experimental evidence for a causal link
- Beware of relationships that rest on very few points. Sometimes including just one 'influential' point suggests a relationship, whereas excluding it would indicate no relationship.
- In general you should only predict the value of Y from a value of X that lies WITHIN the range of your observations; predicting beyond that range (extrapolation) is unreliable. If you fit a line to a relationship, only draw it as a solid line within those limits.
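The last two tips can be illustrated together in R. The data below are hypothetical, chosen so that the first eight points have no linear relationship at all; a single extra point is then added far from the rest. The fitted line (from lm()) is drawn only across the observed range of x, so no part of it extrapolates.

```r
# Hypothetical data: 8 points with no linear relationship
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(5, 3, 6, 4, 5, 3, 6, 4)
cor(x, y)    # 0 (up to rounding): no association

# Add one extreme, 'influential' point
x2 <- c(x, 20)
y2 <- c(y, 20)
cor(x2, y2)  # about 0.89: one point creates an apparent relationship

# Fit a line to all 9 points, but draw it only across the observed x range
fit <- lm(y2 ~ x2)
x_lim <- range(x2)
plot(x2, y2, xlab = "x", ylab = "y")
lines(x_lim, predict(fit, newdata = data.frame(x2 = x_lim)), lty = 1)
```

Removing the ninth point takes the correlation from about 0.89 back to 0, which is the situation the tip on influential points warns about.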
Test yourself
Inspect the scatterplot shown below.
Would you be convinced by this relationship between the level of glutamate dehydrogenase and the number of flukes in cattle?