Beginners statistics: scatterplot

Example, with R

Scatterplots are among the simplest sorts of graph (other than rugplots).

One can be drawn using a pencil and ruler, or with R.

The pairing of values in variables x and y is assumed to be important. In other words each value of x is usually 'paired' with a value of y for some good reason.

For instance, 5 pairs of x,y values could be the result of examining 5 different farms, with each pair of values representing one farm.

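Note that in R, plot(x, y) puts x on the horizontal axis and y on the vertical axis (a plot of y on x); plot(y, x) would instead give a plot of x on y.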
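As a minimal sketch of the example above, the following R code plots five (x, y) pairs, one pair per farm. The values are invented purely for illustration and are not the data behind the original figure; the axis labels and titles are likewise assumptions.

```r
# Five hypothetical farms: these values are invented for illustration only
x <- c(2, 4, 5, 7, 9)    # first measurement, one value per farm
y <- c(3, 5, 4, 8, 10)   # second measurement, paired with x by farm

# The pairing matters: x[i] and y[i] must describe the same farm,
# so the two vectors must at least have the same length
stopifnot(length(x) == length(y))

# plot(x, y): first argument on the horizontal axis, second on the vertical
plot(x, y, xlab = "x", ylab = "y", main = "y plotted on x")

# Swapping the arguments swaps the axes
plot(y, x, xlab = "y", ylab = "x", main = "x plotted on y")
```

Run as a script, each call to plot() draws a new graph; in an interactive session the second plot replaces the first in the graphics window.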

Definition and Use

  1. A scatterplot (also called a scattergram or scattergraph) is the graph that results from plotting one variable (Y) against another (X). Each point represents one unit and is positioned at the intersection of that unit's values of the two variables.
  2. The pattern of the points indicates the strength and direction of the association or correlation between the two variables.
    • If the points cluster along a band from the lower left to the upper right, this suggests a positive association.
    • If the points cluster along a band from the upper left to the lower right, this suggests a negative association.
    • If there is no suggestion of the points clustering, then there is no evidence for any association between the two variables.
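The three patterns described above can be sketched with simulated data. Everything in this example is an assumption made for illustration: the sample size, the slopes, the amount of random scatter and the seed are arbitrary, and cor() (Pearson's correlation coefficient) is used only as one convenient summary of direction and strength.

```r
set.seed(1)                 # arbitrary seed, so the sketch is repeatable
n <- 50                     # arbitrary number of units
x <- runif(n, 0, 10)

y_pos  <-  2 * x + rnorm(n, sd = 2)   # tends to rise with x
y_neg  <- -2 * x + rnorm(n, sd = 2)   # tends to fall with x
y_none <- rnorm(n, sd = 2)            # generated independently of x

# Draw the three scatterplots side by side
op <- par(mfrow = c(1, 3))
plot(x, y_pos,  main = "Positive association")
plot(x, y_neg,  main = "Negative association")
plot(x, y_none, main = "No clear association")
par(op)                     # restore the previous graphics settings

# The sign of the correlation coefficient gives the direction,
# its absolute size the strength, of the linear association
r <- c(positive = cor(x, y_pos),
       negative = cor(x, y_neg),
       none     = cor(x, y_none))
print(round(r, 2))
```

The band running from lower left to upper right corresponds to a coefficient near +1, the band from upper left to lower right to a coefficient near -1, and the unstructured cloud to a coefficient close to 0.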

Tips and Notes

  1. Association between two variables can never prove that one variable CAUSES the other. It can provide supporting evidence for such a relationship, but ONLY if various other criteria for causality are also met.

    These include

    • The association must be strong, and confirmed in different places and at different times.
    • Cause must occur before effect.
    • There should be a dose-response relationship.
    • The relationship must be biologically plausible.
    • There should be experimental evidence for a causal link.
  2. Beware of relationships that result from very few points. Sometimes you will find that inclusion of just one 'influential' point can suggest a relationship whereas its exclusion would indicate no relationship.
  3. In general you should only make predictions about the value of Y from the value of X if that value of X lies WITHIN the range of your observations. Predicting beyond that range (extrapolation) assumes the relationship continues unchanged where you have no data. If you fit a line to a relationship, only use a solid line within those limits.
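Tips 2 and 3 can be illustrated with a small sketch. The ten data points are hypothetical, constructed so that nine of them show no trend and the tenth is a single extreme point; the use of a dashed line for the extrapolated part of the fit is one possible convention, not a fixed rule.

```r
# Hypothetical data: nine points with no trend, plus one extreme point (the last)
x <- c(1, 2, 3, 4, 5, 1, 2, 4, 5, 20)
y <- c(3, 5, 4, 3, 5, 5, 3, 5, 3, 15)

# Tip 2: one influential point can create an apparent relationship
r_with    <- cor(x, y)              # all ten points
r_without <- cor(x[-10], y[-10])    # extreme point excluded
print(round(c(with = r_with, without = r_without), 2))

# Tip 3: draw the fitted line solid only within the observed range of x
fit   <- lm(y ~ x)                  # least-squares regression of y on x
x_in  <- seq(min(x), max(x), length.out = 50)
x_out <- seq(max(x), max(x) + 5, length.out = 20)

plot(x, y, xlim = c(0, max(x) + 5), ylim = c(0, 20))
lines(x_in,  predict(fit, newdata = data.frame(x = x_in)))            # solid: within range
lines(x_out, predict(fit, newdata = data.frame(x = x_out)), lty = 2)  # dashed: extrapolation
```

With all ten points the correlation is strongly positive, yet it vanishes when the extreme point is dropped, so the apparent relationship rests entirely on that one observation.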

Test yourself

Inspect the scatterplot shown below.

Would you be convinced by this relationship between the level of glutamate dehydrogenase and the number of flukes in cattle?

  • The red line was fitted by ordinary least-squares regression of y on x, for all the points shown.

    Data courtesy of Leclipteux et al. (1998) 

Useful references

    Griffiths, D. et al. (1998). Understanding Data. Principles and Practice of Statistics. Wiley, Brisbane.
    Gives an excellent account of exploratory data analysis of bivariate relationships using scatterplots, including use of the median trace.

    Kabacoff, R.I. (2012). Quick-R: Scatterplots.
    Covers simple scatterplots, scatterplot matrices, high density scatterplots and 3D scatterplots.

    Kuo (2002). Extrapolation of correlation between 2 variables in 4 general medical journals. JAMA 287 (21), 2815-2817.
    Looks at the prevalence of unjustified extrapolation in recent medical literature.

    Wikipedia: Scatter plot.
