Example, with R
Line graphs are most people's idea of what a graph should look like.
A line graph can be drawn using a pencil and ruler, or with R.
- The lines make it easy to follow the trend and interpolate between adjacent points.
- Values of x usually must be arranged in ascending order.
- The pairing of values within variables x and y is assumed to be important. In other words, each pair of values is assumed to have been recorded on the same item, for the same event, or at the same instant in time.
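The points above can be sketched in base R. This is a minimal illustration rather than a definitive recipe: the data values and the variable names (`x`, `y`, `ord`, `xs`, `ys`, `y_mid`) are invented for the example. It reorders x and y together so that each pair stays intact, draws the line graph, and then uses `approx()` to read off a linearly interpolated value between two adjacent points.

```r
# Hypothetical paired observations, deliberately recorded out of order
x <- c(4, 1, 3, 2, 5)
y <- c(8.1, 2.3, 6.4, 3.9, 9.7)

# Reorder x and y together so that each (x, y) pair stays intact
ord <- order(x)
xs <- x[ord]
ys <- y[ord]

# type = "b" draws both the points and the line segments joining them
plot(xs, ys, type = "b", xlab = "x", ylab = "y",
     main = "Line graph of paired values")

# Linear interpolation between adjacent points, as the joining lines imply
y_mid <- approx(xs, ys, xout = 2.5)$y
```

Sorting x on its own, without applying the same ordering to y, would break the pairing and produce a different (and wrong) graph.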
Definition and Use
- Line graphs or line plots most often describe scatterplots in which the x variable is time, plotted in ascending order, and the y values are observations made at those times. The points are joined by straight lines so that y can be estimated between them. Line graphs such as these are also called time series plots.
- The terms are also used more broadly to refer to any sort of graph where data are plotted using lines, including a bar graph with very narrow bars.
- The term 'lineplot' is also sometimes used to refer to dot histograms.
Tips and Notes
- For time series plots, always use a 'real time' scale on the x-axis, not 'sampling occasion'. One of the most misleading practices in graphics is to plot values at evenly spaced intervals when they were obtained at very different time intervals.
- If you have data missing for one time period, that should be apparent on the plot. Either leave a gap in the line plot, or join the points on either side with a dotted line to make it clear you are interpolating.
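Both tips can be tried out in base R. The sketch below uses invented dates and readings (`dates`, `value`, `gap` and `neighbours` are hypothetical names), and the dotted-line step assumes a single missing value that is not at either end of the series. Plotting against `Date` values gives a real-time x-axis, and an `NA` in the y values makes `plot()` break the line at that point.

```r
# Hypothetical readings; the April value was never collected, and there
# is a four-month wait before the final reading
dates <- as.Date(c("2020-01-15", "2020-02-15", "2020-03-15",
                   "2020-04-15", "2020-05-15", "2020-09-15"))
value <- c(12, 14, 13, NA, 17, 21)

# Plotting against the dates (not the sampling occasion 1, 2, 3, ...)
# keeps the long final interval visible. The NA breaks the line,
# leaving a gap where the observation is missing.
plot(dates, value, type = "l", xlab = "Date", ylab = "Value")
points(dates, value, pch = 16)

# Alternative to the gap: bridge it with a dotted segment to flag that
# no value was observed there (assumes one interior missing value)
gap <- which(is.na(value))
neighbours <- c(gap - 1, gap + 1)
lines(dates[neighbours], value[neighbours], lty = "dotted")
```

Replacing `dates` with `seq_along(value)` in the `plot()` call would reproduce the misleading 'sampling occasion' version, in which the last interval looks no longer than the others.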
Have a look at these graphs and see how you interpret them:
The cover story, "Why does college have to cost so much?", shows a large graph superimposed on a scene from the Cornell campus. There are two jagged lines running across the graph, one labeled "Cornell's Tuition" and the other "Cornell's Ranking". The tuition graph shows a steady rise, and the ranking graph, after some early meandering, plummets to an all-time low. The clear impression is that students are paying more for far less. All scales have been removed from the image.
How many graphs have you looked at cursorily, and just accepted the message?
Careful reading of the whole article reveals a different story:
- The ranking graph covers an 11-year period, the tuition graph 35 years, yet they are shown together (at the same apparent width) on the same horizontal "scale".
- The vertical scales for tuition and ranking could not possibly have common units, but the ranking graph is placed under the tuition graph, creating the impression that cost exceeds quality.
- The differing time units are cleverly disguised by rotating them by 90 degrees.
- And here is the masterstroke: the sharp "drop" in the ranking graph over the past few years actually represents the fact that Cornell's rank has improved from 15th to 6th.
Figures & text of this 'Test Yourself' section courtesy of Michael Friendly and Dave Bock
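One way to avoid the last trap when plotting ranks in R is to reverse the y-axis and label it, so that an improving rank visibly rises. The sketch below is illustrative only: the years and intermediate ranks are invented (only the end points, 15th and 6th, come from the example above), and `year`, `ranking` and `rank_limits` are hypothetical names.

```r
# Hypothetical yearly ranks (1 = best), improving from 15th to 6th
year <- 2000:2005
ranking <- c(15, 14, 12, 9, 8, 6)

# Reversed limits put the better (smaller) ranks at the top of the axis
rank_limits <- rev(range(ranking))

# Labelled axes and an explicit scale make the direction unambiguous
plot(year, ranking, type = "b", ylim = rank_limits,
     xlab = "Year", ylab = "Rank (1 = best)")
```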
- Friendly, M. Gallery of Data Visualization.
- Excellent overview showing some of the best and worst in data visualization.
- Ihaka, R. Statistics 120 Good and Bad Graphs.
- As well as giving many excellent tips on drawing graphs, Ross Ihaka defines the "Lie Factor" (first given by Edward Tufte of Yale University) as the size of the effect shown in the graphic divided by the size of the effect shown in the data.
- Kabacoff, R.I. (2012). Quick-R: Line charts.
- Covers all the different options available when doing a line graph in R.
- Wikipedia: Bar chart.
- Useful text, but the lack of a label on the vertical axis of its example makes it a classic case of how not to do it.