Scatter graphs
🎯 In this topic you will
- Draw and interpret scatter graphs
🧠 Key Words
- correlation
- line of best fit
- scatter graph
Definitions:
- correlation: A measure of how closely two sets of data are related, showing whether they move in the same or opposite directions.
- line of best fit: A straight line drawn on a scatter graph that shows the general trend of the data points.
- scatter graph: A graph that uses points to display the relationship between two variables.
A scatter graph is a useful way to compare two sets of data. Each point on the graph shows one pair of values. You can use a scatter graph to find out whether there is a correlation, or relationship, between the two sets of data. Two sets of data could have:
- positive correlation – as one value increases, the other value also increases. Example: as the age of a car increases, the distance it has travelled also increases.

- negative correlation – as one value increases, the other value decreases. Example: as the age of a car increases, the value of the car decreases.

- no correlation – there is no relationship between one set of values and the other set of values. Example: an adult’s height is not related to their age.

When two sets of data have positive or negative correlation, you can draw a line of best fit on the scatter graph. The line of best fit shows the general trend of the data, and you can use it to estimate the value of one variable when you know the value of the other.
If two sets of data have a strong correlation, most of the points will be close to the line of best fit. If the points are more widely scattered around the line, the correlation is weak.
Examples of correlation strength:
- Strong positive correlation
- Weak positive correlation
- Strong negative correlation
- Weak negative correlation
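The lesson judges strength by eye. One common way to put a number on it, not required here, is Pearson's correlation coefficient r, which ranges from -1 to 1: values near 1 or -1 mean the points lie close to a straight line, and values near 0 mean there is little straight-line relationship. The sketch below assumes the standard formula; the helper name `pearson_r` and the sample points are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two pairs of values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared / multiplied deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8  (near, not on, a line)
```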

❓ EXERCISE 15.2
1. Hassan carried out a survey of 15 students in his class. He asked them how many hours a week they spend doing homework, and how many hours a week they spend watching TV. The table shows the results of his survey.
| Hours doing homework | 14 | 11 | 19 | 6 | 10 | 3 | 9 | 4 | 12 | 8 | 6 | 15 | 18 | 7 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hours watching TV | 7 | 12 | 4 | 15 | 11 | 18 | 15 | 17 | 8 | 14 | 16 | 7 | 5 | 16 | 10 |
a) Draw a scatter graph to show this data. Mark each axis with a scale from 0 to 20. Show ‘Hours doing homework’ on the horizontal axis and ‘Hours watching TV’ on the vertical axis.
b) Does the scatter graph show positive or negative correlation? Explain your answer.
c) Draw a line of best fit on your graph and describe the strength of the correlation.
d) Hassan spends 6 hours watching TV one week. Use your line of best fit to estimate how many hours he spends doing homework that week.
👀 Answers
a) Scatter graph required (hours doing homework on the horizontal axis, hours watching TV on the vertical axis, each with a scale from 0 to 20).
b) The scatter graph shows a negative correlation – as hours of homework increase, hours of TV decrease.
c) The line of best fit slopes downwards from left to right, and most points lie close to it, so the correlation is strong and negative.
d) Reading across from $6$ hours of TV to the line of best fit, then down to the homework axis, gives about $16$ hours of homework (the exact reading depends on where the line is drawn).
🧠 Think like a Mathematician
Task: Explore the relationship between maximum daytime temperature and the number of cold drinks sold, using correlation and scatter graphs.
Data (14-day period):
| Daytime temperature (°C) | 28 | 26 | 30 | 31 | 34 | 32 | 27 | 25 | 26 | 28 | 29 | 30 | 33 | 27 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cold drinks sold | 25 | 22 | 26 | 28 | 29 | 27 | 24 | 23 | 24 | 27 | 26 | 29 | 31 | 23 |
Questions:
- a) Without looking at the table, what type of correlation would you expect between temperature and cold drinks sold? Why?
- b) Draw a scatter graph (temperature on the x-axis, drinks sold on the y-axis).
- c) What type of correlation does the graph show?
- d) Was your conjecture in part a correct?
- e) Draw a line of best fit.
- f) Estimate the number of drinks sold if the temperature is 44 °C. Is this reliable?
👀 Answers
- a) Expect a positive correlation, because people are likely to buy more cold drinks on hotter days.
- c) The scatter graph shows a clear positive correlation: as temperature increases, the number of cold drinks sold increases.
- d) Yes, the conjecture was correct.
- e) A line of best fit would slope upwards from left to right.
- f) At 44 °C, the line of best fit predicts roughly 37–39 drinks sold. However, this is extrapolation beyond the data range (25–34 °C), so the estimate is not reliable.
- Conclusion: The data are consistent with the idea that more cold drinks are sold on hotter days, but predictions outside the given range should be treated with caution.
❓ EXERCISES
3. The table shows the history and music exam results of 15 students. The results for both subjects are given as percentages.
| History result | 12 | 15 | 22 | 25 | 32 | 36 | 45 | 52 | 58 | 68 | 75 | 77 | 80 | 82 | 85 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Music result | 25 | 64 | 18 | 42 | 65 | 23 | 48 | 24 | 60 | 45 | 68 | 55 | 42 | 32 | 76 |
a) Without looking at the percentages or drawing a graph, do you think there will be positive, negative, or no correlation between the history and music exam results of the students? Explain your answer.
b) Draw a scatter graph to show the data. Mark a scale from 0 to 100 on each axis. Show ‘History result’ on the horizontal axis and ‘Music result’ on the vertical axis.
c) What type of correlation does the scatter graph show? Explain your answer.
d) Was your conjecture in part a correct? Explain your answer.
👀 Answers
a) Likely there is a positive correlation, since students who do well in history may also do well in music.
b) Scatter graph required (plot history results against music results).
c) The scatter graph shows a weak positive correlation – higher history marks tend to go with higher music marks, but the points are widely scattered around any line of best fit.
d) Yes, the conjecture in part a is broadly correct. The graph confirms a positive correlation, though it is not a strong one.
4. The scatter graph shows the distance travelled and the time taken by a taxi driver for the 12 journeys he made on one day.

a) What type and strength of correlation does the scatter graph show? Explain your answer.
b) One of the journeys doesn’t seem to fit the correlation. Which journey is this? Explain why you think this journey might be different from the other journeys.
👀 Answers
a) The scatter graph shows a strong positive correlation – as the distance travelled increases, the time taken also increases in a consistent pattern.
b) The journey at about $20$ km and $12$ minutes does not fit the pattern. It might be different because the driver could have taken a faster route (e.g., motorway), encountered less traffic, or recorded the time incorrectly.
🧠 Think like a Mathematician
Task: Critique two lines of best fit, describe how to draw a good one, and discuss using it for predictions.
Scenario: A scatter graph shows body length (cm) vs wingspan (cm) for 10 birds. Marcus has drawn a red line of best fit. Arun has drawn a black line of best fit.
Questions:
- a) Critique Marcus’s and Arun’s lines of best fit.
- b) Suggest a method someone could follow to draw a good line of best fit.
- d) Is it a good idea to use the line of best fit to make predictions outside the data range (e.g., for a body length of 75 cm)? Explain.
👀 Answers
a) Critique
- The data show a clear positive trend with one obvious outlier (the blue ✕ around length ≈ 44 cm, wingspan ≈ 122 cm).
- Red line (Marcus): looks a bit too steep and appears pulled toward the outlier; it leaves slightly more points below than above across the main cluster.
- Black line (Arun): passes more centrally through the cluster, with a more balanced number of points above and below and is less influenced by the outlier. This makes it the better “line of best fit” by eye.
b) How to draw a good line of best fit (by eye)
- Identify any outliers and do not let them pull the line towards them; judge the trend from the main cluster.
- Sketch a straight line through the middle of the cloud so that points are roughly balanced above/below and left/right along the line.
- Choose two well-separated points that lie on your line (not necessarily data points), read their coordinates, and use them to find the slope and the equation if needed.
- Use the line only to predict within the observed x-range.
d) Using the line beyond the data?
- Predicting at 75 cm body length would be extrapolation (the data stop around 60 cm).
- Outside the observed range, the relationship may change (biology/scale effects), so such predictions are not reliable. Stick to the data range for sensible estimates.
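The two-point step in part b, plus the "predict only within the data range" rule in part d, can be sketched as below. The helper names `line_through` and `predict`, the two points on the line, and the range 25 to 60 cm are all hypothetical values chosen for illustration, not readings from the birds' graph.

```python
def line_through(p1, p2):
    """Gradient and y-intercept of the straight line through two points (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("choose two points with different x-values")
    gradient = (y2 - y1) / (x2 - x1)
    intercept = y1 - gradient * x1
    return gradient, intercept


def predict(gradient, intercept, x, x_min, x_max):
    """Estimate y from the line, refusing to extrapolate beyond [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the data range {x_min} to {x_max}")
    return gradient * x + intercept


# Hypothetical points read off a hand-drawn line (not the birds' real data)
m, c = line_through((30, 70), (50, 110))
print(m, c)                          # 2.0 10.0
print(predict(m, c, 40, 25, 60))     # 90.0 (interpolation)

try:
    predict(m, c, 75, 25, 60)        # 75 cm is beyond the data: extrapolation
except ValueError as err:
    print(err)                       # x = 75 is outside the data range 25 to 60
```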
❓ EXERCISES
6. The table shows the number of fish recorded at 10 different points in the Red Sea. It also shows the temperature of the sea at each point.
| Sea temperature (°C) | 25 | 26 | 21 | 20 | 22 | 24 | 28 | 23 | 21 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of fish | 102 | 75 | 122 | 129 | 120 | 92 | 75 | 95 | 138 | 146 |
a) Draw a scatter graph to show this data.
b) Describe the type and strength of the correlation between the number of fish and the temperature of the sea.
c) Draw a line of best fit on your scatter graph. Use your line of best fit to estimate the number of fish at a point where the temperature is 27°C.
d) Do you think it is a good idea to use your line of best fit to predict the number of fish in the Red Sea when the temperature of the sea is 30°C, 35°C or even higher? Explain your answer.
e) Scientists estimate that sea temperatures around the world are increasing every year. Use your graph to predict what will happen to the fish population in the sea as temperatures increase.
👀 Answers
a) Scatter graph required (temperature on the x-axis, number of fish on the y-axis).
b) The correlation is negative and fairly strong: as the temperature increases, the number of fish decreases.
c) Using a line of best fit, at $27^\circ$C the number of fish is about $75$ to $80$ (the exact reading depends on where the line is drawn).
d) No. Extrapolating beyond the observed data (above $28^\circ$C) is unreliable, since the relationship may not continue the same way.
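e) The graph suggests that there are fewer fish where the sea is warmer, so if sea temperatures keep rising, the fish population in the Red Sea is likely to fall. Because this goes beyond the data range, it describes the trend rather than giving a reliable estimate.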
7. Twenty learners in a school completed the same maths test. The length of their right foot was also measured. This scatter graph shows the results:

Sofia says: “The scatter graph shows a positive correlation. This means that the longer your foot, the better you are at maths.”
Zara says: “That can’t be true! Being good at maths is not related to your foot length.”
a) Explain why Zara is correct.
b) Discuss your answer to part a with other learners in your class.
👀 Answers
a) Zara is correct because correlation does not mean causation. The scatter graph shows a positive correlation, but this is likely due to age: older students have longer feet and also tend to do better at maths. Foot length itself does not cause better maths results.
b) (Discussion) Learners should note that other factors, such as age or experience, explain the pattern. It is important to understand that two variables being correlated does not mean one causes the other.
⚠️ Be careful!
- Choose axes sensibly: put the explanatory variable on x and the response on y; swapping changes the interpretation.
- Do not join the dots: scatter graphs show points only; lines between points imply data in between that you don’t have.
- Scale evenly: use equal intervals along each axis; uneven spacing can make the correlation look stronger or weaker than it really is.
- Tiny crosses, not blobs: large markers can hide overlap and outliers.
- Line of best fit: balance points above/below and left/right; don’t force it through $(0,0)$ unless the context demands it.
- Ignore outliers when fitting (but explain them): one odd point should not drag your line; note possible reasons separately.
- Correlation ≠ causation: a relationship in the plot does not prove one variable causes the other.
- Strength vs direction: “positive/negative” is direction; strength depends on how tightly points cluster around a line.
- Prefer interpolation: estimates within the data range are still only approximate, and extrapolation beyond it is risky.
- Non-linear patterns: a straight line is inappropriate if the cloud is curved; consider a different model or transform.
- Units & labels: label axes with units; mixing hours with minutes or °C with °F will mislead.
- Read to the line, not the grid: when estimating, go to the best-fit line first, then across to the axis.
