Last update: 2025-09-05

Scatter graphs

  • Unit 1: Probability
  • Unit 2: Data Collection
  • Unit 3: Interpreting and discussing results

🎯 In this topic you will

  • Draw and interpret scatter graphs
 

🧠 Key Words

  • correlation
  • line of best fit
  • scatter graph
Definitions:
  • correlation: A measure of how closely two sets of data are related, showing whether they move in the same or opposite directions.
  • line of best fit: A straight line drawn on a scatter graph that shows the general trend of the data points.
  • scatter graph: A graph that uses points to display the relationship between two variables.
 

A scatter graph is a useful way to compare two sets of data. You can use a scatter graph to find out whether there is a correlation, or relationship, between the two sets of data. Two sets of data could have:

  • positive correlation – as one value increases, the other value also increases. Example: as the age of a car increases, the distance it has travelled also increases.

  • negative correlation – as one value increases, the other value decreases. Example: as the age of a car increases, the value of the car decreases.

  • no correlation – there is no relationship between one set of values and the other set of values. Example: adults’ heights do not relate to their ages.

When two sets of data have positive or negative correlation, you can draw a line of best fit on the scatter graph. The line of best fit shows the general relationship between the two sets of data. You can use it to estimate the value of one variable when you know the value of the other.

If two sets of data have a strong correlation, most of the points will be close to the line of best fit. If many of the points are spread widely around the line of best fit, the sets of data have a weak correlation.

Examples of correlation strength:

  • Strong positive correlation
  • Weak positive correlation
  • Strong negative correlation
  • Weak negative correlation
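The idea that strength depends on how close the points are to a line can be made numerical with Pearson's correlation coefficient $r$, which runs from $-1$ to $1$. This is not part of the topic; the Python sketch below is only an illustration, and the cut-offs of 0.7 (strong) and 0.3 (weak) are assumed rules of thumb rather than fixed definitions. It uses the maths and science results from the worked example below.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data (-1 <= r <= 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_correlation(r, strong=0.7, weak=0.3):
    """Describe r in words; the default cut-offs are assumed rules of thumb."""
    if abs(r) < weak:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong else "weak"
    return f"{strength} {direction} correlation"


# Maths and science results (out of 10) from the worked example
maths = [8, 5, 2, 10, 5, 8, 9, 3, 6, 6, 7, 3]
science = [7, 4, 3, 9, 6, 8, 8, 4, 5, 4, 8, 2]

r = pearson_r(maths, science)
print(round(r, 2), describe_correlation(r))  # 0.91 strong positive correlation
```

A value of $r$ close to $1$ or $-1$ means the points lie close to a straight line; the sign gives the direction.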

 

 
Worked example

The table shows the maths and science test results of 12 students. Each test was marked out of 10.

Maths result:   8, 5, 2, 10, 5, 8, 9, 3, 6, 6, 7, 3
Science result: 7, 4, 3, 9, 6, 8, 8, 4, 5, 4, 8, 2

a. Draw a scatter graph to show this data.
b. Draw a line of best fit on your graph.
c. What strength of positive correlation does the scatter graph show? Explain your answer.
d. Maddie scored $7$ in the maths test. She was ill for the science test. Use your line of best fit to estimate a score for Maddie in her science test.

Answer:

a. Mark each axis with a scale from $0$ to $10$. Take the horizontal axis as the Maths result and the vertical axis as the Science result. Plot all 12 points.

Scatter graph of maths vs science results with a roughly upward trend; line of best fit shown
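If graph paper is not to hand, the plotting in part a can be imitated in plain text. This Python sketch is an illustration only (the function name `text_scatter` is hypothetical): it marks each ordered pair with an `x` on a 0 to 10 grid, with the maths result across and the science result up. A digit would show how many points share the same position.

```python
def text_scatter(xs, ys, size=10):
    """Return a text scatter graph on a 0..size grid (x across, y up)."""
    counts = {}
    for x, y in zip(xs, ys):
        counts[(x, y)] = counts.get((x, y), 0) + 1

    lines = []
    for y in range(size, -1, -1):  # draw the highest y-value first
        cells = []
        for x in range(size + 1):
            c = counts.get((x, y), 0)
            # '.' = empty, 'x' = one point, digit = several points
            cells.append("." if c == 0 else "x" if c == 1 else str(c))
        lines.append(f"{y:>2} |" + "".join(f"{cell:>3}" for cell in cells))
    lines.append("   +" + "---" * (size + 1))
    lines.append("    " + "".join(f"{x:>3}" for x in range(size + 1)))
    return "\n".join(lines)


maths = [8, 5, 2, 10, 5, 8, 9, 3, 6, 6, 7, 3]
science = [7, 4, 3, 9, 6, 8, 8, 4, 5, 4, 8, 2]
print(text_scatter(maths, science))
```

The points are plotted only, never joined, which matches the rule that a scatter graph shows separate points.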

b. Add a straight line of best fit passing approximately through the middle of the crosses.

c. The graph shows a strong positive correlation: as the maths results increase, the science results also increase, and most of the points lie close to the line of best fit.

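d. Read up from $7$ on the maths axis to the line of best fit, then across to the science axis. The estimate is about $6.5$ (the exact reading depends on where your line is drawn). The tests are marked in whole numbers, so a sensible estimate for Maddie's science score is $6$ or $7$.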
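A line of best fit drawn by eye can be checked against a calculated one. The sketch below swaps in the least-squares method, which is not what the worked example uses (it draws the line by eye), so treat it only as a cross-check. It also shows a useful fact: the least-squares line always passes through the mean point, here $(6, 5.67)$, which is a good point to aim for when drawing a line by eye.

```python
def least_squares_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # forces the line through (mean_x, mean_y)
    return m, c


maths = [8, 5, 2, 10, 5, 8, 9, 3, 6, 6, 7, 3]
science = [7, 4, 3, 9, 6, 8, 8, 4, 5, 4, 8, 2]

m, c = least_squares_line(maths, science)
maddie = m * 7 + c  # read up from maths = 7 to the line
print(f"y = {m:.2f}x + {c:.2f}, estimate for Maddie: {maddie:.1f}")
# y = 0.83x + 0.70, estimate for Maddie: 6.5
```

The calculated estimate of about $6.5$ agrees with reading the graph: Maddie's science score is likely to be $6$ or $7$.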

Tips for drawing and reading scatter graphs: label the axes clearly, use a sensible $0$–$10$ scale and plot all the ordered pairs. A line of best fit should balance the points above and below it. Correlation is stronger when the points lie close to the line; here most points are close to it, so the positive correlation is strong. For predictions, go up to the line and then across to read the estimated value.

 

🧠 PROBLEM-SOLVING Strategy

Draw & Interpret Scatter Graphs

Plot paired data, look for correlation, sketch a fair line of best fit, and use it sensibly for estimates.

  1. Set up axes.
    • Decide which variable is explanatory (x-axis) and which is the response (y-axis).
    • Use sensible, even scales starting at 0 (unless context justifies otherwise).
  2. Plot accurately.
    • Plot each ordered pair (x, y) as a small cross.
    • Label axes and add a clear title.
  3. Assess correlation.
    • Positive: points rise left→right.
    • Negative: points fall left→right.
    • None: no obvious trend.
    • Strength: strong if points lie close to a line; weak if they are widely scattered.
  4. Draw a line of best fit (by eye).
    • Ignore obvious outliers when positioning the line.
    • Balance points above/below and left/right of the line across the cloud.
    • Don’t force the line through (0,0) unless the context demands it.
  5. Estimate values.
    • Interpolation: within the data range, go from the known value to the line, then across to the other axis to read the estimate.
    • Extrapolation: be cautious beyond the data range, because the trend may change.
  6. Check reasonableness.
    • Units consistent? Scale sensible?
    • Outliers explained (measurement error/special case)?
    • Remember: correlation ≠ causation.
Mini how-to
• Draw line: balance the point cloud; pick two points on your line to compute its slope if needed.
• Predict: for x = 7 (maths), move up to line, across to y ≈ 6–7 (science). Round sensibly.
• Strength clue: narrower “cigar” shape → stronger correlation.
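The "pick two points on your line" step can be written as a short calculation. In this sketch the two points are hypothetical readings from a hand-drawn line on the maths and science graph, not values given in the text. The slope is rise over run, and the intercept follows from $y = mx + c$. Choosing points far apart makes the slope less sensitive to small reading errors.

```python
def line_through(p1, p2):
    """Slope m and intercept c of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("choose two points with different x-values")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    c = y1 - m * x1            # rearranged from y1 = m*x1 + c
    return m, c


# Hypothetical points read off a hand-drawn line of best fit
m, c = line_through((2, 2.4), (10, 9.0))
estimate = m * 7 + c  # predict science when maths = 7
print(f"slope {m:.3f}, intercept {c:.2f}, estimate {estimate:.1f}")
```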
Common pitfalls
  • Letting an outlier drag the best-fit line.
  • Reading off to gridlines instead of to the line when estimating.
  • Concluding cause from correlation (e.g., foot length “causes” better maths).
  • Using wildly extrapolated predictions (e.g., at x far beyond observed values).
Quick notes: Positive ↗, Negative ↘, None (no trend) • Interpolate (safer) > Extrapolate (riskier) • Correlation strength ≈ closeness to line
 

EXERCISE 15.2

1. Hassan carried out a survey of 15 students in his class. He asked them how many hours a week they spend doing homework and how many hours a week they spend watching TV. The table shows the results of his survey.

Hours doing homework: 14, 11, 19, 6, 10, 3, 9, 4, 12, 8, 6, 15, 18, 7, 12
Hours watching TV:    7, 12, 4, 15, 11, 18, 15, 17, 8, 14, 16, 7, 5, 16, 10

a) Draw a scatter graph to show this data. Mark each axis with a scale from 0 to 20. Show ‘Hours doing homework’ on the horizontal axis and ‘Hours watching TV’ on the vertical axis.

b) Does the scatter graph show positive or negative correlation? Explain your answer.

c) Draw a line of best fit on your graph and describe the strength of the correlation.

d) Hassan spends 6 hours watching TV one week. Use your line of best fit to estimate how many hours he spends doing homework that week.

Answer:

a) Scatter graph required (plot homework hours against TV hours).

b) The scatter graph shows a negative correlation – as hours of homework increase, hours of TV decrease.

c) The correlation is strong and negative: the points lie close to a line of best fit that slopes downward from left to right.

d) From the line of best fit, if Hassan spends $6$ hours watching TV, he spends about $15$ to $16$ hours doing homework. The exact reading depends on where the line is drawn.
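Part d reads the graph "backwards": start from 6 on the TV axis, go across to the line, then down to the homework axis. Algebraically that means solving $y = mx + c$ for $x$. This sketch uses a least-squares line as an assumed stand-in for a hand-drawn one; for this data it gives roughly 16 hours, and a careful by-eye line should give a similar value.

```python
def least_squares_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def x_for_y(y, m, c):
    """Solve y = m*x + c for x (read across to the line, then down)."""
    return (y - c) / m


homework = [14, 11, 19, 6, 10, 3, 9, 4, 12, 8, 6, 15, 18, 7, 12]
tv = [7, 12, 4, 15, 11, 18, 15, 17, 8, 14, 16, 7, 5, 16, 10]

m, c = least_squares_line(homework, tv)  # homework on x, TV on y, as in part a
hours = x_for_y(6, m, c)
print(f"slope {m:.2f}; homework for 6 h of TV is about {hours:.0f} h")
```

The slope is negative, so fewer hours of TV correspond to more hours of homework, as expected for a negative correlation.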

 

🧠 Think like a Mathematician

Task: Explore the relationship between maximum daytime temperature and the number of cold drinks sold, using correlation and scatter graphs.

Data (14-day period):

Daytime temperature (°C): 28, 26, 30, 31, 34, 32, 27, 25, 26, 28, 29, 30, 33, 27
Cold drinks sold:         25, 22, 26, 28, 29, 27, 24, 23, 24, 27, 26, 29, 31, 23

Questions:

  a) Without looking at the table, what type of correlation would you expect between temperature and cold drinks sold? Why?
  b) Draw a scatter graph (temperature on the x-axis, drinks sold on the y-axis).
  c) What type of correlation does the graph show?
  d) Was your conjecture in part a correct?
  e) Draw a line of best fit.
  f) Estimate the number of drinks sold if the temperature is 44 °C. Is this reliable?
Answer:
  • a) Expect a positive correlation, because people are likely to buy more cold drinks on hotter days.
  • c) The scatter graph shows a clear positive correlation: as temperature increases, the number of cold drinks sold increases.
  • d) Yes, the conjecture was correct.
  • e) A line of best fit would slope upwards from left to right.
  • f) At 44 °C, the line of best fit might predict roughly 37–39 drinks sold. However, this is extrapolation beyond the data range (25–34 °C), so the estimate is not reliable.
  • Conclusion: The data supports the idea that higher temperatures lead to more drinks being sold, but predictions outside the given range should be treated with caution.
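The difference between interpolation and extrapolation in part f can be made explicit by checking whether the temperature lies inside the observed range before trusting an estimate. This Python sketch uses a least-squares line as an assumed stand-in for a by-eye line; for this data it works out as $y = 0.85x + 1.35$, giving roughly 39 drinks at 44 °C, which is flagged as extrapolation.

```python
def least_squares_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def estimate(x, m, c, observed_xs):
    """Predict y at x and label the prediction as interpolation or extrapolation."""
    inside = min(observed_xs) <= x <= max(observed_xs)
    return m * x + c, "interpolation" if inside else "extrapolation"


temps = [28, 26, 30, 31, 34, 32, 27, 25, 26, 28, 29, 30, 33, 27]
drinks = [25, 22, 26, 28, 29, 27, 24, 23, 24, 27, 26, 29, 31, 23]

m, c = least_squares_line(temps, drinks)
for t in (29, 44):
    y, kind = estimate(t, m, c, temps)
    print(f"{t} °C: about {y:.0f} drinks ({kind})")
# 29 °C: about 26 drinks (interpolation)
# 44 °C: about 39 drinks (extrapolation)
```

The code can only flag an extrapolated value; it cannot say whether the trend really continues, which is why the estimate at 44 °C should be treated with caution.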
 

EXERCISES

3. The table shows the history and music exam results of 15 students. The results for both subjects are given as percentages.

History result (%): 12, 15, 22, 25, 32, 36, 45, 52, 58, 68, 75, 77, 80, 82, 85
Music result (%):   25, 64, 18, 42, 65, 23, 48, 24, 60, 45, 68, 55, 42, 32, 76

a) Without looking at the percentages or drawing a graph, do you think there will be positive, negative, or no correlation between the history and music exam results of the students? Explain your answer.

b) Draw a scatter graph to show the data. Mark a scale from 0 to 100 on each axis. Show ‘History result’ on the horizontal axis and ‘Music result’ on the vertical axis.

c) What type of correlation does the scatter graph show? Explain your answer.

d) Was your conjecture in part a correct? Explain your answer.

Answer:

a) Likely there is a positive correlation, since students who do well in history may also do well in music.

b) Scatter graph required (plot history results against music results).

c) The scatter graph shows a weak positive correlation: music marks tend to increase slightly as history marks increase, but the points are widely scattered rather than close to a line.

d) Yes, the conjecture in part a is broadly correct. The graph confirms a positive correlation, though it is not a strong one.

4. The scatter graph shows the distance travelled and the time taken by a taxi driver for the 12 journeys he made on one day.

Scatter graph of distance travelled and time taken by taxi driver

a) What type and strength of correlation does the scatter graph show? Explain your answer.

b) One of the journeys doesn’t seem to fit the correlation. Which journey is this? Explain why you think this journey might be different from the other journeys.

Answer:

a) The scatter graph shows a strong positive correlation – as the distance travelled increases, the time taken also increases in a consistent pattern.

b) The journey at about $20$ km and $12$ minutes does not fit the pattern. It might be different because the driver could have taken a faster route (e.g., motorway), encountered less traffic, or recorded the time incorrectly.

 

🧠 Think like a Mathematician

Task: Critique two lines of best fit, describe how to draw a good one, and discuss using it for predictions.

Scenario: A scatter graph shows body length (cm) vs wingspan (cm) for 10 birds. Marcus has drawn a red line of best fit. Arun has drawn a black line of best fit.

Questions:

  a) Critique Marcus’s and Arun’s lines of best fit.
  b) Suggest a method someone could follow to draw a good line of best fit.
  d) Is it a good idea to use the line of best fit to make predictions outside the data range (e.g., for a body length of 75 cm)? Explain.
Answer:

a) Critique

  • The data show a clear positive trend with one obvious outlier (the blue ✕ around length ≈ 44 cm, wingspan ≈ 122 cm).
  • Red line (Marcus): looks a bit too steep and appears pulled toward the outlier; it leaves slightly more points below than above across the main cluster.
  • Black line (Arun): passes more centrally through the cluster, has a more balanced number of points above and below, and is less influenced by the outlier. This makes it the better “line of best fit” by eye.

b) How to draw a good line of best fit (by eye)

  1. Identify any outliers and do not let them pull the line; judge the trend from the main cluster.
  2. Sketch a straight line through the middle of the cloud so that points are roughly balanced above/below and left/right along the line.
  3. Choose two well-separated points that lie on your line (not necessarily data points), read their coordinates, and use them to find the slope and the equation if needed.
  4. Use the line only to predict within the observed x-range.
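Step 1 matters because a single outlier can pull a calculated line a long way. The bird data is not given here, so this sketch borrows the maths and science results from the worked example and adds one hypothetical outlier, (2, 10), invented purely for illustration. It compares least-squares slopes with and without that point; unlike a careful by-eye line, a least-squares line cannot ignore the outlier.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


maths = [8, 5, 2, 10, 5, 8, 9, 3, 6, 6, 7, 3]
science = [7, 4, 3, 9, 6, 8, 8, 4, 5, 4, 8, 2]

without = least_squares_slope(maths, science)
# Hypothetical outlier: a student with 2 in maths but 10 in science
with_outlier = least_squares_slope(maths + [2], science + [10])
print(f"slope without outlier: {without:.2f}")
print(f"slope with outlier:    {with_outlier:.2f}")
```

One invented point out of 13 lowers the slope from about $0.83$ to about $0.50$, which is why outliers should be noted and explained rather than allowed to position the line.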

d) Using the line beyond the data?

  • Predicting at 75 cm body length would be extrapolation (the data stop around 60 cm).
  • Outside the observed range, the relationship may change (biology/scale effects), so such predictions are not reliable. Stick to the data range for sensible estimates.
 

EXERCISES

6. The table shows the number of fish recorded at 10 different points in the Red Sea. It also shows the temperature of the sea at each point.

Sea temperature (°C): 25, 26, 21, 20, 22, 24, 28, 23, 21, 19
Number of fish:       102, 75, 122, 129, 120, 92, 75, 95, 138, 146

a) Draw a scatter graph to show this data.

b) Describe the type and strength of the correlation between the number of fish and the temperature of the sea.

c) Draw a line of best fit on your scatter graph. Use your line of best fit to estimate the number of fish at a point where the temperature is 27 °C.

d) Do you think it is a good idea to use your line of best fit to predict the number of fish in the Red Sea when the temperature of the sea is 30 °C, 35 °C or even higher? Explain your answer.

e) Scientists estimate that the sea temperature in the world is increasing every year. Use your graph to predict what will happen to the fish population in the sea as temperatures increase.

Answer:

a) Scatter graph required (temperature on the x-axis, number of fish on the y-axis).

b) The correlation is negative and fairly strong: as the temperature increases, the number of fish decreases.

c) Using a line of best fit, at $27^\circ$C the number of fish is about $75$ to $80$.

d) No. Extrapolating beyond the observed data (above $28^\circ$C) is unreliable, since the relationship may not continue the same way.

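e) The graph suggests that as sea temperatures rise, the number of fish in the Red Sea will fall. Treat this prediction with caution: it assumes the trend continues beyond the observed temperatures, and correlation alone does not prove that temperature causes the change.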
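Part d can be seen directly by extending a straight line too far. This sketch uses a least-squares line as an assumed stand-in for a by-eye line fitted to the Red Sea data. It gives about 75 fish at 27 °C, but at about 36 °C the line predicts zero fish and above that a negative number, which is impossible: the straight-line pattern cannot simply continue beyond the data.

```python
def least_squares_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


sea_temp = [25, 26, 21, 20, 22, 24, 28, 23, 21, 19]
fish = [102, 75, 122, 129, 120, 92, 75, 95, 138, 146]

m, c = least_squares_line(sea_temp, fish)
for t in (27, 30, 35, 40):
    inside = min(sea_temp) <= t <= max(sea_temp)
    note = "within data" if inside else "extrapolated"
    print(f"{t} °C: {m * t + c:.0f} fish ({note})")
print(f"the line reaches zero fish at about {-c / m:.0f} °C")
```

Only the 27 °C value lies inside the observed range (19 to 28 °C); the other predictions are extrapolations, and the negative count at 40 °C shows how they can break down.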

7. Twenty learners in a school completed the same maths test. The length of their right foot was also measured. This scatter graph shows the results:

Scatter graph of foot length vs maths test result

Sofia says: “The scatter graph shows a positive correlation. This means that the longer your foot, the better you are at maths.”

Zara says: “That can’t be true! Being good at maths is not related to your foot length.”

a) Explain why Zara is correct.

b) Discuss your answer to part a with other learners in your class.

Answer:

a) Zara is correct because correlation does not mean causation. The scatter graph shows a positive correlation, but this may be explained by another factor such as age: older learners tend to have longer feet and may also do better in maths. Foot length itself does not cause better maths results.

b) (Discussion) Learners should note that other factors, such as age or experience, could explain the pattern. Two variables being correlated does not mean that one causes the other.

 

⚠️ Be careful!

  • Choose axes sensibly: put the explanatory variable on x and the response on y; swapping changes the interpretation.
  • Do not join the dots: scatter graphs show points only; lines between points imply data in between that you don’t have.
  • Scale evenly: use equal tick intervals on both axes; inconsistent scales can fake stronger/weaker correlation.
  • Tiny crosses, not blobs: large markers can hide overlap and outliers.
  • Line of best fit: balance points above/below and left/right; don’t force it through $(0,0)$ unless the context demands it.
  • Ignore outliers when fitting (but explain them): one odd point should not drag your line; note possible reasons separately.
  • Correlation ≠ causation: a relationship in the plot does not prove one variable causes the other.
  • Strength vs direction: “positive/negative” is direction; strength depends on how tightly points cluster around a line.
  • Prefer interpolation (safer): estimates within the data range are still approximate, and extrapolation beyond it is risky.
  • Non-linear patterns: a straight line is inappropriate if the cloud is curved; consider a different model or transform.
  • Units & labels: label axes with units; mixing hours with minutes or °C with °F will mislead.
  • Read to the line, not the grid: when estimating, go to the best-fit line first, then across to the axis.
 

📘 What we've learned — Scatter Graphs

  • Purpose: Scatter graphs compare two variables and reveal correlation (relationship) between them.
  • Axes: Put the explanatory variable on x (horizontal) and the response on y (vertical). Label and scale evenly.
  • Correlation types: Positive (↗), Negative (↘), or None. Strength depends on how tightly points cluster around a line.
  • Line of best fit (by eye): Draw a straight line through the “middle” of the point cloud, balancing points above/below. Ignore clear outliers when positioning.
  • Estimating (interpolation): Read across to or from the line within the data range. Be cautious with extrapolation beyond the data.
  • Outliers: Points far from the pattern. Note them and consider reasons (error, special case) before using them.
  • Correlation ≠ causation: A relationship on a scatter graph does not prove one variable causes the other.
Mini example: Maths vs Science scores (0–10) show a strong positive trend. A best-fit line gives a science estimate ≈ 6–7 when maths = 7.
Quick checklist: Title ✓ Axes & units ✓ Accurate plotting ✓ Best-fit line ✓ Identify type/strength ✓ Sensible estimates ✓ Note outliers ✓