Calculating statistics for grouped data
🎯 In this topic you will
- Use mode, median, mean, and range to compare sets of grouped data
You already know how to work out the mode, median, mean and range for individual data and also for data represented in a frequency table.
When data is grouped, you cannot work out exact values for the mode, median, mean and range, because you do not have the individual data. However, you can write the modal class interval and the class interval where the median lies, and you can work out estimates for the mean and the range.
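In symbols, the two estimates can be sketched as follows. The notation is introduced here for illustration only: $m_i$ stands for the midpoint of class $i$ and $f_i$ for its frequency.

$$
\text{estimated mean} = \frac{\sum m_i f_i}{\sum f_i},
\qquad
\text{estimated range} = (\text{upper boundary of the highest class}) - (\text{lower boundary of the lowest class})
$$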
❓ EXERCISE
1. The table shows the heights of the students in class 9R.
| Height, $h$ (cm) | Frequency |
|---|---|
| $140 \le h < 150$ | 7 |
| $150 \le h < 160$ | 13 |
| $160 \le h < 170$ | 6 |
| $170 \le h < 180$ | 2 |
a) Write i) the modal class interval; ii) the class interval where the median lies.
b) Explain why you can only give class intervals for the mode and median, and not exact values.
c) Work out an estimate for the range.
d) Copy and complete the working table to find an estimate of the mean (to the nearest cm).
| Midpoint | Frequency | Midpoint × frequency |
|---|---|---|
| 145 | 7 | 1015 |
| 155 | 13 | 2015 |
| 165 | 6 | 990 |
| 175 | 2 | 350 |
| Totals | 28 | 4370 |
👀 Show answer
a) i) Modal class: $150 \le h < 160$ (highest frequency $13$).
ii) Median class: total $n=28$ ⇒ median is between the 14th and 15th values. Cumulative up to $140$–$150$ is $7$, up to $150$–$160$ is $20$, so the median lies in $150 \le h < 160$.
b) The data are grouped into intervals, so we only know how many values fall in each class, not their exact individual heights. Therefore we can state only the class containing the mode/median, not exact numbers.
c) Estimated range ≈ highest boundary − lowest boundary = $180 - 140 = \mathbf{40}$ cm. (Exact range is unknown with grouped data.)
d) Using midpoints $145,155,165,175$:
$\sum (m \times f) = 4370$, $\sum f = 28$ ⇒ estimated mean $= \dfrac{4370}{28} \approx 156.07 \text{ cm} \approx \mathbf{156\text{ cm}}$ (to the nearest cm).
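The working in parts c and d can also be checked with a short Python sketch. The function names and the `(lower, upper, frequency)` layout are assumptions made for this illustration; the numbers are the class 9R data from the table.

```python
# Class 9R heights from the table: (lower boundary, upper boundary, frequency).
# Assumes the classes are listed in increasing order.
classes = [(140, 150, 7), (150, 160, 13), (160, 170, 6), (170, 180, 2)]

def estimated_mean(classes):
    """Estimate the mean using the midpoint of each class interval."""
    total_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # sum of midpoint x frequency
    total_f = sum(f for _, _, f in classes)                      # total frequency
    return total_mf / total_f

def estimated_range(classes):
    """Estimate the range as highest boundary minus lowest boundary."""
    return classes[-1][1] - classes[0][0]

print(round(estimated_mean(classes)))  # 156
print(estimated_range(classes))        # 40
```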
2. The table shows the masses of the students in class 9T.
| Mass, $m$ (kg) | Frequency |
|---|---|
| $40 \leq m < 50$ | $4$ |
| $50 \leq m < 60$ | $12$ |
| $60 \leq m < 70$ | $8$ |
a. Write
i. the modal class interval
ii. the class interval where the median lies.
b. Work out an estimate for
i. the mean
ii. the range.
c. Explain why your answers to part b are estimates.
👀 Show answer
a.
i. Modal class interval: $50 \leq m < 60$ (highest frequency $12$).
ii. Median class interval: also $50 \leq m < 60$, since the $12^{th}$ and $13^{th}$ values lie in this class.
b.
i. Estimate for mean:
Midpoints: $45,\ 55,\ 65$.
$\text{Total of }(x \times f) = 45 \times 4 + 55 \times 12 + 65 \times 8 = 180 + 660 + 520 = 1360$.
Total frequency $= 24$.
Mean $\approx \dfrac{1360}{24} \approx 56.7\ \text{kg}$.
ii. Range $\approx 70 - 40 = 30\ \text{kg}$.
c. They are estimates because midpoints are assumed for each class, and exact values of the data are not known.
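Part a can be reproduced with a running total. The sketch below is one possible way to do it, not a standard routine: the helper names are hypothetical, and it assumes the classes are in increasing order with a total frequency of at least 1.

```python
# Class 9T masses from the table: (lower boundary, upper boundary, frequency).
classes = [(40, 50, 4), (50, 60, 12), (60, 70, 8)]

def modal_class(classes):
    """Return the boundaries of the class with the highest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo, hi)

def class_containing(classes, position):
    """Return the class holding the value at this position (1 = smallest value)."""
    running = 0
    for lo, hi, f in classes:
        running += f                      # cumulative frequency so far
        if position <= running:
            return (lo, hi)
    raise ValueError("position is larger than the total frequency")

def median_classes(classes):
    """Return the class (or two classes) containing the middle value(s)."""
    n = sum(f for _, _, f in classes)
    # One middle position if n is odd, two if n is even.
    positions = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]
    # A set removes the duplicate when both middle values share a class.
    return sorted({class_containing(classes, p) for p in positions})

print(modal_class(classes))     # (50, 60)
print(median_classes(classes))  # [(50, 60)]
```

If the two middle values ever fall in different classes, this sketch returns both classes rather than choosing one.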
🧠 Think like a Mathematician
Scenario:
When estimating the mean from grouped data:
- Marcus: Use the smallest value in each class instead of the midpoint.
- Arun: Use the greatest value in each class instead of the midpoint.
Questions:
- a) Do you think either method would give a better estimate? Explain your answer.
- b) Compare Marcus’s and Arun’s ideas with the midpoint method.
👀 show answer
- Neither Marcus’s nor Arun’s suggestion would give a better estimate of the mean.
  - Marcus’s method underestimates, because it treats every value as the lowest possible value in its class.
  - Arun’s method overestimates, because it treats every value as the highest possible value in its class.
- The midpoint method is designed to be a fair compromise: it balances underestimates and overestimates across classes, assuming the data are fairly evenly spread within each interval.
- Conclusion: The midpoint is the best choice for estimating the mean from grouped data. Marcus’s and Arun’s methods are biased to one side.
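To see the size of the effect, the three choices can be compared on the class 9R heights from Exercise 1. This is an illustrative sketch only. The function name `grouped_mean` is hypothetical, and Arun's "greatest value" is approximated by the upper class boundary, since the intervals exclude the boundary itself.

```python
# Class 9R heights from Exercise 1: (lower boundary, upper boundary, frequency).
classes = [(140, 150, 7), (150, 160, 13), (160, 170, 6), (170, 180, 2)]

def grouped_mean(classes, pick):
    """Estimate the mean, using pick(lower, upper) as the stand-in value for each class."""
    total_f = sum(f for _, _, f in classes)
    return sum(pick(lo, hi) * f for lo, hi, f in classes) / total_f

marcus = grouped_mean(classes, lambda lo, hi: lo)               # smallest value in each class
arun = grouped_mean(classes, lambda lo, hi: hi)                 # upper boundary of each class
midpoint = grouped_mean(classes, lambda lo, hi: (lo + hi) / 2)  # midpoint method

print(f"{marcus:.2f} {midpoint:.2f} {arun:.2f}")  # 151.07 156.07 161.07
```

The true mean must lie somewhere between Marcus's and Arun's figures, and the midpoint estimate sits exactly halfway between them.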
❓ EXERCISES
4. Anita carried out a survey on the length of time patients waited to see a doctor at two different hospitals. The tables show the results of her survey.
| The Heath | |
|---|---|
| Time, $t$ (minutes) | Frequency |
| $0 \le t < 10$ | $5$ |
| $10 \le t < 20$ | $23$ |
| $20 \le t < 30$ | $10$ |
| $30 \le t < 40$ | $2$ |
| Moorlands | |
|---|---|
| Time, $t$ (minutes) | Frequency |
| $0 \le t < 10$ | $16$ |
| $10 \le t < 20$ | $8$ |
| $20 \le t < 30$ | $14$ |
| $30 \le t < 40$ | $12$ |
a. How many people were surveyed at each hospital?
b. Copy and complete this table.
| Hospital | Modal class interval | Class interval where the median lies | Estimate of mean |
|---|---|---|---|
| The Heath | | | |
| Moorlands | | | |
c. Compare and comment on the average waiting times for the two hospitals.
d. Which hospital would you prefer to go to, based only on the waiting times? Explain your answer.
👀 Show answer
a.
$\text{The Heath total} = 5 + 23 + 10 + 2 = 40$ patients.
$\text{Moorlands total} = 16 + 8 + 14 + 12 = 50$ patients.
b.
Modal class interval:
$\text{The Heath: }10 \le t < 20\ (\text{frequency }23)$; $\text{Moorlands: }0 \le t < 10\ (\text{frequency }16)$.
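Median class (for even $n$, the median lies between the $\tfrac{n}{2}$th and $\left(\tfrac{n}{2}+1\right)$th values):
The Heath: $n=40$, so the median lies between the 20th and 21st values (cumulative totals $5, 28$) $\Rightarrow 10 \le t < 20$.
Moorlands: $n=50$, so the median lies between the 25th and 26th values (cumulative totals $16, 24, 38$) $\Rightarrow 20 \le t < 30$.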
Estimate of mean (midpoints $5,15,25,35$):
$\text{The Heath: }(5\times 5) + (15\times 23) + (25\times 10) + (35\times 2) = 690$.
$\bar t_{\text{Heath}} \approx \dfrac{690}{40} = 17.25\ \text{minutes}$.
$\text{Moorlands: }(5\times 16) + (15\times 8) + (25\times 14) + (35\times 12) = 970$.
$\bar t_{\text{Moorlands}} \approx \dfrac{970}{50} = 19.4\ \text{minutes}$.
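c. The Heath has the lower estimated mean waiting time: $17.25$ minutes versus $19.4$ minutes for Moorlands. Its median class ($10 \le t < 20$) is also lower than that of Moorlands ($20 \le t < 30$). Moorlands has the lower modal class ($0 \le t < 10$), but it also has a larger share of patients in the higher time intervals ($20$–$40$ minutes): $26$ of $50$ ($52\%$) against $12$ of $40$ ($30\%$) at The Heath.
d. The Heath, because the estimated mean waiting time is lower and a smaller proportion of patients wait in the longest intervals. (Answer based only on waiting times.)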
🧠 Think like a Mathematician
Hank has a 6-sided spinner (1–6) and a 7-sided spinner (1–7). He spins both and adds the scores.

- a) What is the smallest possible total?
- b) What is the greatest possible total?
- c) Hank spins 20 times. The totals are:
10, 3, 13, 8, 3, 12, 7, 2, 9, 3, 9, 6, 2, 10, 6, 8, 3, 8, 11, 10
Work out the mean, median and mode.
- d) Hank wants grouped tables. Complete both.
Table A
| Score | Frequency |
|---|---|
| 2–4 | 6 |
| 5–7 | 3 |
| 8–10 | 8 |
| 11–13 | 3 |
Table B
| Score | Frequency |
|---|---|
| 2–5 | 6 |
| 6–9 | 8 |
| 10–13 | 6 |
👀 show answers
- a) Smallest = 2 (1+1).
- b) Greatest = 13 (6+7).
- c)
  - Mean: total $= 143$; $143 \div 20 = \mathbf{7.15}$.
  - Median: the middle of the ordered list lies between the 10th and 11th values (both $8$) → $\mathbf{8}$.
  - Mode: most frequent value → $\mathbf{3}$.
- d) Completed tables shown above.
- e) Modal class, median class and grouped-mean estimates:
| Table | Modal class | Median class | Estimate of mean |
|---|---|---|---|
| A | 8–10 | 8–10 | $\dfrac{3\times 6+6\times 3+9\times 8+12\times 3}{20}=\mathbf{7.2}$ |
| B | 6–9 | 6–9 | $\dfrac{3.5\times 6+7.5\times 8+11.5\times 6}{20}=\mathbf{7.5}$ |
- f) Compare
  - Accurate mean 7.15 vs estimates 7.2 (A) and 7.5 (B): A is closer; its narrower classes help.
  - Accurate median 8 lies in 8–10 (A) and 6–9 (B); A is tighter.
  - Accurate mode 3 becomes modal class 8–10 (A) / 6–9 (B): grouping can mask a single-value mode.
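The comparison in parts c to f can be reproduced from Hank's 20 totals. The sketch below uses Python's standard `statistics` module for the exact values; the helpers `group` and `midpoint_mean` are hypothetical names written for this illustration.

```python
from statistics import mean, median, mode

# Hank's 20 totals from part c.
totals = [10, 3, 13, 8, 3, 12, 7, 2, 9, 3, 9, 6, 2, 10, 6, 8, 3, 8, 11, 10]

def group(values, classes):
    """Count how many values fall in each inclusive class (low, high)."""
    return [sum(lo <= v <= hi for v in values) for lo, hi in classes]

def midpoint_mean(classes, freqs):
    """Estimate the mean from class midpoints and frequencies."""
    return sum((lo + hi) / 2 * f for (lo, hi), f in zip(classes, freqs)) / sum(freqs)

table_a = [(2, 4), (5, 7), (8, 10), (11, 13)]
table_b = [(2, 5), (6, 9), (10, 13)]

freq_a = group(totals, table_a)
freq_b = group(totals, table_b)

print(mean(totals), median(totals), mode(totals))  # 7.15 8.0 3
print(freq_a, midpoint_mean(table_a, freq_a))      # [6, 3, 8, 3] 7.2
print(freq_b, midpoint_mean(table_b, freq_b))      # [6, 8, 6] 7.5
```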
❓ EXERCISES
6. The table shows the masses of $50$ meerkats.
| Mass, $m$ (g) | Frequency |
|---|---|
| $600 \le m < 650$ | $2$ |
| $650 \le m < 700$ | $5$ |
| $700 \le m < 750$ | $7$ |
| $750 \le m < 800$ | $12$ |
| $800 \le m < 850$ | $10$ |
| $850 \le m < 900$ | $8$ |
| $900 \le m < 950$ | $4$ |
| $950 \le m < 1000$ | $2$ |
a. Write
i. the modal class interval
ii. the class interval where the median lies.
b. Work out an estimate for
i. the mean
ii. the range.
c. Zara decides to regroup the data, using larger group sizes. Copy and complete this table.
| Mass, $m$ (g) | Frequency |
|---|---|
| $600 \le m < 700$ | |
| $700 \le m < 800$ | |
| $800 \le m < 900$ | |
| $900 \le m < 1000$ | |
🧠 Tip
Use the frequencies in the table at the start of the question to complete the table in part c.
d. Write
i. the modal class interval
ii. the class interval where the median lies.
e. Work out an estimate for
i. the mean
ii. the range.
f. Compare your answers to parts a and b with your answers to parts d and e.
i. Do you think the answers in parts a and b or the answers in parts d and e are more accurate? Explain why.
ii. Were the answers in parts a and b or the answers in parts d and e quicker to work out? Explain why.
👀 Show answer
a.
i. Modal class interval: $750 \le m < 800$ (largest frequency $12$).
ii. Median class interval: $n=50$ so the $25$th and $26$th values lie in $750 \le m < 800$ (cumulative up to this class is $26$).
b.
i. Estimate of mean (midpoints $625,675,725,775,825,875,925,975$):
$\sum (x\! \times\! f)= 625\cdot2 + 675\cdot5 + 725\cdot7 + 775\cdot12 + 825\cdot10 + 875\cdot8 + 925\cdot4 + 975\cdot2 = 39900$.
$\bar m \approx \dfrac{39900}{50} = 798\ \text{g}$.
ii. Range $\approx 1000 - 600 = 400\ \text{g}$.
c. Regrouped frequencies:
| Mass, $m$ (g) | Frequency |
|---|---|
| $600 \le m < 700$ | $2+5 = 7$ |
| $700 \le m < 800$ | $7+12 = 19$ |
| $800 \le m < 900$ | $10+8 = 18$ |
| $900 \le m < 1000$ | $4+2 = 6$ |
d.
i. Modal class interval: $700 \le m < 800$ (frequency $19$).
ii. Median class interval: with $n=50$, the $25$th–$26$th values fall in $700 \le m < 800$ (cumulative totals $7,26,44,50$).
e.
i. Estimate of mean with regrouped classes (midpoints $650,750,850,950$):
$\sum (x\! \times\! f)=650\cdot7+750\cdot19+850\cdot18+950\cdot6=39800$.
$\bar m \approx \dfrac{39800}{50}=796\ \text{g}$.
ii. Range $\approx 1000 - 600 = 400\ \text{g}$ (unchanged).
f.
i. The answers from parts a and b are more accurate: the classes are narrower ($50\ \text{g}$ wide), so using midpoints gives a better approximation than the wider grouped table.
ii. The regrouped answers (parts d and e) are quicker to work out because there are fewer classes and fewer midpoint calculations.
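Zara's regrouping amounts to merging each pair of neighbouring classes. A rough Python sketch of that step, together with the two mean estimates, is shown here. The name `merge_pairs` is hypothetical, and the sketch assumes an even number of classes listed in increasing order.

```python
# Meerkat masses from the table: (lower boundary, upper boundary, frequency).
fine = [(600, 650, 2), (650, 700, 5), (700, 750, 7), (750, 800, 12),
        (800, 850, 10), (850, 900, 8), (900, 950, 4), (950, 1000, 2)]

def merge_pairs(classes):
    """Merge each pair of adjacent classes into one wider class."""
    merged = []
    for (lo, _, f1), (_, hi, f2) in zip(classes[0::2], classes[1::2]):
        merged.append((lo, hi, f1 + f2))  # keep outer boundaries, add frequencies
    return merged

def estimated_mean(classes):
    """Estimate the mean using the midpoint of each class interval."""
    total_f = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total_f

coarse = merge_pairs(fine)
print(coarse)  # [(600, 700, 7), (700, 800, 19), (800, 900, 18), (900, 1000, 6)]
print(estimated_mean(fine), estimated_mean(coarse))  # 798.0 796.0
```

Merging never changes the total frequency or the outer boundaries, which is why the estimated range stays at $400\ \text{g}$ while the estimated mean shifts slightly.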
⚠️ Be careful!
- No exact values from grouped data: you can state the modal class and the median class, but the exact mode/median are unknown.
- Find the median class by cumulative frequency: for odd $N$, locate the $\frac{N+1}{2}$-th item; for even $N$, the median lies halfway between the $\frac{N}{2}$-th and $\left(\frac{N}{2}+1\right)$-th items. Find these positions in the running total, not by eye.
- Mean = midpoint method: use class midpoints, compute $\sum(\text{midpoint}\times \text{frequency}) / \sum \text{frequency}$; don’t round midpoints too early.
- Range is an estimate: use class boundaries (max boundary − min boundary), not bar tops or midpoints.
- Modal class ≠ modal value: do not quote the midpoint of the modal class as “the mode.” State the class interval.
- Equal-width classes: keep class widths equal. If widths differ, compare with frequency density (advanced) and be cautious interpreting the mode.
- Open-ended classes: intervals like “$m\ge100$” make mean/range estimates less reliable—mention this limitation.
- Consistent boundaries: avoid overlaps (use $60 \le m < 70$, then $70 \le m < 80$); include units in headings and answers.
- Total check: ensure $\sum f$ matches the stated sample size before calculating any averages.
- Report clearly: label results as estimates (e.g., “estimated mean,” “estimated range,” “median lies in …”).