Calculating statistics for grouped data


Last update: 2025-09-06



🎯 In this topic you will

Use mode, median, mean, and range to compare sets of grouped data

You already know how to work out the mode, median, mean and range for individual data and also for data represented in a frequency table.

When data is grouped, you cannot work out exact values for the mode, median, mean and range, because you do not have the individual data. However, you can write the modal class interval and the class interval where the median lies, and you can work out estimates for the mean and the range.

Worked example

The frequency table shows the masses of 20 teachers.

Mass, m (kg)	Frequency
60 < m ≤ 70	4
70 < m ≤ 80	7
80 < m ≤ 90	6
90 < m ≤ 100	3

a. Write
i. the modal class interval
ii. the class interval where the median lies.

b. Work out an estimate for
i. the range ii. the mean.

c. Explain why your answers to part b are estimates.

Answer:

a i. $70 < m \le 80$ (largest frequency, 7)

a ii. There are 20 values, so the median is halfway between the 10th and 11th. The first 4 are in $60 < m \le 70$; the next 7 (5th–11th) are in $70 < m \le 80$. Therefore the median class is $70 < m \le 80$.

b i. Estimate of range $= 100 - 60 = \mathbf{40\text{ kg}}$.

Midpoint	Frequency	Midpoint × frequency
65	4	260
75	7	525
85	6	510
95	3	285
Totals	20	1580

b ii. Estimate of mean $= \dfrac{1580}{20} = \mathbf{79\text{ kg}}$.

c. These are estimates because the data are grouped; individual values within each class are unknown, so exact range and mean cannot be calculated.

Why midpoint/estimate methods? For grouped data, use the class midpoint to represent all values in that class. Multiply midpoint by frequency, total these, then divide by the overall frequency to estimate the mean. The estimated range uses class boundaries (here 60 and 100) because the exact smallest/largest values aren’t known.
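The midpoint method described above can be checked with a short Python sketch. The table layout (a list of `(lower, upper, frequency)` tuples) and the function names `estimated_mean` and `estimated_range` are illustrative assumptions, not part of the lesson.

```python
# Worked-example table: (lower boundary, upper boundary, frequency), masses in kg.
teacher_masses = [(60, 70, 4), (70, 80, 7), (80, 90, 6), (90, 100, 3)]


def estimated_mean(table):
    """Estimate the mean by treating every value in a class as the class midpoint."""
    total_frequency = sum(freq for _, _, freq in table)
    weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in table)
    return weighted_sum / total_frequency


def estimated_range(table):
    """Estimate the range as largest upper boundary minus smallest lower boundary."""
    return max(upper for _, upper, _ in table) - min(lower for lower, _, _ in table)


print(estimated_mean(teacher_masses))   # 79.0
print(estimated_range(teacher_masses))  # 40
```

Both results agree with the hand calculation: an estimated mean of 79 kg and an estimated range of 40 kg.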

🧠 PROBLEM-SOLVING Strategy — Mode, Median, Mean & Range for Grouped Data

You don’t know the exact values inside each class, so you report classes for the mode/median and estimates for mean/range.

Modal class interval
Pick the class with the largest frequency. (This is the “most common” class.)
Median class interval
Find the position of the median: for total frequency N, it is at position (N+1)/2. When N is even, this falls halfway between the (N/2)-th and (N/2 + 1)-th values. Build a cumulative frequency column and locate the class that contains that position.
Estimated mean
Treat every value in a class as its midpoint.
For each class: midpoint × frequency → sum these, then divide by total frequency.
Estimated mean = (Σ(midpoint × freq)) ÷ (Σ freq)
Estimated range
Use the outer class boundaries: largest upper boundary − smallest lower boundary. (Exact min/max inside those classes are unknown.)
Explain “why estimates?”
Because individual data within a class could be anywhere across the interval; midpoints and boundaries are approximations.
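The modal-class and median-class steps above could be sketched in Python as follows. The `(label, frequency)` layout and the helper names `modal_class` and `median_class` are assumptions made for illustration. If two classes tie for the largest frequency, this sketch simply returns the first one.

```python
from itertools import accumulate

# (class label, frequency) pairs from the teachers' mass table.
classes = [("60 < m <= 70", 4), ("70 < m <= 80", 7),
           ("80 < m <= 90", 6), ("90 < m <= 100", 3)]


def modal_class(table):
    """Return the label of the class with the largest frequency (first one on a tie)."""
    return max(table, key=lambda row: row[1])[0]


def median_class(table):
    """Return the label of the class containing the median position."""
    total = sum(freq for _, freq in table)
    # Middle position(s): the same position twice if total is odd, two positions if even.
    positions = [(total + 1) // 2, (total + 2) // 2]
    running_totals = list(accumulate(freq for _, freq in table))
    found = []
    for position in positions:
        for (label, _), running in zip(table, running_totals):
            if position <= running:
                found.append(label)
                break
    # If the two middle values fall in different classes, report both labels.
    return found[0] if found[0] == found[1] else tuple(found)


print(modal_class(classes))   # 70 < m <= 80
print(median_class(classes))  # 70 < m <= 80
```

With 20 values the middle positions are the 10th and 11th, and the running totals 4, 11, 17, 20 place both in the second class.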

Compare two grouped sets
• Centre: modal class & median class (and estimated means).
• Spread: estimated range; also note where most frequencies cluster.
• Use the same class scheme for a fair comparison.

Mini example

Class (kg): 60–70, 70–80, 80–90, 90–100

Freq: 4, 7, 6, 3

Midpts: 65, 75, 85, 95

Σ(midpt×f) = 260 + 525 + 510 + 285 = 1580; Σf = 20

Modal class: 70–80 (freq 7)

Median class: 70–80 (values 5th–11th lie here)

Est. mean: 1580 ÷ 20 = 79 kg

Est. range: 100 − 60 = 40 kg

Common pitfalls

Using class boundaries instead of midpoints for the mean calculation.
For the median class, forgetting to use cumulative frequency and the median position.
Claiming exact mean/range from grouped data—state them as estimates.
Mixing class conventions (e.g., writing 60–70 then 70–80) without consistent ≤ / < notation, so it is unclear which class a boundary value such as 70 belongs to.

Quick checklist — midpoints computed ✓ cumulative freq built ✓ modal & median classes identified ✓ estimated mean/range reported ✓ explanation of “estimate” included ✓

❓ EXERCISE

1. The table shows the heights of the students in class 9R.

Height, $h$ (cm)	Frequency
$140 \le h < 150$	7
$150 \le h < 160$	13
$160 \le h < 170$	6
$170 \le h < 180$	2

a) Write i) the modal class interval; ii) the class interval where the median lies.

b) Explain why you can only give class intervals for the mode and median, and not exact values.

c) Work out an estimate for the range.

d) Copy and complete the working table to find an estimate of the mean (to the nearest cm).

Midpoint	Frequency	Midpoint × frequency
145	7	1015
155	13	2015
165	6	990
175	2	350
Totals	28	4370

👀 Show answer

a) i) Modal class: $150 \le h < 160$ (highest frequency $13$).
ii) Median class: total $n=28$ ⇒ median is between the 14th and 15th values. Cumulative up to $140$–$150$ is $7$, up to $150$–$160$ is $20$, so the median lies in $150 \le h < 160$.

b) The data are grouped into intervals, so we only know how many values fall in each class, not their exact individual heights. Therefore we can state only the class containing the mode/median, not exact numbers.

c) Estimated range ≈ highest boundary − lowest boundary = $180 - 140 = \mathbf{40}$ cm. (Exact range is unknown with grouped data.)

d) Using midpoints $145,155,165,175$:
$\sum (m \times f) = 4370$, $\sum f = 28$ ⇒ estimated mean $= \dfrac{4370}{28} \approx 156.07 \text{ cm} \approx \mathbf{156\text{ cm}}$ (to the nearest cm).

2. The table shows the masses of the students in class 9T.

Mass, $m$ (kg)	Frequency
$40 \leq m < 50$	$4$
$50 \leq m < 60$	$12$
$60 \leq m < 70$	$8$

a. Write

i. the modal class interval

ii. the class interval where the median lies.

b. Work out an estimate for

i. the mean

ii. the range.

c. Explain why your answers to part b are estimates.

👀 Show answer

a i. Modal class interval: $50 \leq m < 60$ (highest frequency $12$).

a ii. Median class interval: also $50 \leq m < 60$. With $n = 24$, the median lies between the $12^{th}$ and $13^{th}$ values; the cumulative frequencies are $4, 16, 24$, so both values lie in this class.

b i. Estimate for mean:

Midpoints: $45,\ 55,\ 65$.

$\text{Total of }(x \times f) = 45 \times 4 + 55 \times 12 + 65 \times 8 = 180 + 660 + 520 = 1360$.

Total frequency $= 24$.

Mean $\approx \dfrac{1360}{24} \approx 56.7\ \text{kg}$.

b ii. Range $\approx 70 - 40 = 30\ \text{kg}$.

c. They are estimates because midpoints are assumed for each class, and exact values of the data are not known.

🧠 Think like a Mathematician

Scenario:

When estimating the mean from grouped data:

Marcus: Use the smallest value in each class instead of the midpoint.
Arun: Use the greatest value in each class instead of the midpoint.

Questions:

a) Do you think either method would give a better estimate? Explain your answer.
b) Compare Marcus’s and Arun’s ideas with the midpoint method.

👀 show answer

a) Neither Marcus’s nor Arun’s suggestion would give a better estimate of the mean.
• Marcus’s method underestimates, because it treats every value as the lowest possible value in its class.
• Arun’s method overestimates, because it treats every value as the highest possible value in its class.
b) The midpoint method is designed to be a fair compromise: it balances underestimates and overestimates across classes, assuming the data are fairly evenly spread within each interval.
Conclusion: The midpoint is the best choice for estimating the mean from grouped data. Marcus’s and Arun’s methods are biased to one side.

❓ EXERCISES

4. Anita carried out a survey on the length of time patients waited to see a doctor at two different hospitals. The tables show the results of her survey.

The Heath
Time, $t$ (minutes)	Frequency
$0 \le t < 10$	$5$
$10 \le t < 20$	$23$
$20 \le t < 30$	$10$
$30 \le t < 40$	$2$

Moorlands
Time, $t$ (minutes)	Frequency
$0 \le t < 10$	$16$
$10 \le t < 20$	$8$
$20 \le t < 30$	$14$
$30 \le t < 40$	$12$

a. How many people were surveyed at each hospital?

b. Copy and complete this table.

Hospital	Modal class interval	Class interval where the median lies	Estimate of mean
The Heath
Moorlands

c. Compare and comment on the average waiting times for the two hospitals.

d. Which hospital would you prefer to go to, based only on the waiting times? Explain your answer.

👀 Show answer

a. $\text{The Heath total} = 5 + 23 + 10 + 2 = 40$ patients.

$\text{Moorlands total} = 16 + 8 + 14 + 12 = 50$ patients.

b. Modal class interval:

$\text{The Heath: }10 \le t < 20\ (\text{frequency }23)$; $\text{Moorlands: }0 \le t < 10\ (\text{frequency }16)$.

Median class (use $n/2$th value):

$\text{The Heath: }n=40,\ n/2=20$th $\Rightarrow 10 \le t < 20$.

$\text{Moorlands: }n=50,\ n/2=25$th $\Rightarrow 20 \le t < 30$.

Estimate of mean (midpoints $5,15,25,35$):

$\text{The Heath: }(5\times 5) + (15\times 23) + (25\times 10) + (35\times 2) = 690$.

$\bar t_{\text{Heath}} \approx \dfrac{690}{40} = 17.25\ \text{minutes}$.

$\text{Moorlands: }(5\times 16) + (15\times 8) + (25\times 14) + (35\times 12) = 970$.

$\bar t_{\text{Moorlands}} \approx \dfrac{970}{50} = 19.4\ \text{minutes}$.

c. The Heath has a smaller average waiting time: $17.25$ minutes versus $19.4$ minutes for Moorlands. Moorlands also has more patients in the higher time intervals ($20$–$40$ minutes).

d. The Heath, because the average waiting time is lower and a smaller proportion of patients wait in the longest intervals. (Answer based only on waiting times.)
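The comparison in parts c and d can be reproduced with a small Python sketch. The function name `summarise` and the 20-minute `threshold` for a "long wait" are illustrative assumptions; the threshold was chosen only because it coincides with a class boundary in both tables.

```python
# Waiting times in minutes: (lower boundary, upper boundary, frequency).
heath = [(0, 10, 5), (10, 20, 23), (20, 30, 10), (30, 40, 2)]
moorlands = [(0, 10, 16), (10, 20, 8), (20, 30, 14), (30, 40, 12)]


def summarise(table, threshold=20):
    """Return sample size, estimated mean, and share of patients waiting >= threshold."""
    n = sum(freq for _, _, freq in table)
    mean = sum((lower + upper) / 2 * freq for lower, upper, freq in table) / n
    # Only whole classes at or above the threshold can be counted from grouped data.
    long_waits = sum(freq for lower, _, freq in table if lower >= threshold)
    return n, mean, long_waits / n


print(summarise(heath))      # (40, 17.25, 0.3)
print(summarise(moorlands))  # (50, 19.4, 0.52)
```

About 30% of patients at The Heath waited 20 minutes or more, against 52% at Moorlands, which supports the answer to part d.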

🧠 Think like a Mathematician

Hank has a 6-sided spinner (1–6) and a 7-sided spinner (1–7). He spins both and adds the scores.

a) What is the smallest possible total?
b) What is the greatest possible total?
c) Hank spins 20 times. The totals are:
10, 3, 13, 8, 3, 12, 7, 2, 9, 3, 9, 6, 2, 10, 6, 8, 3, 8, 11, 10
Work out the mean, median and mode.
d) Hank wants grouped tables. Complete both.

Table A

Score	Frequency
2–4	6
5–7	3
8–10	8
11–13	3

Table B

Score	Frequency
2–5	6
6–9	8
10–13	6

e) For each table, write the modal class, the class where the median lies, and an estimate of the mean.
f) Compare these results with the accurate values from part c.

👀 show answers

a) Smallest = 2 (1+1).
b) Greatest = 13 (6+7).
c)
- Mean: total $=143$; $143 \div 20 = \mathbf{7.15}$.
- Median: ordered middle is between 10th & 11th values → $\mathbf{8}$.
- Mode: most frequent value → $\mathbf{3}$.
d) Completed tables shown above.
e) Modal class / median class / grouped-mean estimates

Table	Modal class	Median class	Estimate of mean
A	8–10	8–10	$\dfrac{3 \times 6 + 6 \times 3 + 9 \times 8 + 12 \times 3}{20} = \mathbf{7.2}$
B	6–9	6–9	$\dfrac{3.5 \times 6 + 7.5 \times 8 + 11.5 \times 6}{20} = \mathbf{7.5}$

f) Compare
- Accurate mean 7.15 vs estimates 7.2 (A) and 7.5 (B): A is closer, which fits its narrower classes.
- Accurate median 8 lies in 8–10 (A) and 6–9 (B); A pins it down more tightly because its class covers 3 scores rather than 4.
- Accurate mode 3 becomes modal class 8–10 (A) / 6–9 (B): grouping can mask a single-value mode.
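The exact statistics and both grouped estimates can be recomputed from Hank's 20 totals. This sketch uses Python's standard `statistics` module; the helper `grouped_mean` and the representation of classes as inclusive `(low, high)` pairs are assumptions made for illustration.

```python
from statistics import mean, median, mode

totals = [10, 3, 13, 8, 3, 12, 7, 2, 9, 3, 9, 6, 2, 10, 6, 8, 3, 8, 11, 10]


def grouped_mean(values, classes):
    """Group values into inclusive (low, high) classes, then apply the midpoint method."""
    weighted_sum = 0
    for low, high in classes:
        frequency = sum(low <= value <= high for value in values)
        weighted_sum += (low + high) / 2 * frequency
    return weighted_sum / len(values)


table_a = [(2, 4), (5, 7), (8, 10), (11, 13)]
table_b = [(2, 5), (6, 9), (10, 13)]

print(mean(totals), median(totals), mode(totals))  # 7.15 8.0 3
print(grouped_mean(totals, table_a))               # 7.2
print(grouped_mean(totals, table_b))               # 7.5
```

The grouped estimates differ from the exact mean of 7.15 by 0.05 for Table A and 0.35 for Table B.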

❓ EXERCISES

6. The table shows the masses of $50$ meerkats.

Mass, $m$ (g)	Frequency
$600 \le m < 650$	$2$
$650 \le m < 700$	$5$
$700 \le m < 750$	$7$
$750 \le m < 800$	$12$
$800 \le m < 850$	$10$
$850 \le m < 900$	$8$
$900 \le m < 950$	$4$
$950 \le m < 1000$	$2$

a. Write

i. the modal class interval

ii. the class interval where the median lies.

b. Work out an estimate for

i. the mean ii. the range.

c. Zara decides to regroup the data, using larger group sizes. Copy and complete this table.

🧠 Tip

Use the frequencies in the table at the start of the question to complete this table.

Mass, $m$ (g)	Frequency
$600 \le m < 700$
$700 \le m < 800$
$800 \le m < 900$
$900 \le m < 1000$

d. Write

i. the modal class interval ii. the class interval where the median lies.

e. Work out an estimate for

i. the mean ii. the range.

f. Compare your answers to parts a and b with your answers to parts d and e.

i. Do you think the answers in parts a and b or the answers in parts d and e are more accurate? Explain why.

ii. Were the answers in parts a and b or the answers in parts d and e quicker to work out? Explain why.

👀 Show answer

a i. Modal class interval: $750 \le m < 800$ (largest frequency $12$).

a ii. Median class interval: $n=50$, so the median lies between the $25$th and $26$th values. The cumulative frequency is $14$ before the class $750 \le m < 800$ and $26$ at its end, so both values lie in $750 \le m < 800$.

b i. Estimate of mean (midpoints $625,675,725,775,825,875,925,975$):

$\sum (x\! \times\! f)= 625\cdot2 + 675\cdot5 + 725\cdot7 + 775\cdot12 + 825\cdot10 + 875\cdot8 + 925\cdot4 + 975\cdot2 = 39900$.

$\bar m \approx \dfrac{39900}{50} = 798\ \text{g}$.

b ii. Range $\approx 1000 - 600 = 400\ \text{g}$.

c. Regrouped frequencies:

Mass, $m$ (g)	Frequency
$600 \le m < 700$	$2 + 5 = 7$
$700 \le m < 800$	$7 + 12 = 19$
$800 \le m < 900$	$10 + 8 = 18$
$900 \le m < 1000$	$4 + 2 = 6$

d i. Modal class interval: $700 \le m < 800$ (frequency $19$).

d ii. Median class interval: with $n=50$, the $25$th–$26$th values fall in $700 \le m < 800$ (cumulative totals $7,26,44,50$).

e i. Estimate of mean with regrouped classes (midpoints $650,750,850,950$):

$\sum (x\! \times\! f)=650\cdot7+750\cdot19+850\cdot18+950\cdot6=39800$.

$\bar m \approx \dfrac{39800}{50}=796\ \text{g}$.

e ii. Range $\approx 1000 - 600 = 400\ \text{g}$ (unchanged).

f i. The answers from parts a and b are more accurate: the classes are narrower ($50\ \text{g}$ wide rather than $100\ \text{g}$), so using midpoints gives a better approximation than the wider grouped table.

f ii. The regrouped answers (parts d and e) are quicker to work out because there are fewer classes and fewer midpoint calculations.
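Zara's regrouping amounts to merging adjacent pairs of classes. A minimal Python sketch, assuming the table is stored as `(lower, upper, frequency)` tuples and using the hypothetical helpers `merge_pairs` and `estimated_mean`:

```python
# Meerkat masses in g: (lower boundary, upper boundary, frequency), 50 g classes.
narrow = [(600, 650, 2), (650, 700, 5), (700, 750, 7), (750, 800, 12),
          (800, 850, 10), (850, 900, 8), (900, 950, 4), (950, 1000, 2)]


def merge_pairs(table):
    """Merge each pair of adjacent classes into one wider class (needs an even count)."""
    merged = []
    for (low, _, f1), (_, high, f2) in zip(table[0::2], table[1::2]):
        merged.append((low, high, f1 + f2))
    return merged


def estimated_mean(table):
    """Midpoint-method estimate of the mean."""
    n = sum(freq for _, _, freq in table)
    return sum((low + high) / 2 * freq for low, high, freq in table) / n


wide = merge_pairs(narrow)
print(wide)  # [(600, 700, 7), (700, 800, 19), (800, 900, 18), (900, 1000, 6)]
print(estimated_mean(narrow), estimated_mean(wide))  # 798.0 796.0
```

The two estimates differ by only 2 g here, but the wider classes discard detail, so the 50 g table is the more trustworthy of the two.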

⚠️ Be careful!

No exact values from grouped data: you can state the modal class and the median class, but the exact mode/median are unknown.
Find the median class by cumulative frequency: locate the $\lceil N/2 \rceil$-th item (or halfway between the $N/2$ and $N/2+1$ items) in the running total, not by eye.
Mean = midpoint method: use class midpoints, compute $\sum(\text{midpoint}\times \text{frequency}) / \sum \text{frequency}$; don’t round midpoints too early.
Range is an estimate: use class boundaries (max boundary − min boundary), not bar tops or midpoints.
Modal class ≠ modal value: do not quote the midpoint of the modal class as “the mode.” State the class interval.
Equal-width classes: keep class widths equal. If widths differ, compare with frequency density (advanced) and be cautious interpreting the mode.
Open-ended classes: intervals like “$m\ge100$” make mean/range estimates less reliable—mention this limitation.
Consistent boundaries: avoid overlaps (use $60 < m \le 70$, then $70 < m \le 80$); include units in headings and answers.
Total check: ensure $\sum f$ matches the stated sample size before calculating any averages.
Report clearly: label results as estimates (e.g., “estimated mean,” “estimated range,” “median lies in …”).
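
Several of these checks (overlaps or gaps between classes, equal widths, and the frequency total) can be automated before any averages are calculated. The function `check_table` below is a hypothetical helper written for this illustration, assuming rows of `(lower, upper, frequency)` in ascending order.

```python
def check_table(table, expected_total=None):
    """Return a list of problems found in a grouped frequency table.

    table is a list of (lower, upper, frequency) rows in ascending order.
    """
    problems = []
    # Each class should start exactly where the previous one ends.
    for (_, upper, _), (next_lower, _, _) in zip(table, table[1:]):
        if next_lower < upper:
            problems.append(f"classes overlap at {next_lower}")
        elif next_lower > upper:
            problems.append(f"gap between {upper} and {next_lower}")
    # Equal class widths make the modal class easier to interpret.
    widths = {upper - lower for lower, upper, _ in table}
    if len(widths) > 1:
        problems.append("class widths are not equal")
    # The frequencies should add up to the stated sample size.
    total = sum(freq for _, _, freq in table)
    if expected_total is not None and total != expected_total:
        problems.append(f"frequencies sum to {total}, expected {expected_total}")
    return problems


teachers = [(60, 70, 4), (70, 80, 7), (80, 90, 6), (90, 100, 3)]
print(check_table(teachers, expected_total=20))  # []
```

An empty list means no problems were found; a table with a gap, unequal widths or a wrong total returns one message per problem.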

📘 What we've learned — Grouped Data: Mode, Median, Mean & Range

Grouped data ≠ exact values. You can state a modal class and a median class, and make estimates for the mean and range.
Modal class: the class interval with the largest frequency.
Median class: find cumulative frequencies and locate the n/2-th (or (n+1)/2-th) value; the class that contains it is the median class.
Estimated mean (midpoint method):
1) Add a midpoint column (mid = (lower + upper)/2).
2) Compute mid × freq for each class.
3) Estimated mean ≈ (Σ(mid × freq)) / (Σ freq).
Estimated range: use outer class boundaries: range ≈ (largest upper boundary) − (smallest lower boundary).
Why estimates? Individual values inside each class are unknown; using midpoints assumes values are evenly spread within a class.
Good class design: use equal widths and clear, non-overlapping notation (e.g., 70 < m ≤ 80 or 150 ≤ h < 160).
Comparing two grouped sets: contrast modal class, median class position, estimated mean (central tendency), and estimated range (spread). Note how different class widths or regrouping can change the picture.
Common pitfalls:
- Using class endpoints instead of midpoints for the mean.
- Overlapping classes (e.g., writing 70–80 then 80–90).
- Reading the median as a single value rather than a class.
- Estimating range from class midpoints instead of boundaries.
Quick checklist: midpoints ✓ cumulative freq ✓ mean table (mid, f, mid×f) ✓ totals checked ✓ notation consistent ✓