Calculating statistics for grouped data
You already know how to work out the mode, median, mean and range for individual data and also for data represented in a frequency table.
When data is grouped, you cannot work out exact values for the mode, median, mean and range, because you do not have the individual data. However, you can write the modal class interval and the class interval where the median lies, and you can work out estimates for the mean and the range.
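As a compact summary, the two estimates can be written as the formulas below. The symbols $m_i$ (midpoint of class $i$), $f_i$ (frequency of class $i$) and $k$ (number of classes) are notation introduced here for illustration; they do not appear in the exercises.

$$
\text{estimated mean} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i},
\qquad
\text{estimated range} = (\text{upper bound of the last class}) - (\text{lower bound of the first class})
$$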
1. The table shows the heights of the students in class 9R.
Height, $h$ (cm) | Frequency |
---|---|
$140 \le h < 150$ | 7 |
$150 \le h < 160$ | 13 |
$160 \le h < 170$ | 6 |
$170 \le h < 180$ | 2 |
a) Write i) the modal class interval; ii) the class interval where the median lies.
b) Explain why you can only give class intervals for the mode and median, and not exact values.
c) Work out an estimate for the range.
d) Copy and complete the working table to find an estimate of the mean (to the nearest cm).
Midpoint | Frequency | Midpoint × frequency |
---|---|---|
145 | 7 | 1015 |
155 | 13 | 2015 |
165 | 6 | 990 |
175 | 2 | 350 |
Totals | 28 | 4370 |
a) i) Modal class: $150 \le h < 160$ (highest frequency $13$).
ii) Median class: total $n=28$ ⇒ median is between the 14th and 15th values. Cumulative up to $140$–$150$ is $7$, up to $150$–$160$ is $20$, so the median lies in $150 \le h < 160$.
b) The data are grouped into intervals, so we only know how many values fall in each class, not their exact individual heights. Therefore we can state only the class containing the mode/median, not exact numbers.
c) Estimated range ≈ highest boundary − lowest boundary = $180 - 140 = \mathbf{40}$ cm. (Exact range is unknown with grouped data.)
d) Using midpoints $145,155,165,175$:
$\sum (m \times f) = 4370$, $\sum f = 28$ ⇒ estimated mean $= \dfrac{4370}{28} \approx 156.07 \text{ cm} \approx \mathbf{156\text{ cm}}$ (to the nearest cm).
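The working table in part d can be checked with a short script. This is a minimal sketch rather than part of the exercise: the function name `estimate_mean` and the `(lower, upper, frequency)` tuple layout are illustrative choices.

```python
# Classes for 9R as (lower bound, upper bound, frequency) tuples.
# The tuple layout and the function name are illustrative assumptions.
heights_9r = [(140, 150, 7), (150, 160, 13), (160, 170, 6), (170, 180, 2)]


def estimate_mean(classes):
    """Return (sum of midpoint x frequency, total frequency, estimated mean)."""
    total_mf = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2  # treat every value as the class midpoint
        total_mf += midpoint * freq
        total_f += freq
    return total_mf, total_f, total_mf / total_f


total_mf, total_f, mean = estimate_mean(heights_9r)
print(total_mf, total_f, round(mean))  # 4370.0 28 156
```

The two totals match the last row of the working table, and rounding the quotient gives the same 156 cm.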
2. The table shows the masses of the students in class 9T.
Mass, $m$ (kg) | Frequency |
---|---|
$40 \leq m < 50$ | $4$ |
$50 \leq m < 60$ | $12$ |
$60 \leq m < 70$ | $8$ |
a. Write
i. the modal class interval
ii. the class interval where the median lies.
b. Work out an estimate for
i. the mean
ii. the range.
c. Explain why your answers to part b are estimates.
a.
i. Modal class interval: $50 \leq m < 60$ (highest frequency $12$).
ii. Median class interval: also $50 \leq m < 60$, since the $12^{th}$ and $13^{th}$ values lie in this class.
b.
i. Estimate for mean:
Midpoints: $45,\ 55,\ 65$.
$\text{Total of }(x \times f) = 45 \times 4 + 55 \times 12 + 65 \times 8 = 180 + 660 + 520 = 1360$.
Total frequency $= 24$.
Mean $\approx \dfrac{1360}{24} \approx 56.7\ \text{kg}$.
ii. Range $\approx 70 - 40 = 30\ \text{kg}$.
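c. They are estimates because the exact values of the data are not known. The mean assumes that every value sits at the midpoint of its class, and the range assumes that the smallest and largest values sit at the ends of the first and last classes.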
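One way to see how much the true mean could differ from the estimate is to push every value to the bottom of its class, then to the top. The sketch below does this for the 9T data. The function name `mean_bounds` and the tuple layout are assumptions made for illustration.

```python
# Classes for 9T as (lower bound, upper bound, frequency) tuples (illustrative layout).
masses_9t = [(40, 50, 4), (50, 60, 12), (60, 70, 8)]


def mean_bounds(classes):
    """Return (lowest possible mean, midpoint estimate, upper limit of the mean)."""
    n = sum(freq for _, _, freq in classes)
    lowest = sum(lower * freq for lower, _, freq in classes) / n
    estimate = sum((lower + upper) / 2 * freq for lower, upper, freq in classes) / n
    # Every value is strictly below its upper bound, so the true mean is below this limit.
    limit = sum(upper * freq for _, upper, freq in classes) / n
    return lowest, estimate, limit


lowest, estimate, limit = mean_bounds(masses_9t)
print(round(lowest, 1), round(estimate, 1), round(limit, 1))  # 51.7 56.7 61.7
```

The true mean lies somewhere from about 51.7 kg up to (but not including) about 61.7 kg. The midpoint estimate of 56.7 kg sits in the middle of that interval.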
4. Anita carried out a survey on the length of time patients waited to see a doctor at two different hospitals. The tables show the results of her survey.
The Heath
Time, $t$ (minutes) | Frequency |
---|---|
$0 \le t < 10$ | $5$ |
$10 \le t < 20$ | $23$ |
$20 \le t < 30$ | $10$ |
$30 \le t < 40$ | $2$ |

Moorlands
Time, $t$ (minutes) | Frequency |
---|---|
$0 \le t < 10$ | $16$ |
$10 \le t < 20$ | $8$ |
$20 \le t < 30$ | $14$ |
$30 \le t < 40$ | $12$ |
a. How many people were surveyed at each hospital?
b. Copy and complete this table.
Hospital | Modal class interval | Class interval where the median lies | Estimate of mean |
---|---|---|---|
The Heath | | | |
Moorlands | | | |
c. Compare and comment on the average waiting times for the two hospitals.
d. Which hospital would you prefer to go to, based only on the waiting times? Explain your answer.
a.
The Heath: $5 + 23 + 10 + 2 = 40$ patients.
Moorlands: $16 + 8 + 14 + 12 = 50$ patients.
b.
Modal class interval:
The Heath: $10 \le t < 20$ (frequency $23$). Moorlands: $0 \le t < 10$ (frequency $16$).
Median class (for an even total $n$, the median lies between the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th values):
The Heath: $n=40$, so the median lies between the 20th and 21st values. The cumulative frequencies are $5, 28$, so the median class is $10 \le t < 20$.
Moorlands: $n=50$, so the median lies between the 25th and 26th values. The cumulative frequencies are $16, 24, 38$, so the median class is $20 \le t < 30$.
Estimate of mean (midpoints $5,15,25,35$):
The Heath: $(5\times 5) + (15\times 23) + (25\times 10) + (35\times 2) = 690$, so the estimated mean is $\dfrac{690}{40} = 17.25$ minutes.
Moorlands: $(5\times 16) + (15\times 8) + (25\times 14) + (35\times 12) = 970$, so the estimated mean is $\dfrac{970}{50} = 19.4$ minutes.
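c. The Heath has the lower estimated mean waiting time ($17.25$ minutes versus $19.4$ minutes for Moorlands) and the lower median class ($10 \le t < 20$ versus $20 \le t < 30$). Moorlands has the lower modal class ($0 \le t < 10$), but more than half of its patients ($26$ of $50$) waited $20$ minutes or longer, compared with $12$ of $40$ at The Heath.
d. The Heath, because its estimated mean waiting time is lower and a smaller proportion of its patients fall in the two longest intervals. (This answer is based only on waiting times.)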
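The modal and median classes for both hospitals can be found by scanning the frequencies and the running (cumulative) totals. The sketch below is one way to do that. The function names and the `(lower, upper, frequency)` layout are illustrative, and `modal_class` assumes that all classes have equal width and that there is no tie for the highest frequency.

```python
# Waiting-time classes as (lower, upper, frequency); layout and names are illustrative.
the_heath = [(0, 10, 5), (10, 20, 23), (20, 30, 10), (30, 40, 2)]
moorlands = [(0, 10, 16), (10, 20, 8), (20, 30, 14), (30, 40, 12)]


def modal_class(classes):
    """Return the (lower, upper) bounds of the class with the highest frequency."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return lower, upper


def median_class(classes):
    """Return the (lower, upper) bounds of the class containing the middle value(s)."""
    n = sum(freq for _, _, freq in classes)
    # Positions of the middle value(s), counting from 1: the same for odd n,
    # two consecutive positions for even n.
    positions = [(n + 1) // 2, n // 2 + 1]
    found = []
    for position in positions:
        cumulative = 0
        for lower, upper, freq in classes:
            cumulative += freq
            if position <= cumulative:
                found.append((lower, upper))
                break
    if found[0] != found[1]:
        raise ValueError("the middle values fall in two different classes")
    return found[0]


for name, classes in [("The Heath", the_heath), ("Moorlands", moorlands)]:
    print(name, modal_class(classes), median_class(classes))
# The Heath (10, 20) (10, 20)
# Moorlands (0, 10) (20, 30)
```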
5. Hank has a 6-sided spinner (numbered 1–6) and a 7-sided spinner (numbered 1–7). He spins both spinners and adds the scores.
Table A
Score | Frequency |
---|---|
2–4 | 6 |
5–7 | 3 |
8–10 | 8 |
11–13 | 3 |

Table B
Score | Frequency |
---|---|
2–5 | 6 |
6–9 | 8 |
10–13 | 6 |

Table | Modal class | Median class | Estimate of mean |
---|---|---|---|
A | 8–10 | 8–10 | $\dfrac{3 \times 6 + 6 \times 3 + 9 \times 8 + 12 \times 3}{20} = \mathbf{7.2}$ |
B | 6–9 | 6–9 | $\dfrac{3.5 \times 6 + 7.5 \times 8 + 11.5 \times 6}{20} = \mathbf{7.5}$ |
6. The table shows the masses of $50$ meerkats.
Mass, $m$ (g) | Frequency |
---|---|
$600 \le m < 650$ | $2$ |
$650 \le m < 700$ | $5$ |
$700 \le m < 750$ | $7$ |
$750 \le m < 800$ | $12$ |
$800 \le m < 850$ | $10$ |
$850 \le m < 900$ | $8$ |
$900 \le m < 950$ | $4$ |
$950 \le m < 1000$ | $2$ |
a. Write
i. the modal class interval
ii. the class interval where the median lies.
b. Work out an estimate for
i. the mean
ii. the range.
c. Zara decides to regroup the data, using larger group sizes. Use the frequencies in the table at the start of the question to copy and complete this table.
Mass, $m$ (g) | Frequency |
---|---|
$600 \le m < 700$ | |
$700 \le m < 800$ | |
$800 \le m < 900$ | |
$900 \le m < 1000$ | |
d. Write
i. the modal class interval
ii. the class interval where the median lies.
e. Work out an estimate for
i. the mean
ii. the range.
f. Compare your answers to parts a and b with your answers to parts d and e.
i. Do you think the answers in parts a and b or the answers in parts d and e are more accurate? Explain why.
ii. Were the answers in parts a and b or the answers in parts d and e quicker to work out? Explain why.
a.
i. Modal class interval: $750 \le m < 800$ (largest frequency $12$).
ii. Median class interval: $n=50$, so the median lies between the $25$th and $26$th values. The cumulative frequency is $14$ before this class and $26$ at the end of it, so both values lie in $750 \le m < 800$.
b.
i. Estimate of mean (midpoints $625,675,725,775,825,875,925,975$):
$\sum (x\! \times\! f)= 625\cdot2 + 675\cdot5 + 725\cdot7 + 775\cdot12 + 825\cdot10 + 875\cdot8 + 925\cdot4 + 975\cdot2 = 39900$.
$\bar m \approx \dfrac{39900}{50} = 798\ \text{g}$.
ii. Range $\approx 1000 - 600 = 400\ \text{g}$.
c. Regrouped frequencies:
Mass, $m$ (g) | Frequency |
---|---|
$600 \le m < 700$ | $2 + 5 = 7$ |
$700 \le m < 800$ | $7 + 12 = 19$ |
$800 \le m < 900$ | $10 + 8 = 18$ |
$900 \le m < 1000$ | $4 + 2 = 6$ |
d.
i. Modal class interval: $700 \le m < 800$ (frequency $19$).
ii. Median class interval: with $n=50$, the $25$th–$26$th values fall in $700 \le m < 800$ (cumulative totals $7,26,44,50$).
e.
i. Estimate of mean with regrouped classes (midpoints $650,750,850,950$):
$\sum (x\! \times\! f)=650\cdot7+750\cdot19+850\cdot18+950\cdot6=39800$.
$\bar m \approx \dfrac{39800}{50}=796\ \text{g}$.
ii. Range $\approx 1000 - 600 = 400\ \text{g}$ (unchanged).
f.
i. The answers from parts a and b are likely to be more accurate: the classes are narrower ($50\ \text{g}$ wide rather than $100\ \text{g}$), so each midpoint is closer to the actual values in its class.
ii. The regrouped answers (parts d and e) are quicker to work out because there are fewer classes and fewer midpoint calculations.
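The effect of regrouping on the estimated mean can be reproduced with a short script that merges adjacent pairs of classes. This is a sketch under stated assumptions: the helper names `merge_pairs` and `estimate_mean` and the `(lower, upper, frequency)` layout are illustrative, and `merge_pairs` expects an even number of classes.

```python
# Meerkat masses as (lower, upper, frequency); names and layout are illustrative.
narrow = [(600, 650, 2), (650, 700, 5), (700, 750, 7), (750, 800, 12),
          (800, 850, 10), (850, 900, 8), (900, 950, 4), (950, 1000, 2)]


def merge_pairs(classes):
    """Merge each adjacent pair of classes into one wider class (needs an even count)."""
    merged = []
    for i in range(0, len(classes), 2):
        (lower, _, f1), (_, upper, f2) = classes[i], classes[i + 1]
        merged.append((lower, upper, f1 + f2))
    return merged


def estimate_mean(classes):
    """Estimate the mean by treating every value as its class midpoint."""
    n = sum(freq for _, _, freq in classes)
    return sum((lower + upper) / 2 * freq for lower, upper, freq in classes) / n


wide = merge_pairs(narrow)
print(wide)  # [(600, 700, 7), (700, 800, 19), (800, 900, 18), (900, 1000, 6)]
print(estimate_mean(narrow), estimate_mean(wide))  # 798.0 796.0
```

The merged frequencies match the table in part c, and the two estimates differ by 2 g even though the underlying data are the same.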