Using statistics
You already know how to work out some statistical measures, such as the mode, median, mean and range.
The mode is the most common value or number.
If a set of data has two modes, it is called bimodal.
The median is the middle value when the values are listed in order of increasing size. If there is an even number of values, the median is the mean of the two middle values.
The mean is the sum of all the values divided by the number of values.
The range is the largest value minus the smallest value.
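You can check these definitions with a minimal Python sketch that uses the standard `statistics` module (`multimode` needs Python 3.8 or later). The helper name `summarise` is our own. The data are Zaralia’s May waiting times from question 1. `multimode` is used so that a bimodal set reports both of its modes.

```python
from statistics import mean, median, multimode

def summarise(values):
    """Return the mode(s), median, mean and range of a list of numbers."""
    return {
        "modes": multimode(values),          # every value that occurs most often
        "median": median(values),            # averages the two middle values if the count is even
        "mean": mean(values),                # sum of the values / number of values
        "range": max(values) - min(values),  # largest value minus smallest value
    }

# Zaralia's waiting times in May, in minutes (question 1)
may_times = [2, 5, 3, 8, 5, 2, 10, 7, 8, 8,
             4, 7, 2, 2, 3, 6, 10, 3, 4, 7]
print(summarise(may_times))
```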
In a real situation, you must decide which measure to use.
If you want to measure how spread out a set of measurements is, the range is the most useful statistic.
If you want to find a representative measurement, you need an average.
But should the average be the mode, the median or the mean? Which average to use depends on the particular situation.
Here is a summary to help you decide which average to choose:
1. Zaralia works for $20$ days each month. She records the time she waits in line for lunch each day in May. Here are the times, in minutes.
$2$ | $5$ | $3$ | $8$ | $5$ | $2$ | $10$ | $7$ | $8$ | $8$ |
$4$ | $7$ | $2$ | $2$ | $3$ | $6$ | $10$ | $3$ | $4$ | $7$ |
🧠 Tip: Start by writing the list of times in order from smallest to largest. Range $=\ \text{largest value} - \text{smallest value}$.
🧠 Tip: The month with the larger range has more variation in the waiting times.
a. Work out the:
i. $\text{mode}$ ii. $\text{median}$ iii. $\text{mean}$ time
b. Which average best represents the data? Give a reason for your choice of average.
c. Work out the range in Zaralia’s waiting times.
d. In June, Zaralia’s range in waiting times is $3$ minutes. In which month, May or June, is there more variation in her waiting times?
a.
Ordered data: $2,2,2,2,3,3,3,4,4,5,5,6,7,7,7,8,8,8,10,10$.
i. Mode $=2$ (appears $4$ times).
ii. Median $=\dfrac{5+5}{2}=5$.
iii. Mean $=\dfrac{106}{20}=5.3\ \text{minutes}$.
b. The median ($5$) best represents the data. The mode ($2$) is the smallest waiting time, so it is not typical. The mean ($5.3$) is pulled slightly upward by the larger values ($8$ to $10$ minutes).
c. Range $=10-2=8\ \text{minutes}$.
d. May has more variation (range $8$) than June (range $3$).
2. These are the ages, in years, of the members of a fitness class.
$57,\ 56,\ 51,\ 59,\ 51,\ 56,\ 58,\ 58,\ 51,\ 53,\ 50,\ 51,\ 54,\ 51$
a. Work out the:
i. $\text{mode}$ ii. $\text{median}$ iii. $\text{mean}$ age
b. Marcus and Arun discuss which average best represents the data. What do you think? Which average would you use? Give a reason for your choice.
c. Work out the range in ages of the members of the fitness class.
d. A different fitness class has a range in ages of $16$ years. Which fitness class, the first or the second, has less variation in ages of the members?
a.
Ordered data: $50,51,51,51,51,51,53,54,56,56,57,58,58,59$.
i. Mode $=51$ (most frequent).
ii. Median $=\dfrac{53+54}{2}=53.5$.
iii. Mean $=\dfrac{756}{14}=54$.
b. Use the mode ($51$). Five of the $14$ members (more than a third) are $51$, so the mode best describes the “typical” age. The mean ($54$) is higher because of a few older members.
c. Range $=59-50=9$ years.
d. The first class (range $9$) has less variation than the second class (range $16$).
3. Days of rain in the first week of May (over 35 years)
Days of rain (d) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
Frequency (f) | 13 | 7 | 5 | 2 | 0 | 3 | 2 | 3 |
a) Mode: the greatest frequency is $13$, at $0$ days, so the mode $=0$ days.
Median: there are $35$ years, so the median is the value in position $\dfrac{35+1}{2}=18$. The cumulative frequency up to $0$ days is $13$ and up to $1$ day is $20$. The $18^{\text{th}}$ value therefore lies in the $1$-day category, so the median $=1$ day.
Mean: $\sum d\times f = 0\cdot13+1\cdot7+2\cdot5+3\cdot2+4\cdot0+5\cdot3+6\cdot2+7\cdot3 = 0+7+10+6+0+15+12+21 = 71$. Mean $=\dfrac{71}{35}\approx 2.03$ days (2 d.p.).
b) The data are skewed towards small numbers of rainy days (many 0s and 1s; a few large values). The median (1 day) is a better “typical” value because the mean (2.03) is pulled upward by the few wet years. So I’d choose the median to represent this distribution.
c) Algebra used: \( \sum f = 35\) to confirm the total number of years; median position \(= (n+1)/2\); mean \(= \dfrac{\sum d f}{\sum f} = \dfrac{71}{35}\). Forming and evaluating the products \(d f\) is the key algebraic step.
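The frequency-table method in part a can be sketched in Python as follows. This is an illustrative sketch, not necessarily the book’s method. `table_median` and `table_mean` are hypothetical helper names, the values are assumed to be whole numbers, and `Fraction` keeps the mean exact until it is rounded.

```python
from fractions import Fraction

def table_median(values, freqs):
    """Median of a frequency table, found from cumulative frequencies (whole-number values)."""
    n = sum(freqs)
    # Middle position(s), counting from 1: one for odd n, two for even n
    positions = [(n + 1) // 2] if n % 2 == 1 else [n // 2, n // 2 + 1]

    def value_at(pos):
        cumulative = 0
        for value, f in zip(values, freqs):
            cumulative += f
            if cumulative >= pos:  # the pos-th value lies in this category
                return value
        raise ValueError("position is beyond the end of the table")

    middle = [value_at(p) for p in positions]
    return Fraction(sum(middle), len(middle))

def table_mean(values, freqs):
    """Mean of a frequency table: sum of (value x frequency) / sum of frequencies."""
    return Fraction(sum(v * f for v, f in zip(values, freqs)), sum(freqs))

# Days of rain in the first week of May over 35 years (question 3)
days = [0, 1, 2, 3, 4, 5, 6, 7]
years = [13, 7, 5, 2, 0, 3, 2, 3]
print(table_median(days, years))                  # 1
print(round(float(table_mean(days, years)), 2))   # 2.03
```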
4. This table shows the number of men’s belts sold in a store during one month.
Length (cm) | $80$ | $85$ | $90$ | $95$ | $100$ | $105$ | $110$ | $115$ |
---|---|---|---|---|---|---|---|---|
Frequency | $6$ | $16$ | $28$ | $41$ | $17$ | $18$ | $10$ | $13$ |
Use an appropriate average to decide which size of belt the store owner should always try to keep in stock.
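Here is a short sketch of finding the mode from a frequency table. It assumes the mode is the appropriate average here, because a store can only stock belt sizes that actually exist. It also illustrates that the answer is the value with the greatest frequency, not the frequency itself.

```python
# Belt lengths (cm) and the number sold of each length (question 4)
sold = {80: 6, 85: 16, 90: 28, 95: 41, 100: 17, 105: 18, 110: 10, 115: 13}

# The mode is the length (key) with the greatest frequency (value).
modal_length = max(sold, key=sold.get)
largest_frequency = sold[modal_length]
print(modal_length, largest_frequency)  # 95 41
```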
5. Arun records the number of people in $60$ passing cars. Here are his results.
Number of people | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
---|---|---|---|---|---|---|
Frequency | $28$ | | $3$ | $6$ | $2$ | $1$ |
Remember: “the mode” and “the modal value” mean the same thing.
a. Find the missing frequency.
b. Arun says: “I think the modal number of people per car is $28$ because $28$ is the largest frequency.”
Explain the mistake that Arun has made.
c. How can you tell, by looking at the table, that the median is $2$ people per car?
d. Arun works out that the mean is $1.95$ people per car. Show that Arun is correct.
e. Which average best represents the data? Give a reason for your choice of average.
a. Total cars $=60$. Missing frequency for $2$ people $=60-(28+3+6+2+1)=60-40=20$.
b. The mode is the value with the greatest frequency, not the frequency itself. Since the greatest frequency is $28$ for $1$ person, the modal number of people per car is $1$ (not $28$).
c. Cumulative frequencies: $1$ person $\to 28$ cars; $2$ people $\to 28+20=48$. The middle positions (the $30$th and $31$st of $60$) fall within the $2$-people group, so the median is $2$.
d. $\dfrac{1\cdot28+2\cdot20+3\cdot3+4\cdot6+5\cdot2+6\cdot1}{60}=\dfrac{28+40+9+24+10+6}{60}=\dfrac{117}{60}=1.95$.
e. The median ($2$) best represents the data. $32$ of the $60$ cars carry at least $2$ people, and unlike the mean ($1.95$), the median is a whole number of people. The mode ($1$) is also a reasonable choice, because more cars carry one person than any other number.
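You can check parts a and d together with a small sketch. The missing frequency is whatever makes the frequencies add up to $60$, and `Fraction` gives the mean exactly. The variable names are illustrative only.

```python
from fractions import Fraction

people = [1, 2, 3, 4, 5, 6]
known = {1: 28, 3: 3, 4: 6, 5: 2, 6: 1}  # the frequency for 2 people is missing
total_cars = 60

# The frequencies must add up to the number of cars observed.
known[2] = total_cars - sum(known.values())

freqs = [known[p] for p in people]
mean = Fraction(sum(p * f for p, f in zip(people, freqs)), total_cars)
print(known[2], mean, float(mean))  # 20 39/20 1.95
```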
6. A test has ten questions. A total of $120$ students take the test. The table shows the students’ test scores.
Questions answered correctly | $4$ | $5$ | $6$ | $7$ | $8$ | $9$ | $10$ |
---|---|---|---|---|---|---|---|
Frequency | $3$ | $5$ | $12$ | $13$ | $17$ | $30$ | $40$ |
a. How many students scored:
i. more than the median?
ii. more than the mode?
iii. more than the mean?
b. Which average best represents the data? Give a reason for your choice of average.
a.
Median: with $n=120$, the $60^{\text{th}}$ and $61^{\text{st}}$ values lie in score $9$ (cumulatives: up to $8$ is $50$, up to $9$ is $80$), so median $=9$.
i. More than the median $\Rightarrow$ scores $>9$: only $10 \Rightarrow 40$ students.
Mode: the largest frequency is $40$ at score $10$, so mode $=10$.
ii. More than the mode $\Rightarrow$ scores $>10$: $0$ students.
Mean: $\dfrac{4\cdot3+5\cdot5+6\cdot12+7\cdot13+8\cdot17+9\cdot30+10\cdot40}{120}=\dfrac{1006}{120}\approx 8.383\ldots$
iii. More than the mean $\Rightarrow$ scores $\ge 9$: $30+40=70$ students.
b. The median ($9$) best represents the data. The mean ($8.38$ to 2 d.p.) is pulled down by the few low scores. The mode ($10$) is the maximum possible score, which only $40$ of the $120$ students (a third) achieved, so it overstates typical performance.
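To check part a, this sketch counts the students who scored strictly more than each average. The median of $9$ comes from the worked answer above, and `count_above` is a hypothetical helper name.

```python
from fractions import Fraction

scores = [4, 5, 6, 7, 8, 9, 10]
freqs = [3, 5, 12, 13, 17, 30, 40]

def count_above(threshold):
    """Number of students whose score is strictly greater than threshold."""
    return sum(f for s, f in zip(scores, freqs) if s > threshold)

median = 9                                   # from the cumulative frequencies in part a
mode = scores[freqs.index(max(freqs))]       # score with the largest frequency
mean = Fraction(sum(s * f for s, f in zip(scores, freqs)), sum(freqs))

print(count_above(median), count_above(mode), count_above(mean))  # 40 0 70
```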
7. Work with a partner or in a small group to answer this question.
You are going to roll two dice and add the numbers on the dice to give the score.
For example, if you roll these numbers, you get a score of 7.
a) What is the smallest score you can get?
b) What is the largest score you can get?
You are going to roll the dice 40 times.
c) Draw a table ready to record the scores you will get.
Your table needs to have a ‘Tally’ column and a ‘Frequency’ column.
Score | Tally | Frequency |
---|---|---|
2 | | |
3 | | |
4 | | |
… | | |
12 | | |
d) Now roll the dice 40 times and record all your scores. When you have finished, make sure your frequency column adds up to 40.
e) For your set of data, work out the mode, median and mean.
f) Which average best represents your data? Give a reason for your choice of average.
g) Compare your data and averages with those of other learners in your class.
Do you have different averages? Do you have the same averages? Discuss why.
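This sketch simulates the activity so you can see how much 40 rolls can vary from group to group. It assumes two fair dice, and `roll_two_dice_sums` is a hypothetical helper. Changing `seed` gives a different set of results, much like a different group in your class.

```python
import random
from collections import Counter

def roll_two_dice_sums(rolls, seed=None):
    """Simulate rolling two fair dice `rolls` times; return a tally of the scores."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    scores = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(rolls)]
    return Counter(scores)

tally = roll_two_dice_sums(40, seed=1)
for score in range(2, 13):
    # Counter returns 0 for scores that never came up
    print(score, "|" * tally[score], tally[score])
print("total:", sum(tally.values()))  # always 40
```

The tally here is a simple row of `|` marks rather than groups of five.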
You have just revised the definitions of mode, median, mean and range. Next, you will use them to summarise and compare real data sets, and to justify which statistic is most appropriate.
Quick checks
Mini example
Ages: 16, 17, 18, 18, 19, 20, 20, 21, 21, 32, 41 → mode = 18, 20, 21 · median = 20 · mean = 243÷11 ≈ 22.1 · range = 41−16 = 25.
Best average? Median (mean is skewed by two older players).
Tip: When comparing two groups, pair an average (mean/median) with the range for a fuller story: “Group A is higher on average, Group B is more consistent.”
You can use an average to summarise a set of data. This could be the mode, median or mean.
You can use the range to measure the spread of the data. The larger the range, the more varied the data.
You already know how to work out the mode, median, mean and range. Here is a reminder:
You can use these statistics to compare two or more sets of data.
1. In the 2010 football World Cup, Spain won and Brazil was knocked out in the quarter-finals. The numbers of goals they scored in their matches are shown.
Spain: $0,2,2,1,1,1,1$
Brazil: $2,3,0,3,1$
a. Work out the mean score for each team.
b. Use the means to state which team scored more goals, on average, per match.
c. Work out the range for each team.
d. Use the ranges to state which team’s scores were more varied.
a. Spain: $\dfrac{0+2+2+1+1+1+1}{7}=\dfrac{8}{7}\approx1.14$ goals. Brazil: $\dfrac{2+3+0+3+1}{5}=\dfrac{9}{5}=1.8$ goals.
b. Brazil scored more goals on average.
c. Spain: $2-0=2$. Brazil: $3-0=3$.
d. Brazil’s scores were more varied (larger range).
2. A teacher measured the heights of two groups of children.
Group A: $84,73,89,80,77$ cm
Group B: $77,85,75,69,82,67,72$ cm
a. For each group:
i. write the heights in order of size
ii. write the median height
iii. work out the range in heights.
b. Use the medians to state which group is taller, on average.
c. Use the ranges to state which group’s heights are less varied.
a.
Group A ordered: $73,77,80,84,89$. Median $=80$. Range $=89-73=16$.
Group B ordered: $67,69,72,75,77,82,85$. Median $=75$. Range $=85-67=18$.
b. Group A is taller on average (median $80$ vs $75$).
c. Group A’s heights are less varied (range $16$ vs $18$).
3. The maximum daytime temperature $(^\circ C)$ was recorded in Madrid and Cartagena during one week in August.
Madrid: $38,34,36,32,35,37,36$
Cartagena: $30,32,29,30,28,30,33$
a. For each city:
i. write the temperatures in order of size
ii. write the modal temperature
iii. work out the range in temperatures.
b. Use the modes to state which city is hotter, on average.
c. Use the ranges to state which city’s temperatures are more varied.
a.
Madrid ordered: $32,34,35,36,36,37,38$. Mode $=36$. Range $=38-32=6$.
Cartagena ordered: $28,29,30,30,30,32,33$. Mode $=30$. Range $=33-28=5$.
b. Madrid is hotter on average (mode $36$ vs $30$).
c. Madrid’s temperatures are more varied (range $6$ vs $5$).
4. A nurse measured the total mass of $20$ baby boys as $64$ kg. The total mass of $15$ baby girls was $51$ kg. Which babies were heavier on average, the boys or the girls? Give a reason for your answer.
Boys’ mean mass $=\dfrac{64}{20}=3.2\ \text{kg}$. Girls’ mean mass $=\dfrac{51}{15}=3.4\ \text{kg}$. Girls were heavier on average.
5. The test marks of two groups of students are shown.
Maths: 77, 89, 75, 80, 80, 91, 78, 76, 76, 76
Science: 72, 79, 77, 87, 81, 62, 75, 87
a) Copy and complete this table.
Subject | Mean | Median | Mode | Range |
---|---|---|---|---|
Maths | 79.8 | 77.5 | 76 | 16 |
Science | 77.5 | 78 | 87 | 25 |
(Both means are exact: 798 ÷ 10 = 79.8 and 620 ÷ 8 = 77.5.)
b) In which group, Maths or Science, do you think the students did better on average?
c) In which group, Maths or Science, do you think the students had more consistent scores?
d) Compare your answers to parts b and c with those of other learners in the class. Discuss these questions.
i) Which average did you use to compare the scores? Why did you use this average? Why did you not use the other averages?
ii) What does ‘more consistent’ mean? What statistic did you use to decide which group had more consistent scores?
i) The mean is a good choice because it uses all the data. The median is also reasonable, and it is less affected by the low outlier 62 in Science. The mode is not helpful here: the Science mode (87) is that group’s highest mark, so a single most common value does not summarise overall performance well.
ii) ‘More consistent’ means the scores are closer together (smaller spread). A simple measure is the range; Maths has the smaller range (16 vs 25), so it is more consistent.
e) Now you have discussed the answers of other learners in your class, which average do you think is the best to use to compare these scores? Explain why.
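You can check the completed table in part a with Python’s `statistics` module. This is a minimal sketch. It uses `multimode` (Python 3.8+) so that any tie for the mode would be shown.

```python
from statistics import mean, median, multimode

marks = {
    "Maths": [77, 89, 75, 80, 80, 91, 78, 76, 76, 76],
    "Science": [72, 79, 77, 87, 81, 62, 75, 87],
}

for subject, values in marks.items():
    print(subject,
          round(mean(values), 1),        # mean
          median(values),                # median
          multimode(values),             # mode(s)
          max(values) - min(values))     # range
```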
6. Nialls recorded the temperatures in two experiments.
Experiment | Temperatures $(^\circ C)$ |
---|---|
First experiment | $29, 28, 21, 33, 30$ |
Second experiment | $28, 29, 28, 33, 32, 31, 32, 29$ |
a. Work out the mean, median and range for each experiment.
b. State whether each of these statements is True (T) or False (F). Justify your answers.
i. The temperatures in the first experiment are higher, on average, than the temperatures in the second experiment.
ii. The temperatures in the first experiment are more varied than the temperatures in the second experiment.
c. Is it possible to work out the modal temperature for each experiment? Explain your answer.
a.
First experiment: Mean $=\dfrac{29+28+21+33+30}{5}=\dfrac{141}{5}=28.2$. Median $=29$. Range $=33-21=12$.
Second experiment: Mean $=\dfrac{28+29+28+33+32+31+32+29}{8}=\dfrac{242}{8}=30.25$. Median $=(29+31)/2=30$. Range $=33-28=5$.
b.
i. False. First experiment mean $=28.2$, second experiment mean $=30.25$.
ii. True. First experiment range $=12$, second experiment range $=5$.
c. No. In the first experiment every temperature appears only once, so there is no mode. In the second experiment $28$, $29$ and $32$ each appear twice, so there are three modes and no single modal temperature.
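Python’s `statistics.multimode` returns every value when no value repeats. That does not match the convention used here, where such a data set has no mode. This sketch therefore uses a hypothetical helper, `modes`, that returns an empty list in that case.

```python
from collections import Counter

def modes(values):
    """Return all modal values, or an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

first = [29, 28, 21, 33, 30]
second = [28, 29, 28, 33, 32, 31, 32, 29]
print(modes(first))   # []
print(modes(second))  # [28, 29, 32]
```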
7. Work with a partner or in a small group to answer this question.
You are going to roll two dice and subtract the numbers on the dice to give a score. Always subtract to give a positive, or zero, score (use the difference).
Tip: For example, if you roll a 6 and a 1, the score is 5.
a) What is the smallest score you can get?
b) What is the largest score you can get?
You are going to roll the dice 40 times.
c) Draw a table to record the scores you get. Your table needs a ‘Tally’ column and a ‘Frequency’ column.
Score | Tally | Frequency |
---|---|---|
0 | | |
1 | | |
2 | | |
3 | | |
4 | | |
5 | | |
d) Now roll the dice 40 times and record all your scores. When you have finished, make sure the frequency column adds up to 40.
e) Work out the mode, median and mean for your data.
For comparison, with two fair dice the theoretical values are: mode $=1$, median $=2$, mean $=\dfrac{70}{36}\approx 1.94$.
f) Which average best represents your data? Give a reason for your choice of average.
g) Compare your data and averages with other learners in the class. Do you have different averages? Do you have the same averages? Have you chosen the same average to represent your data? Discuss your answers.
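You can derive the theoretical values for fair dice by listing all $36$ equally likely outcomes of two dice. This minimal sketch assumes fair dice and scores the difference as the question describes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import median

# All 36 equally likely (first die, second die) outcomes; score = difference
scores = sorted(abs(a - b) for a, b in product(range(1, 7), repeat=2))
counts = Counter(scores)

theoretical_mode = max(counts, key=counts.get)         # most likely score
theoretical_median = median(scores)                    # mean of the 18th and 19th of 36
theoretical_mean = Fraction(sum(scores), len(scores))  # exact value, 70/36

print(dict(sorted(counts.items())))  # {0: 6, 1: 10, 2: 8, 3: 6, 4: 4, 5: 2}
print(theoretical_mode, theoretical_median, theoretical_mean)
```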
8. The frequency tables show the number of goals scored in each match by two hockey teams in 20 matches.
Team A
Number of goals | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
---|---|---|---|---|---|---|
Frequency | $4$ | $1$ | $4$ | $2$ | $4$ | $5$ |
Team B
Number of goals | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
---|---|---|---|---|---|---|
Frequency | $0$ | $6$ | $1$ | $5$ | $4$ | $4$ |
a. Show that what Marcus, Zara and Arun say could all be correct.
b. Which average do you think best represents the data in the tables? Explain why. Who do you agree with, Marcus, Zara or Arun?
a.
Team A: Mean $=\dfrac{0\cdot4+1\cdot1+2\cdot4+3\cdot2+4\cdot4+5\cdot5}{20}=\dfrac{56}{20}=2.8$ goals.
Team B: Mean $=\dfrac{0\cdot0+1\cdot6+2\cdot1+3\cdot5+4\cdot4+5\cdot4}{20}=\dfrac{59}{20}=2.95$ goals.
So Zara could argue for Team B (higher mean).
Median Team A: the cumulative frequencies are $4, 5, 9, 11, 15, 20$, so the $10^{\text{th}}$ and $11^{\text{th}}$ scores are both $3$. Median $=3$.
Median Team B: the cumulative frequencies are $0, 6, 7, 12, 16, 20$, so the $10^{\text{th}}$ and $11^{\text{th}}$ scores are both $3$. Median $=3$. So Arun could argue that the teams are equally good (same median).
Mode Team A: $5$ goals (highest frequency, $5$). Mode Team B: $1$ goal (highest frequency, $6$). So Marcus could argue for Team A (higher mode).
b. The mean best represents performance over many matches because it uses every score. On this basis, Team B ($2.95$ goals per match) did slightly better than Team A ($2.8$ goals per match), so I would agree with Zara.
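You can check the corrected means, medians and modes in part a with this sketch. It expands each frequency table into the $20$ individual match scores. The helper name `table_stats` is our own.

```python
from fractions import Fraction
from statistics import median

goals = [0, 1, 2, 3, 4, 5]
teams = {
    "Team A": [4, 1, 4, 2, 4, 5],
    "Team B": [0, 6, 1, 5, 4, 4],
}

def table_stats(freqs):
    """Return (mean, median, mode) for one team's frequency table."""
    # Expand the table into individual match scores (already in order)
    scores = [g for g, f in zip(goals, freqs) for _ in range(f)]
    mean = Fraction(sum(scores), len(scores))
    mode = goals[freqs.index(max(freqs))]  # number of goals with the largest frequency
    return mean, median(scores), mode

for name, freqs in teams.items():
    mean, med, mode = table_stats(freqs)
    print(name, "mean", float(mean), "median", med, "mode", mode)
```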