Here is a prediction: Newborn baby boys are heavier than newborn baby girls.
How could you investigate whether this prediction is true?
It would be very difficult to find the masses of all the babies born. You could find the masses of some of the babies born. This would be a sample of the whole population.
The population, in this case, is all newborn babies. The sample is the group of babies you choose.
If you can, it is best to get information from the whole population. However, this may take too long or cost too much. In such cases, you can choose a sample. The sample should not be too small or it will not represent the whole population.
In the worked example, you will see different ways to choose a sample.
1. Wei is investigating at her school how many hours of homework the learners in her year do each evening. She predicts that most learners do more than $2$ hours each evening.
a. How can she collect data to test this prediction?
b. Give a reason why it is easier to use a sample than the whole year group.
c. What data does she need to collect?
Here are the results of a question given to $25$ learners.
| Homework | Less than $1$ hour | Between $1$ and $2$ hours | Between $2$ and $3$ hours | More than $3$ hours |
|---|---|---|---|---|
| Frequency | $3$ | $6$ | $11$ | $5$ |
d. Show the results in a suitable chart.
e. What can you say about Wei’s prediction?
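One way to approach part e is to work out what fraction of the sample reported more than $2$ hours. The sketch below uses the frequencies from the table. The variable names are illustrative, and reading 'most' as 'more than half' is an assumption.

```python
# Frequencies from the table of answers given by 25 learners.
frequencies = {
    "Less than 1 hour": 3,
    "Between 1 and 2 hours": 6,
    "Between 2 and 3 hours": 11,
    "More than 3 hours": 5,
}

sample_size = sum(frequencies.values())

# The two categories that count as "more than 2 hours".
more_than_two = (
    frequencies["Between 2 and 3 hours"] + frequencies["More than 3 hours"]
)

proportion = more_than_two / sample_size

# Assumption: "most" is read as "more than half of the sample".
supports_prediction = proportion > 0.5

print(f"{more_than_two} of {sample_size} learners = {proportion:.0%}")
```

Here $16$ of the $25$ learners ($64\%$) reported more than $2$ hours, so this sample supports Wei's prediction. A sample of $25$ is small, so the conclusion should be treated with caution.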
2. Sofia is investigating birthdays of young people. She predicts that birthdays in autumn are more common than birthdays in other seasons.
a. Why is it not possible to collect data from the whole population?
b. What data does she need? How can she collect the data?
c. She starts to write down the birthday month of each learner in a list like this: March, October, December, April, … Explain why this is not a good way to record the data. Suggest a better way.
Sofia displays her results in a table, as shown.
| Season | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Frequency | $200$ | $170$ | $230$ | $220$ |
d. What is the size of the sample?
e. What can you say about Sofia’s prediction?
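Parts d and e can be checked with a short calculation. The sketch below adds the frequencies to find the sample size and then finds each season's share of the sample. The variable names are illustrative only.

```python
# Frequencies from Sofia's table.
season_counts = {"Spring": 200, "Summer": 170, "Autumn": 230, "Winter": 220}

# Part d: the sample size is the total of the frequencies.
sample_size = sum(season_counts.values())

# Part e: each season's share of the sample.
shares = {
    season: count / sample_size for season, count in season_counts.items()
}
most_common = max(season_counts, key=season_counts.get)

print(f"Sample size: {sample_size}")
for season, share in shares.items():
    print(f"{season}: {share:.1%}")
```

Autumn has the highest frequency, but its share (about $28\%$) is only slightly above winter's (about $27\%$). The sample gives some support to Sofia's prediction, although the difference is small.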
3. A company investigates the success of a telephone helpline. Callers who use the helpline are asked this survey question:

a. What prediction is this question testing?
b. What is an advantage of asking the question in this way?
c. The population is all the callers who use the helpline. Why will the survey only be a sample?
The table summarises the scores received in one day.
| Score | $1$ | $2$ | $3$ | $4$ | $5$ |
|---|---|---|---|---|---|
| Frequency | $10$ | $12$ | $6$ | $1$ | $8$ |
d. What can you say about your prediction in part a? Give a reason for your answer.
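To support an answer to part d, it can help to summarise the scores with averages. The sketch below rebuilds the list of individual scores from the frequency table and uses Python's standard `statistics` module. Because the wording of the survey question is not shown here, the sketch makes no assumption about whether a high score is good or bad.

```python
import statistics

# Frequencies from the table of scores received in one day.
score_counts = {1: 10, 2: 12, 3: 6, 4: 1, 5: 8}

# Rebuild the individual scores, e.g. ten 1s, twelve 2s, and so on.
scores = [score for score, count in score_counts.items() for _ in range(count)]

total_callers = len(scores)
mode = statistics.mode(scores)
median = statistics.median(scores)
mean = statistics.mean(scores)

print(f"Callers: {total_callers}")
print(f"Mode: {mode}, median: {median}, mean: {mean:.2f}")
```

The mode and median are both $2$ and the mean is about $2.59$, so most scores are at the lower end of the scale, with a smaller group giving the top score of $5$.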
4. Dakarai is comparing two books: A and B. He predicts that book A has longer words than book B.
a. What are the two populations here?
b. How could he choose the page each time?
c. Describe how he can collect the data.
d. Describe a chart he can use to display the data.
e. Dakarai wants to find the average length of the words on each page. What is the best average to use? Give a reason for your answer.
f. How can he use the average to see if his prediction is correct?
g. Do you think the sample is large enough to be sure that he has the correct answer to his prediction?
5. Suki has a dice. She predicts that the dice is not fair. To test her prediction Suki throws the dice $20$ times. Here are the results.
| Score | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| Frequency | $4$ | $3$ | $1$ | $4$ | $5$ | $3$ |
a. What can you say about Suki’s prediction?
Suki decides to throw the dice $100$ times. Here are the results.
| Score | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| Frequency | $17$ | $19$ | $16$ | $14$ | $11$ | $23$ |
b. What can you say about Suki’s prediction now?
Suki goes on to throw the dice $500$ times. Here are the results.
| Score | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| Frequency | $93$ | $92$ | $83$ | $74$ | $48$ | $110$ |
c. Do these results confirm your conclusion in part b? Give a reason for your answer.
d. Is there any benefit in Suki doing more trials? Give a reason for your answer.
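A useful way to compare the three sets of throws is to change each frequency into a relative frequency and compare it with $\frac{1}{6}$, the value expected for every score on a fair dice. The sketch below does this for the three tables. It is an informal comparison, not a formal statistical test.

```python
# Frequencies for scores 1 to 6, keyed by the number of throws.
experiments = {
    20: [4, 3, 1, 4, 5, 3],
    100: [17, 19, 16, 14, 11, 23],
    500: [93, 92, 83, 74, 48, 110],
}

expected = 1 / 6  # relative frequency of each score on a fair dice

relative = {}
for throws, frequencies in experiments.items():
    # Each row of frequencies should add up to the number of throws.
    assert sum(frequencies) == throws
    relative[throws] = [f / throws for f in frequencies]
    row = "  ".join(f"{r:.3f}" for r in relative[throws])
    print(f"{throws:>3} throws: {row}")

print(f"Fair dice:  {expected:.3f} for every score")
```

With $500$ throws, the relative frequency of a $5$ is $0.096$ and of a $6$ is $0.220$, compared with about $0.167$ for a fair dice. The same pattern appears with $100$ throws, which suggests the dice may not be fair. With only $20$ throws the results vary too much to draw a conclusion.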
6. You may choose to work with a partner on this question. Hospital management wants to know what patients think of the emergency service provided by the hospital. The management decides to employ a company to carry out a survey of patients.
a. Why is a large sample better than a small sample?
b. What are the disadvantages of a large sample size?
c. Write two survey questions you could ask patients about the amount of time they waited before they were treated.
d. For each question, describe how you would analyse the answers.
e. Compare your questions with other learners’ questions. Can you suggest improvements to your questions or their questions?
Scenario: The staff at a theatre want to know more about their customers. They want to find out:
Tasks:
a. Example predictions:
b. Contact methods:
c. Three suitable questions:
d. Analysis:
The word ‘population’ usually refers to the people living in a town or country. In a statistical investigation, however, it means the people you are interested in.
If you are investigating your class, the population means the people in your class. If you are investigating your school, the population is all the learners in your school.
Often you cannot question the whole population. In this case, you need to choose a sample. There are different ways to choose a sample. In any investigation you need to decide on the best way to choose your sample.
Sometimes, your investigation is not about people. For example, you might be investigating the traffic going past your school. If you collect data about some of the vehicles you are still taking a sample. In this case, the population is all the vehicles passing your school.
1. A manager wants to find customers’ opinions about his shop. The manager wants to choose a sample of $50$ customers.
a. The sample could be the first $50$ customers in the shop after it opens.
i. Write one advantage of this method.
ii. Write one disadvantage of this method.
b. The sample could be $10$ customers chosen at random every $2$ hours until $50$ have been chosen.
i. Write one advantage of this method.
ii. Write one disadvantage of this method.
c. The manager thinks that the opinions of men and women could be different. Explain how he should choose the sample to take account of this.
d. Can you think of another factor that might affect customers’ opinions?
a.
i. Advantage: Quick and easy to collect, as no extra selection is needed.
ii. Disadvantage: May not be representative (e.g., only early shoppers).
b.
i. Advantage: Spreads the sample across the day, making it more representative.
ii. Disadvantage: Takes more time and effort to organise and monitor.
c. He should ensure an equal or proportional number of men and women are included in the sample.
d. Other factors could include age, income, or frequency of visiting the shop.
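The answer to part c mentions a proportional number of men and women. The sketch below shows one way to work out those numbers. The function name and the customer counts are invented for illustration; the manager would need his own figures.

```python
def proportional_allocation(group_sizes, sample_size):
    """Share a sample between groups in proportion to each group's size."""
    total = sum(group_sizes.values())
    # Rounding can leave the total slightly off the target for awkward
    # numbers, so check the sum before using the result.
    return {
        group: round(sample_size * size / total)
        for group, size in group_sizes.items()
    }


# Hypothetical counts of customers on a typical day (invented for illustration).
customers = {"women": 360, "men": 240}

allocation = proportional_allocation(customers, sample_size=50)
print(allocation)  # {'women': 30, 'men': 20}
```

With these made-up figures, $60\%$ of customers are women, so $30$ of the $50$ people in the sample should be women and $20$ should be men.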
Task: You are going to find the lengths of the words in a novel. Choose a book to use. You want a sample of $50$ words.
Steps:
a. Three methods could be:
b. Example: Using the chapter sampling method, record each word length in a tally chart.
c. The sample may not be fully representative if only one part of the book was used. Improving the method could involve sampling across more chapters.
d. Trying the systematic method (e.g., every $10^\text{th}$ word) might give a different spread of word lengths.
e. Comparing methods: The random method may be less reliable than systematic or stratified (by chapters). The second method can be improved by ensuring words are spread across the book. Stratified sampling across chapters is usually better.
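The systematic method and the tally chart mentioned above can be sketched as follows. The function names and the short stand-in text are assumptions made for illustration. For the task itself you would use the words of your chosen book and a sample size of $50$.

```python
import random
from collections import Counter


def systematic_word_sample(words, sample_size, seed=None):
    """Take every k-th word after a random start, where k = len(words) // sample_size."""
    step = len(words) // sample_size
    if step < 1:
        raise ValueError("the text has fewer words than the sample size")
    start = random.Random(seed).randrange(step)
    return [words[start + i * step] for i in range(sample_size)]


def tally_lengths(words):
    """Count how many words have each length, ignoring surrounding punctuation."""
    return Counter(len(word.strip(".,;:!?\"'()")) for word in words)


# Stand-in text for illustration only; replace it with the words of your book.
text = (
    "Here is a prediction: newborn baby boys are heavier than newborn "
    "baby girls. How could you investigate whether this prediction is true?"
)
words = text.split()

sample = systematic_word_sample(words, sample_size=5, seed=1)
tally = tally_lengths(sample)

# Print a simple tally chart, one line for each word length.
for length in sorted(tally):
    print(f"{length} letters: {'|' * tally[length]}")
```

Because the step is the number of words divided by the sample size, the sampled words are spread across the whole text rather than taken from one part of it.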
3. Zalika is investigating the number of people in each car on a busy road. She predicts that most cars will contain only one person, the driver. Zalika says, ‘I will start at $08{:}00$ and observe $200$ cars.’
a. Write one advantage and one disadvantage of Zalika’s method.
b. Describe a better way to take a sample of $200$ cars. Explain why your method is better than Zalika’s.
a.
b. A better way is to observe cars at different times of the day (morning, afternoon, evening) to get a more representative sample. This is better than Zalika’s method because it avoids bias caused by only collecting data during rush hour.
4. You have been asked to carry out an investigation. You want to find out if learners would like to change the school homework policy. You will choose a sample of about $50$ learners.
Explain how you will choose your sample. Explain why you have chosen this method.
5. Arun and Sofia carry out a survey of parents about school homework. One conjecture is that parents want more homework.
To test this, each asks this question of a sample of $50$ parents:
Is the amount of homework your child gets too little / about right / too much? (choose one)
This chart shows the results:

a. Do Arun’s results support this conjecture? Give a reason for your answer.
b. Do Sofia’s results support this conjecture? Give a reason for your answer.
c. Give a possible reason why the results of the two surveys are different.
a. No. Arun’s highest bar is for “About right,” not “Too little,” so his results do not support the conjecture that parents want more homework.
b. Yes. Sofia’s highest bar is “Too little,” so her results support the conjecture.
c. They sampled different parents (e.g., different classes/schools) or at different times, so the samples were different and gave different results.
6. A large factory has a restaurant where employees go for lunch. Arun is investigating ways to improve the restaurant. He says:
“I will give a questionnaire to the first $50$ workers visiting the restaurant this lunchtime.”
a. Give one disadvantage of Arun’s method.
b. Describe a better way of doing the survey.
a. It is a convenience sample (the first $50$ at one lunchtime), so it is likely to be biased and not representative (e.g., only early lunchers or one shift).
b. Use a random or stratified random sample across different times/days and departments/shifts (e.g., randomly select $10$ workers from each major department over several lunchtimes) so that workers from every shift and department are represented.
The mode is the number with the highest frequency.
7. Here is a conjecture about the cars using a particular road:
The modal number of people in a car is $1$.
Marcus, Zara and Sofia each do a survey of cars on the road.
They count the number of people in each car, including the driver.
Each person does the survey for $15$ minutes.
They do their surveys at different times of day.
The results are in the graph on the right.

a. Do the results of each survey support this conjecture?
b. Describe any similarities or differences between the surveys.
c. Why do the samples give different results?
a. Marcus and Zara support the conjecture: their modal category is $1$ person per car. Sofia does not support it: her highest frequency is at $2$ people.
b. All three show frequencies generally decreasing as the number of people increases. Marcus counted many more cars overall (higher frequencies). Sofia’s peak is at $2$ people, while Marcus and Zara peak at $1$.
c. They surveyed at different times of day, so traffic composition differed (e.g., school runs/carpooling vs. solo commuting). Different sample sizes also contribute.
8. An examiner has marked $250$ examination papers.
To check the accuracy of her marking, a sample of $10$ papers will be re-marked.
a. Describe three different ways of choosing the sample.
b. Which of the three ways do you think is the best? Explain why you think so.
a. Examples: (i) simple random sample of $10$ papers from the $250$; (ii) systematic sample (e.g., every $25^\text{th}$ paper after a random start); (iii) stratified sample by mark bands (e.g., low/medium/high scores) with proportional selection.
b. A simple random sample is usually best for an unbiased accuracy check because every paper has an equal chance of selection, which avoids deliberate or accidental bias in the choice. If the distribution of marks is very uneven, a stratified random sample by score bands may be better to ensure that every band is represented.
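The first two methods in part a can be sketched with Python's standard `random` module. Numbering the papers from $1$ to $250$ is an assumption, and the fixed seed is used only so that the sketch gives the same result each time it is run. The stratified method is not shown because it needs the marks for each paper.

```python
import random

papers = list(range(1, 251))  # paper numbers 1 to 250 (numbering assumed)
rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# (i) Simple random sample: every paper has an equal chance of selection.
simple_random = sorted(rng.sample(papers, 10))

# (ii) Systematic sample: every 25th paper after a random start.
step = len(papers) // 10  # 250 // 10 = 25
start = rng.randrange(step)  # a position from 0 to 24
systematic = papers[start::step]

print("Simple random:", simple_random)
print("Systematic:   ", systematic)
```

Both methods give exactly $10$ papers. The systematic sample is spread evenly through the pile, while the simple random sample may include papers that are close together.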
How is height related to other body measurements? This is something you can investigate by collecting data. First you need to ask some statistical questions, for example:
When you have some questions, you can make predictions to test, for example:
To test your predictions, you need to think about the data you want to collect. For prediction $1$, height could be continuous data if you use a tape measure, or it could be categorical data if you decide to classify people as short, average or tall. Shoe sizes are discrete numerical data. For prediction $3$, you will need to decide how to collect the measurements. To measure the length of an arm or a leg might make people uncomfortable.
You will need a sample. You must think about different ways to choose a sample and the best method to use. It is a good idea to test your data collection method in a small trial. You might want to change your design after you have done this.
Each question in this exercise is about planning a statistical investigation. It is a good idea to work on each question in pairs.
Example generalisations:
“Older learners are better at estimating.”
or “Girls are better at estimating than boys.”
1. You are going to investigate the ability of learners in your school to estimate. This could be the ability to estimate the length of a line, the size of an angle, the number of items in a jar, a particular length of time, or something else.
a. Write some questions you could ask about estimation.
b. Write some predictions you could test.
c. Describe some different ways of choosing a sample to test one or more of your predictions.
d. Which sample method is best? Give a reason for your answer.
e. Carry out a small trial of your investigation. Can you think of ways to improve your investigation?
f. Use the results of your trial to make a generalisation about learners’ ability to estimate.
a. Examples: “How many sweets are in the jar?”, “How long is this line?” or “How large is this angle?”
b. Predictions: “Older students will estimate more accurately,” “Girls will be better at estimating than boys.”
c. Sampling methods: random sample, stratified sample by year group, convenience sample (e.g., first $30$ students available).
d. Stratified random sampling is best because it represents different groups fairly.
e. A trial can reveal unclear instructions or overly difficult estimation tasks. Improvements might include clearer scales or simpler tasks.
f. Generalisations could be based on observed differences between groups (e.g., “Year 11 students estimate more accurately than Year 7 students”).
2. You are going to investigate the attitudes of learners to the structure of the school day. Here are some things you could think about: the length of lessons; the number of lessons in a day; breaks; start and finish times. You might think of other areas of interest.
Example generalisation:
“Learners think lessons should be longer.”
a. Write some questions you could ask about the structure of the school day.
b. Write some predictions you can test.
c. Describe some different ways of choosing a sample to test one or more of your predictions.
d. Which sample method is best? Give a reason for your answer.
e. Carry out a small trial of your investigation. Can you think of ways to improve your investigation?
f. Use the results of your trial to make a generalisation about learners’ attitudes to the structure of the school day.
a. Example questions: “How long should lessons last?” “Do you prefer fewer longer lessons or more shorter ones?”
b. Predictions: “Learners prefer longer lessons.” “Learners prefer more breaks.”
c. Sampling methods: random sample across year groups, stratified sample by age, convenience sample from a single class.
d. A stratified random sample is best because it represents all year groups and genders fairly.
e. A trial may reveal unclear questions or missing options. Improvements could include simplifying questions or offering multiple-choice answers.
f. Example generalisation: “Most learners prefer longer breaks but shorter lessons.”
3. You are going to investigate news articles. The articles could be in newspapers or online. You could investigate readability, length, vocabulary or other aspects.
a. Write some questions you could ask about news articles.
b. Write some predictions you can test.
c. Describe some different ways of choosing a sample to test one or more of your predictions.
d. Which sample method is best? Give a reason for your answer.
e. Carry out a small trial of your investigation. Can you think of ways to improve your investigation?
f. Use the results of your trial to make a generalisation about news articles.
a. Example questions: “How many words are in an article?” “What is the reading age required?” “How many difficult words are used?”
b. Predictions: “Online articles are shorter than newspaper articles.” “Sports articles use simpler vocabulary than political articles.”
c. Sampling methods: random sample of $20$ online and $20$ newspaper articles, stratified by topic, or systematic sample (every $5^\text{th}$ article from a list).
d. A stratified random sample is best because it ensures a fair comparison across topics and sources.
e. A trial might show that some articles are too long to analyse. Improvements could include reducing the sample size or focusing on particular sections of each article.
f. Example generalisation: “Online articles tend to be shorter and use simpler vocabulary than print articles.”