
Taking samples


🎯 In this topic you will

  • Learn about taking a sample
  • Learn about the effect of sample size
  • Understand the advantages of different sampling methods
  • Plan how to collect statistical data to test a set of predictions
  • Use data to make inferences and generalisations
  • Look at alternative ways to choose a sample and decide which method is best to use
 

🧠 Key Words

  • population
  • sample
  • sample size
  • population: The whole group of individuals or items being studied in a statistical investigation.
  • sample: A smaller group selected from the population, used to represent the whole in an investigation.
  • sample size: The number of individuals or items included in a sample.
 

🔍 Investigating a Prediction

Here is a prediction: Newborn baby boys are heavier than newborn baby girls.

How could you investigate whether this prediction is true?

📊 Using a Sample

It would be very difficult to find the masses of all the babies born. You could find the masses of some of the babies born. This would be a sample of the whole population.

The population, in this case, is all newborn babies. The sample is the group of babies you choose.

⏳ Whole Population vs Sample

If you can, it is best to get information from the whole population. However, this may take too long or cost too much. In such cases, you can choose a sample. The sample should not be too small or it will not represent the whole population.
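
The effect of sample size can be explored with a short simulation. The sketch below is only an illustration: the population of birth masses is invented (generated from a random number generator, not real data), and the names `population` and `spread_of_sample_means` are hypothetical. It takes many samples of size $5$ and many of size $200$ and measures how much the sample means vary.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 2000 invented birth masses in kg (not real data).
population = [rng.gauss(3.4, 0.5) for _ in range(2000)]
population_mean = statistics.mean(population)

def spread_of_sample_means(sample_size, repeats=300):
    """Take many random samples of one size and return the
    standard deviation (a measure of spread) of their means."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(5)
large = spread_of_sample_means(200)

print(f"population mean:               {population_mean:.2f} kg")
print(f"spread of sample means, n=5:   {small:.3f} kg")
print(f"spread of sample means, n=200: {large:.3f} kg")
```

The means of the small samples are scattered much more widely around the population mean than the means of the large samples, which is why a sample that is too small may not represent the population well.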

📘 Looking Ahead

In the worked example below, you will see different ways to choose a sample.

 
📘 Worked example

Arun is investigating how long learners in his school can hold their breath. He makes a prediction: Boys can hold their breath longer than girls.

a. What is the population in this investigation?

b. Why do you think he uses a sample instead of the whole school?

c. Explain the different ways he can choose the sample.

d. What data must Arun collect?

e. How can he analyse the results?

Answer:

a. The population is all the learners in the school.

b. Testing the whole school would take a long time and may not be practical.

c. Arun could, for example:

  • Put names in a hat and select $40$ learners.
  • Choose the names from the list of learners in each class.
  • Select one or two classes and test all the learners.

d. Arun must ask each learner to hold their breath and time it. He must also record whether each learner is a boy or a girl.

e. Arun can analyse the boys’ data and girls’ data separately. He can draw a chart and find an average for the boys’ data and an average for the girls’ data. He can then compare the averages.

Explanation:

In statistics, a population is the whole group being studied — here, all learners in the school.

Using a sample saves time and resources compared to testing everyone.

Sampling methods can include random selection, stratified selection by class, or using a whole class as a group.

The data must include both the measured times and the gender of each learner, so comparisons are meaningful.

By calculating and comparing averages, Arun can check whether the evidence supports his prediction.
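
As an illustration of the "names in a hat" method and the comparison of averages, here is a minimal sketch. The school roll and the breath-holding times are invented for the example, and in a real investigation Arun would only time the learners he selects. The names `learners`, `mean_boys` and `mean_girls` are hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical school roll: (name, gender, breath-hold time in seconds).
# Every value here is invented for illustration.
learners = [
    (f"learner_{i}", rng.choice(["boy", "girl"]), round(rng.uniform(20, 70), 1))
    for i in range(400)
]

# "Names in a hat": every learner has the same chance of being picked.
sample = rng.sample(learners, 40)

# Analyse the boys' data and the girls' data separately.
boys = [time for _, gender, time in sample if gender == "boy"]
girls = [time for _, gender, time in sample if gender == "girl"]

mean_boys = statistics.mean(boys)
mean_girls = statistics.mean(girls)

print(f"boys:  {len(boys)} learners, mean {mean_boys:.1f} s")
print(f"girls: {len(girls)} learners, mean {mean_girls:.1f} s")
```

Because the times are random numbers, the two means here say nothing about real boys and girls. The point is the method: select at random, record time and gender, then compare the two averages.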

 

EXERCISES

1. Wei is investigating at her school how many hours of homework the learners in her year do each evening. She predicts that most learners do more than $2$ hours each evening.

a. How can she collect data to test this prediction?

b. Give a reason why it is easier to use a sample than the whole year group.

c. What data does she need to collect?

Here are the results of a question given to $25$ learners.

| Homework | Less than $1$ hour | Between $1$ and $2$ hours | Between $2$ and $3$ hours | More than $3$ hours |
|---|---|---|---|---|
| Frequency | $3$ | $6$ | $11$ | $5$ |

d. Show the results in a suitable chart.

e. What can you say about Wei’s prediction?

👀 Show answer
a. She can collect data by asking learners how many hours of homework they do each evening.

b. Using a sample is easier because it saves time and effort compared to questioning the whole year group.

c. She needs to collect the number of hours each learner spends on homework.

d. The results could be shown as a bar chart or pie chart.

e. In the sample, $11 + 5 = 16$ of the $25$ learners did more than $2$ hours, which is more than half, so Wei’s prediction is supported.
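
The check in part e can be sketched in a few lines of code. This example uses the frequencies from the table in the question; the variable names are hypothetical, and the text bar chart is just one possible way to display the results.

```python
# Frequencies from the table in question 1.
frequencies = {
    "Less than 1 hour": 3,
    "Between 1 and 2 hours": 6,
    "Between 2 and 3 hours": 11,
    "More than 3 hours": 5,
}

total = sum(frequencies.values())
more_than_two = (
    frequencies["Between 2 and 3 hours"] + frequencies["More than 3 hours"]
)
proportion = more_than_two / total

# A simple text bar chart: one '#' per learner.
for label, count in frequencies.items():
    print(f"{label:<24}{'#' * count} ({count})")

print(f"{more_than_two} of {total} learners ({proportion:.0%}) did more than 2 hours")
```

"Most" means more than half, so the prediction is supported by this sample whenever `proportion` is greater than $0.5$.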

2. Sofia is investigating birthdays of young people. She predicts that birthdays in autumn are more common than birthdays in other seasons.

a. Why is it not possible to collect data from the whole population?

b. What data does she need? How can she collect the data?

c. She starts to write down the birthday month of each learner in a list like this: March, October, December, April, … Explain why this is not a good way to record the data. Suggest a better way.

Sofia displays her results in a table, as shown.

| Season | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Frequency | $200$ | $170$ | $230$ | $220$ |

d. What is the size of the sample?

e. What can you say about Sofia’s prediction?

👀 Show answer
a. It is not possible to collect data from all young people everywhere.

b. She needs the birthday months of a representative sample of learners in her school, collected by a questionnaire or survey.

c. A long list of months is hard to analyse. A frequency table (or tally chart) is a better way to record the data.

d. The size of the sample is $200 + 170 + 230 + 220 = 820$.

e. Autumn has the highest frequency ($230$), so Sofia’s prediction is supported, although the difference from winter ($220$) is small.
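
The idea in part c, recording a tally instead of a long list, can be sketched as follows. Only the first four months in the list come from the question; the others are invented. The grouping of months into seasons is an assumption (northern-hemisphere seasons by calendar month), because the question does not say how Sofia defined the seasons.

```python
from collections import Counter

# The first four months come from the question; the rest are invented.
months = ["March", "October", "December", "April",
          "October", "June", "September", "January"]

# Assumption: northern-hemisphere seasons by calendar month.
SEASON_OF_MONTH = {
    "March": "Spring", "April": "Spring", "May": "Spring",
    "June": "Summer", "July": "Summer", "August": "Summer",
    "September": "Autumn", "October": "Autumn", "November": "Autumn",
    "December": "Winter", "January": "Winter", "February": "Winter",
}

# A tally by season is much easier to read than the raw list.
tally = Counter(SEASON_OF_MONTH[month] for month in months)
print(dict(tally))

# Sofia's results table from the question.
results = {"Spring": 200, "Summer": 170, "Autumn": 230, "Winter": 220}
sample_size = sum(results.values())
for season, count in results.items():
    print(f"{season:<7}{count:>4}  {count / sample_size:.1%}")
```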
 

EXERCISES

3. A company investigates the success of a telephone helpline. In a survey, callers using the helpline are asked a question, which they answer by choosing a score from $1$ to $5$.

a. What prediction is this question testing?

b. What is an advantage of asking the question in this way?

c. The population is all the callers who use the helpline. Why will the survey only be a sample?

The table summarises the scores received in one day.

| Score | $1$ | $2$ | $3$ | $4$ | $5$ |
|---|---|---|---|---|---|
| Frequency | $10$ | $12$ | $6$ | $1$ | $8$ |

d. What can you say about your prediction in part a? Give a reason for your answer.

👀 Show answer
a. The prediction being tested is whether callers find the service helpful.

b. A single question with a scale gives clear, comparable answers that are easy to record and analyse.

c. The survey is only a sample because not every caller can be asked, only those surveyed on that day.

d. Of the $37$ callers, $22$ gave a score of $1$ or $2$ and $8$ gave a score of $5$. This suggests mixed views: many callers did not find the helpline helpful, while some found it very helpful. The prediction is only partly supported.
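
A frequency table of scores can also be summarised with averages. The sketch below uses the frequencies from the table in the question and expands them into a list of individual scores; the variable names are hypothetical.

```python
import statistics

# Frequencies from the table in question 3: {score: number of callers}.
scores = {1: 10, 2: 12, 3: 6, 4: 1, 5: 8}

# Expand the table into one entry per caller, e.g. ten 1s, twelve 2s, ...
responses = [score for score, count in scores.items() for _ in range(count)]

total = len(responses)
mean = statistics.mean(responses)
median = statistics.median(responses)
mode = statistics.mode(responses)
low_scores = scores[1] + scores[2]

print(f"callers: {total}")
print(f"mean {mean:.2f}, median {median}, mode {mode}")
print(f"{low_scores} of {total} callers gave a score of 1 or 2")
```

The averages alone hide the fact that the scores are split between low values and $5$, so it is worth looking at the whole table as well.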

4. Dakarai is comparing two books: A and B. He predicts that book A has longer words than book B.

a. What are the two populations here?

b. How could he choose the page each time?

c. Describe how he can collect the data.

d. Describe a chart he can use to display the data.

e. Dakarai wants to find the average length of the words on each page. What is the best average to use? Give a reason for your answer.

f. How can he use the average to see if his prediction is correct?

g. Do you think the sample is large enough to be sure that he has the correct answer to his prediction?

👀 Show answer
a. The two populations are all the words in book A and all the words in book B.

b. He could choose a random page each time, e.g. using a random number generator.

c. He can count the number of letters in every word on the chosen pages.

d. He can display the data in a bar chart, histogram, or box plot.

e. The mean word length is the best average, as it uses all the data values.

f. He can compare the mean lengths for book A and book B to see which is higher.

g. One page from each book is not enough; the sample should include several pages to ensure the results are reliable.
 

EXERCISES

5. Suki has a dice. She predicts that the dice is not fair. To test her prediction Suki throws the dice $20$ times. Here are the results.

| Score | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| Frequency | $4$ | $3$ | $1$ | $4$ | $5$ | $3$ |

a. What can you say about Suki’s prediction?

Suki decides to throw the dice $100$ times. Here are the results.

| Score | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| Frequency | $17$ | $19$ | $16$ | $14$ | $11$ | $23$ |

b. What can you say about Suki’s prediction now?

Suki goes on to throw the dice $500$ times. Here are the results.

| Score | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| Frequency | $93$ | $92$ | $83$ | $74$ | $48$ | $110$ |

c. Do these results confirm your conclusion in part b? Give a reason for your answer.

d. Is there any benefit in Suki doing more trials? Give a reason for your answer.

👀 Show answer
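a. With only $20$ throws, the results do not give strong evidence. Differences like these could easily occur by chance, so we cannot conclude that the dice is unfair.

b. After $100$ throws, a fair dice would give about $17$ of each score. A score of $5$ came up only $11$ times and a score of $6$ came up $23$ times, which suggests the dice may be unfair, but the evidence is still not strong.

c. Yes. After $500$ throws, a fair dice would give about $83$ of each score. A score of $5$ came up only $48$ times and a score of $6$ came up $110$ times, so the pattern seen in part b has continued. This is good evidence that the dice is not fair.

d. Yes. More trials reduce the effect of chance variation, so the relative frequencies give a more reliable estimate of how likely each score is.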
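
To compare the three experiments on the same scale, it helps to turn each frequency into a relative frequency (frequency divided by number of throws). A fair dice would give a relative frequency close to $\frac{1}{6} \approx 0.167$ for every score. The sketch below uses the three tables from the question; the names `trials` and `relative_frequencies` are hypothetical.

```python
# Frequencies of the scores 1 to 6 from the three tables in question 5,
# keyed by the number of throws.
trials = {
    20: [4, 3, 1, 4, 5, 3],
    100: [17, 19, 16, 14, 11, 23],
    500: [93, 92, 83, 74, 48, 110],
}

def relative_frequencies(frequencies):
    """Divide each frequency by the total number of throws."""
    throws = sum(frequencies)
    return [frequency / throws for frequency in frequencies]

fair = 1 / 6
print(f"fair dice: {fair:.3f} for every score")
for throws, frequencies in trials.items():
    row = "  ".join(f"{value:.3f}" for value in relative_frequencies(frequencies))
    print(f"{throws:>3} throws: {row}")
```

In the $500$-throw experiment a score of $5$ comes up in $9.6\%$ of the throws and a score of $6$ in $22\%$, compared with about $16.7\%$ for a fair dice.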

6. You may choose to work with a partner on this question. Hospital management wants to know what patients think of the emergency service provided by the hospital. The management decides to employ a company to carry out a survey of patients.

a. Why is a large sample better than a small sample?

b. What are the disadvantages of a large sample size?

c. Write two survey questions you could ask patients about the amount of time they waited before they were treated.

d. For each question, describe how you would analyse the answers.

e. Compare your questions with other learners’ questions. Can you suggest improvements to your questions or their questions?

👀 Show answer
a. A large sample is more representative and reduces the effect of anomalies.

b. Large samples take more time and resources to collect and analyse.

c. Example questions: “How many minutes did you wait before being seen?”; “Were you satisfied with the waiting time? Rate 1–5.”

d. Analyse numerical responses using averages; analyse satisfaction ratings with frequency tables or bar charts.

e. Suggestions may include making questions clearer, avoiding bias, or including answer ranges for easier analysis.
 

🧠 Think like a Mathematician

Scenario: The staff at a theatre want to know more about their customers. They want to find out:

  • How frequently customers come to the theatre
  • If the theatre is attracting people of different ages
  • What customers enjoy about the theatre
  • How the staff could improve their service

Tasks:

a. Write down two predictions you could test.
b. How could you contact people to complete the questionnaire?
c. Write three suitable questions.
d. Describe how you will analyse the results.
👀 Show answer

a. Example predictions:

  • Younger customers (under 25) attend less frequently than older customers.
  • Most customers are satisfied with the theatre facilities.

b. Contact methods:

  • Email surveys to ticket holders
  • QR code on tickets linking to the questionnaire
  • Paper forms handed out at performances

c. Three suitable questions:

  • How many times have you attended the theatre in the last 12 months?
  • Which age group do you belong to? (Under 18, 18–25, 26–40, 41–60, 60+)
  • What do you most enjoy about your theatre visits? (e.g., performances, facilities, atmosphere)

d. Analysis:

  • Tally and tabulate responses for each question.
  • Draw bar charts or pie charts for attendance and age group data.
  • Summarise common themes from open-ended responses.
  • Compare frequency of visits across different age groups to test predictions.
 

👥 Understanding Population in Statistics

The word ‘population’ usually refers to the people living in a town or country. In a statistical investigation, however, it means the people you are interested in.

If you are investigating your class, the population means the people in your class. If you are investigating your school, the population is all the learners in your school.

Often you cannot question the whole population. In this case, you need to choose a sample. There are different ways to choose a sample. In any investigation you need to decide on the best way to choose your sample.

Sometimes, your investigation is not about people. For example, you might be investigating the traffic going past your school. If you collect data about some of the vehicles you are still taking a sample. In this case, the population is all the vehicles passing your school.

 
📘 Worked example

A palace has visitors every day. You are doing a survey to find out what visitors think of the visit. You want to talk to a sample of $100$ people. The survey must be done on one day.

a. What factors might affect a person’s opinion of the visit?

b. Describe how you can choose a sample. Take account of the factors you identified in part a. Give any advantages and disadvantages of your method.

Answer:

a. Age and gender are two factors.

b. Choose about five different age bands. Choose people arriving and ask them what age band they are in as one of the questions. Make sure you speak to an equal number of men and women. Select people at several different times during the day.

An advantage is that this includes a range of people. If you spoke to mostly older people or mostly men, for example, the answers would not represent all the visitors.

A disadvantage is that this method will take longer than simply asking the first $100$ people you see. You might need to speak to more than $100$ people to make sure you cover all the different age bands.

Explanation:

In surveys, it is important to consider factors such as age and gender so the sample represents the whole population fairly.

Using age bands and including both men and women ensures diversity in the sample.

The advantage is representativeness, but the disadvantage is that it takes more time and effort to collect balanced data.

 

EXERCISES

1. A manager wants to find customers’ opinions about his shop. The manager wants to choose a sample of $50$ customers.

a. The sample could be the first $50$ customers in the shop after it opens.

i. Write one advantage of this method.

ii. Write one disadvantage of this method.

b. The sample could be $10$ customers chosen at random every $2$ hours until $50$ have been chosen.

i. Write one advantage of this method.

ii. Write one disadvantage of this method.

c. The manager thinks that the opinions of men and women could be different. Explain how he should choose the sample to take account of this.

d. Can you think of another factor that might affect customers’ opinions?

👀 Show answer

a.

i. Advantage: Quick and easy to collect, as no extra selection is needed.

ii. Disadvantage: May not be representative (e.g., only early shoppers).

b.

i. Advantage: Spreads the sample across the day, making it more representative.

ii. Disadvantage: Takes more time and effort to organise and monitor.

c. He should ensure an equal or proportional number of men and women are included in the sample.

d. Other factors could include age, income, or frequency of visiting the shop.

 

🧠 Think like a Mathematician

Task: You are going to find the lengths of the words in a novel. Choose a book to use. You want a sample of $50$ words.

Steps:

a. Describe three different ways of choosing a sample of 50 words.
b. Use one of your methods from part a to sample 50 words from your chosen novel. Use a tally chart to record the number of letters in each word.
c. Did your sampling method give you a sample that was representative of the whole book? Could you improve your method?
d. Try one of your other sampling methods.
e. Compare your first method with your second method. Can you improve the second method? Was one better than the other?
👀 Show Answers

a. Three methods could be:

  • Randomly choose 50 words by opening the book at random pages.
  • Select every 10th word starting from the first page until you have 50 words.
  • Pick 10 words from each of 5 different chapters, ensuring the sample covers the whole novel.

b. Example: Using the chapter sampling method, record each word length in a tally chart.

c. The sample may not be fully representative if only one part of the book was used. Improving the method could involve sampling across more chapters.

d. Trying the systematic method (e.g., every 10th word) might give a different spread of word lengths.

e. Comparing methods: The random method may be less reliable than systematic or stratified (by chapters). The second method can be improved by ensuring words are spread across the book. Stratified sampling across chapters is usually better.
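
The three kinds of method can be compared in code. This is a sketch under stated assumptions: the "novel" is an invented list of words split into five equal chapters, not a real book, and the function names are hypothetical. The systematic method here works out its own step from the length of the book rather than always taking every 10th word.

```python
import random
from collections import Counter

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical stand-in for a novel: 5 "chapters" of 400 invented words each.
vocabulary = ["a", "the", "sun", "river", "window", "quietly", "mathematics"]
chapters = [[rng.choice(vocabulary) for _ in range(400)] for _ in range(5)]
book = [word for chapter in chapters for word in chapter]

def random_sample(words, k):
    """Every word position has the same chance of being chosen."""
    return rng.sample(words, k)

def systematic_sample(words, k):
    """Take every step-th word after a random starting position."""
    step = len(words) // k
    start = rng.randrange(step)
    return [words[start + i * step] for i in range(k)]

def chapter_sample(chapter_list, k):
    """Take the same number of words from each chapter."""
    per_chapter = k // len(chapter_list)
    picked = []
    for chapter in chapter_list:
        picked.extend(rng.sample(chapter, per_chapter))
    return picked

def tally_lengths(sample):
    """Tally of word lengths as a dict {number of letters: count}."""
    return dict(sorted(Counter(len(word) for word in sample).items()))

samples = {
    "random": random_sample(book, 50),
    "systematic": systematic_sample(book, 50),
    "by chapter": chapter_sample(chapters, 50),
}
for name, sample in samples.items():
    print(f"{name:<11}{tally_lengths(sample)}")
```

With a real novel, you would replace `chapters` with the actual text and compare the three tallies to see whether they give a similar picture of word length.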

 

EXERCISES

3. Zalika is investigating the number of people in each car on a busy road. She predicts that most cars will contain only one person, the driver. Zalika says, ‘I will start at $08{:}00$ and observe $200$ cars.’

a. Write one advantage and one disadvantage of Zalika’s method.

b. Describe a better way to take a sample of $200$ cars. Explain why your method is better than Zalika’s.

👀 Show answer

a.

  • Advantage: Easy to carry out, requires little organisation.
  • Disadvantage: Cars at $08{:}00$ may not be typical of the whole day (e.g., rush-hour traffic).

b. A better way is to observe cars at different times of the day (morning, afternoon, evening) to get a more representative sample. This is better than Zalika’s method because it avoids bias caused by only collecting data during rush hour.

4. You have been asked to carry out an investigation. You want to find out if learners would like to change the school homework policy. You will choose a sample of about $50$ learners.

Explain how you will choose your sample. Explain why you have chosen this method.

👀 Show answer
A good method is to use stratified sampling — choose learners from different year groups, genders, and abilities to make sure all groups are represented. This ensures the opinions reflect the whole school population and not just one group of learners.
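
One way to share a sample of $50$ between groups is in proportion to the size of each group. The sketch below assumes a hypothetical school with three year groups; the group sizes are invented, and learners are represented by their position on a class list.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical year-group sizes (invented for illustration).
year_groups = {"Year 7": 180, "Year 8": 160, "Year 9": 160}
sample_size = 50

total = sum(year_groups.values())

# Each group's share of the sample matches its share of the school.
# With other group sizes, rounding may leave the total slightly above or
# below the target, and one group would then need adjusting by hand.
allocation = {
    group: round(sample_size * size / total)
    for group, size in year_groups.items()
}

# Within each year group, choose learners at random by list position.
chosen = {
    group: sorted(rng.sample(range(1, size + 1), allocation[group]))
    for group, size in year_groups.items()
}

for group, positions in chosen.items():
    print(f"{group}: {allocation[group]} learners, e.g. positions {positions[:5]}")
```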
 

EXERCISES

5. Arun and Sofia carry out a survey of parents about school homework. One conjecture is that parents want more homework.
To test this, each asks this question of a sample of $50$ parents:

Is the amount of homework your child gets too little / about right / too much ? (choose one)

This chart shows the results:

a. Do Arun’s results support this conjecture? Give a reason for your answer.

b. Do Sofia’s results support this conjecture? Give a reason for your answer.

c. Give a possible reason why the results of the two surveys are different.

👀 Show answer

a. No. Arun’s highest bar is for “About right,” not “Too little,” so his results do not support the conjecture that parents want more homework.

b. Yes. Sofia’s highest bar is “Too little,” so her results support the conjecture.

c. They sampled different parents (e.g., different classes/schools) or at different times, so the samples were different and gave different results.

6. A large factory has a restaurant where employees go for lunch. Arun is investigating ways to improve the restaurant. This is his survey method:

“I will give a questionnaire to the first $50$ workers visiting the restaurant this lunchtime.”

a. Give one disadvantage of Arun’s method.

b. Describe a better way of doing the survey.

👀 Show answer

a. It’s a convenience sample (the first $50$ at one lunchtime), so it’s biased and not representative (e.g., only early lunchers/one shift).

b. Use a random or stratified random sample across different times/days and departments/shifts (e.g., randomly select $10$ workers per major department over several lunchtimes) so all employees have an equal chance to be included.

 

EXERCISES

💡 Tip

The mode is the number with the highest frequency.

7. Here is a conjecture about the cars using a particular road:
The modal number of people in a car is $1$.
Marcus, Zara and Sofia each do a survey of cars on the road.
They count the number of people in each car, including the driver.
Each person does the survey for $15$ minutes.
They do their surveys at different times of day.
The results are in the graph on the right.

a. Do the results of each survey support this conjecture?

b. Describe any similarities or differences between the surveys.

c. Why do the samples give different results?

👀 Show answer

a. Marcus and Zara support the conjecture: their modal category is $1$ person per car. Sofia does not support it: her highest frequency is at $2$ people.

b. All three show frequencies generally decreasing as the number of people increases. Marcus counted many more cars overall (higher frequencies). Sofia’s peak is at $2$ people, while Marcus and Zara peak at $1$.

c. They surveyed at different times of day, so traffic composition differed (e.g., school runs/carpooling vs. solo commuting). Different sample sizes also contribute.

8. An examiner has marked $250$ examination papers.
To check the accuracy of her marking, a sample of $10$ papers will be re-marked.

a. Describe three different ways of choosing the sample.

b. Which of the three ways do you think is the best? Explain why you think so.

👀 Show answer

a. Examples: (i) simple random sample of $10$ papers from the $250$; (ii) systematic sample (e.g., every $25^\text{th}$ paper after a random start); (iii) stratified sample by mark bands (e.g., low/medium/high scores) with proportional selection.

b. A simple random sample is usually best for an unbiased accuracy check because every paper has equal chance of selection, avoiding deliberate or accidental choice bias. If mark distribution is very uneven, a stratified random sample by score bands may be better to ensure representation.
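
The first two methods in part a can be sketched as follows, assuming the papers are numbered from $1$ to $250$ (the numbering is an assumption made for the example, and the variable names are hypothetical).

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable

# Assumption: the 250 papers are numbered 1 to 250.
papers = list(range(1, 251))

# (i) Simple random sample: every paper has the same chance of selection.
simple = sorted(rng.sample(papers, 10))

# (ii) Systematic sample: every 25th paper after a random start.
step = len(papers) // 10          # 250 / 10 = 25
start = rng.randrange(step)       # random position among the first 25 papers
systematic = papers[start::step]

print("simple random:", simple)
print("systematic:   ", systematic)
```

A stratified sample would need the marks as well, so that the papers could first be sorted into mark bands before choosing from each band.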

 

📏 Investigating Body Measurements

How is height related to other body measurements? This is something you can investigate by collecting data. First you need to ask some statistical questions, for example:

  • Are height and shoe size connected?
  • Are height and hand span connected?
  • Do boys and girls of the same height have the same shoe size?
  • Do people with large hands also have large feet?
  • Are arm length and leg length connected?

When you have some questions, you can make predictions to test, for example:

  1. Taller boys have a larger shoe size than shorter boys.
  2. Girls with large hands also have large feet.
  3. People with long arms also have long legs.

To test your predictions, you need to think about the data you want to collect. For prediction $1$, height could be continuous data if you use a tape measure, or it could be categorical data if you decide to classify people as short, average or tall. Shoe sizes are discrete numerical data. For prediction $3$, you will need to decide how to collect the measurements, because measuring the length of an arm or a leg might make people uncomfortable.

You will need a sample. You must think about different ways to choose a sample and the best method to use. It is a good idea to test your data collection method in a small trial. You might want to change your design after you have done this.

 
📘 Worked example

You want to investigate the prediction that, in your school, teenagers with small feet also have small hands and teenagers with large feet also have large hands.

What data could you collect to do this investigation? Decide which data you would collect and give a reason for your answer.

Answer:

To measure the size of feet, I could measure the length of the foot or I could use shoe size. Foot lengths are continuous data and more accurate. However, it might be embarrassing to measure someone’s feet and so it might be more appropriate to use shoe size instead. This is easy to collect.

To measure hand size, I could ask each person to put their hand on a sheet of squared paper, draw round it and work out the area by counting squares. A second way would be to measure the hand span or the length (for example, from the wrist to the end of the middle finger).

Measuring a length will be easier than trying to count squares. I can do a trial where I measure both hand span and length to see which I prefer.

This investigation compares two body measurements — feet and hands. Shoe size or foot length can be used to represent foot size. Hand span or hand length can be used to represent hand size. The choice depends on practicality, accuracy, and comfort of participants.

 

 

 

EXERCISES

Each question in this exercise is about planning a statistical investigation. It is a good idea to work on each question in pairs.

💡 Tip

Example generalisations:
“Older learners are better at estimating.”
or “Girls are better at estimating than boys.”

1. You are going to investigate the ability of learners in your school to estimate. This could be the ability to estimate the length of a line, the size of an angle, the number of items in a jar, a particular length of time, or something else.

a. Write some questions you could ask about estimation.

b. Write some predictions you could test.

c. Describe some different ways of choosing a sample to test one or more of your predictions.

d. Which sample method is best? Give a reason for your answer.

e. Carry out a small trial of your investigation. Can you think of ways to improve your investigation?

f. Use the results of your trial to make a generalisation about learners’ ability to estimate.

👀 Show answer

a. Examples: “How many sweets are in the jar?”, “How long is this line?” or “How big is this angle?”

b. Predictions: “Older students will estimate more accurately,” “Girls will be better at estimating than boys.”

c. Sampling methods: random sample, stratified sample by year group, convenience sample (e.g., first $30$ students available).

d. Stratified random sampling is best because it represents different groups fairly.

e. A trial can reveal unclear instructions or overly difficult estimation tasks. Improvements might include clearer scales or simpler tasks.

f. Generalisations could be based on observed differences between groups (e.g., “Year 11 students estimate more accurately than Year 7 students”).

 

EXERCISES

2. You are going to investigate the attitudes of learners to the structure of the school day. Here are some things you could think about: the length of lessons; the number of lessons in a day; breaks; start and finish times. You might think of other areas of interest.

💡 Tip

Example generalisation:
“Learners think lessons should be longer.”

a. Write some questions you could ask about the structure of the school day.

b. Write some predictions you can test.

c. Describe some different ways of choosing a sample to test one or more of your predictions.

d. Which sample method is best? Give a reason for your answer.

e. Carry out a small trial of your investigation. Can you think of ways to improve your investigation?

f. Use the results of your trial to make a generalisation about learners’ attitudes to the structure of the school day.

👀 Show answer

a. Example questions: “How long should lessons last?” “Do you prefer fewer longer lessons or more shorter ones?”

b. Predictions: “Learners prefer longer lessons.” “Learners prefer more breaks.”

c. Sampling methods: random sample across year groups, stratified sample by age, convenience sample from a single class.

d. Stratified random sample is best because it represents all year groups and genders fairly.

e. A trial may reveal unclear questions or missing options. Improvements could include simplifying questions or offering multiple-choice answers.

f. Example generalisation: “Most learners prefer longer breaks but shorter lessons.”

3. You are going to investigate news articles. The articles could be in newspapers or online. You could investigate readability, length, vocabulary or other aspects.

a. Write some questions you could ask about news articles.

b. Write some predictions you can test.

c. Describe some different ways of choosing a sample to test one or more of your predictions.

d. Which sample method is best? Give a reason for your answer.

e. Carry out a small trial of your investigation. Can you think of ways to improve your investigation?

f. Use the results of your trial to make a generalisation about news articles.

👀 Show answer

a. Example questions: “How many words are in an article?” “What is the reading age required?” “How many difficult words are used?”

b. Predictions: “Online articles are shorter than newspaper articles.” “Sports articles use simpler vocabulary than political articles.”

c. Sampling methods: random sample of $20$ online and $20$ newspaper articles, stratified by topic, or systematic sample (every $5^\text{th}$ article from a list).

d. Stratified random sample is best because it ensures fair comparison across topics and sources.

e. A trial might show that some articles are too long to analyse. Improvements could be reducing sample size or focusing on particular sections.

f. Example generalisation: “Online articles tend to be shorter and use simpler vocabulary than print articles.”

 

📘 What we've learned

  • A sample is a smaller group selected from a population, used to make conclusions about the whole.
  • Good samples should be representative, avoiding bias by including different groups fairly.
  • We learned different sampling methods:
    • Random sampling — every member of the population has an equal chance of selection.
    • Stratified sampling — the population is divided into groups, and samples are taken proportionally.
    • Systematic sampling — selecting every $n^\text{th}$ member after a random start.
    • Convenience sampling — using easily available members, though this risks bias.
  • Trials or pilot studies help identify problems in the sampling method before carrying out the full investigation.
  • Using larger and more randomised samples generally leads to more reliable results.
