Conducting an investigation
🎯 In this topic you will
- Learn how to collect data to investigate statistical questions.
🧠 Key Words
- continuous data
- categorical data
- data
- discrete data
- prediction
- statistical question
Show Definitions
- continuous data: Data that can take any value within a range, often measured rather than counted (e.g., height, temperature, time).
- categorical data: Data that can be placed into groups or categories based on qualities or characteristics (e.g., eye colour, type of fruit).
- data: Information collected from observations, measurements, or surveys that can be analysed.
- discrete data: Data that can only take certain separate values, often whole numbers (e.g., number of students, goals scored).
- prediction: An estimate of a future outcome based on existing data or patterns.
- statistical question: A question that can be answered by collecting data and analysing the variation in the responses.
🔍 Examples of Statistical Questions
Look at these examples of statistical questions:
- How many brothers do the learners in your class have?
- What is the average mass of a baby born in your country?
- What sports do learners in your school like to watch?
To answer a statistical question you need to collect data. The number of brothers you have, the mass of a baby, and a sport you watch are all examples of different types of data.
📊 Types of Data
The first question needs discrete data. The number of brothers can only be 0, 1, 2, … Discrete data can take only particular, separate values.
The second question needs continuous data. Masses, lengths and times are all examples of continuous data. They are measurements, so they can take any value within a range.
The third question needs categorical data. The data are words, not numbers.
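As a rough illustration, the three types of data can be matched to the way a value is usually stored. The sketch below is a simplification, not a rule: the function name `describe_type` and the sample values are hypothetical, and some data need judgement (shoe size, for example, is discrete even though it can be 6.5).

```python
# Illustrative sketch only: the sample values below are made up.
def describe_type(value):
    """Guess the type of data from how a single value is stored.

    Simplifying assumption: whole numbers are counts (discrete),
    decimals are measurements (continuous), and text is a category.
    """
    if isinstance(value, bool):
        raise TypeError("True/False values are not covered by this sketch")
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    if isinstance(value, str):
        return "categorical"
    raise TypeError(f"unsupported value: {value!r}")


brothers = 2          # number of brothers: a count
baby_mass_kg = 3.4    # mass of a baby: a measurement
sport = "football"    # sport watched: a word

for item in (brothers, baby_mass_kg, sport):
    print(item, "->", describe_type(item))
```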
📥 Collecting Data
There are several ways to collect data. You can:
- use a questionnaire
- carry out measurements
- make observations
- interview people
❓ EXERCISES
1. Choose the correct word to describe the following.
a. The mass of a book
b. The colour of a book
c. The number of pages in a book
👀 Show answer
a. Continuous data
b. Categorical data
c. Discrete data
2. Here are some facts about a person. Write down the type of data for each fact.
a. Age, in years
b. Shoe size
c. Height
d. Time taken to travel to school
e. Favourite subject
👀 Show answer
a. Discrete data (age is recorded as a whole number of years)
b. Discrete data
c. Continuous data
d. Continuous data
e. Categorical data
3. Liling is comparing different models of cars. She is collecting data about cars. Give some examples of data about cars that are:
a. categorical data
b. discrete data
c. continuous data
👀 Show answer
a. Examples (categorical): body type (sedan/SUV/hatchback), fuel type (petrol/diesel/electric), colour.
b. Examples (discrete): number of doors (e.g., $2$, $4$ or $5$), number of seats, number of cylinders.
c. Examples (continuous): fuel economy in $\text{L}/100\text{ km}$, mass in $\text{kg}$, top speed in $\text{km h}^{-1}$.
4. Here is a question from a questionnaire. The questionnaire is given to people who stayed at a hotel.
a. What is missing from the question?
This table shows some people’s replies to this question.
| Score | $1$ | $2$ | $3$ | $4$ | $5$ |
|---|---|---|---|---|---|
| Frequency | $2$ | $4$ | $9$ | $17$ | $21$ |
b. How many people replied?
c. What was the modal score?
👀 Show answer
a. Missing a labelled scale (e.g., $1=$ “very dirty”, $5=$ “very clean”) and a time frame (e.g., “during your stay / last night”).
b. Total replies $=2+4+9+17+21=53$.
c. Modal score $=5$ (highest frequency $21$).
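The answers to parts b and c can be checked with a short calculation. This is a minimal sketch that stores the frequency table above as a dictionary; the variable names are illustrative.

```python
# Frequency table from the replies: score -> frequency.
frequencies = {1: 2, 2: 4, 3: 9, 4: 17, 5: 21}

# Part b: the number of replies is the sum of the frequencies.
total_replies = sum(frequencies.values())

# Part c: the modal score is the score with the highest frequency.
modal_score = max(frequencies, key=frequencies.get)

print("Total replies:", total_replies)  # 53
print("Modal score:", modal_score)      # 5
```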
5. Here is a question from a questionnaire.
How many hours of homework do you do? Tick one box.
Between $1$ and $2$ hours ☐
Between $2$ and $3$ hours ☐
More than $3$ hours ☐
a. Write down two things that are wrong with this question.
b. Write a better question.
👀 Show answer
a. Issues: no time period is stated (per day or per week?); the options overlap at $2$ hours, leave it unclear which box covers exactly $3$ hours, and omit “less than $1$ hour”.
b. Example improved item: “In a typical week, how many hours of homework do you do? Tick one box.”
Options (non-overlapping): less than $1$ ☐, $1$ to less than $2$ ☐, $2$ to less than $3$ ☐, $3$ to less than $4$ ☐, $4$ or more ☐.
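One way to check that a set of options is non-overlapping and leaves no gaps is to confirm that every possible answer falls into exactly one box. The sketch below assumes the five options above; the function name `homework_box` and the label wording are hypothetical.

```python
def homework_box(hours):
    """Return the single tick box that matches a weekly number of hours."""
    if hours < 0:
        raise ValueError("hours cannot be negative")
    if hours < 1:
        return "less than 1"
    if hours < 2:
        return "1 to less than 2"
    if hours < 3:
        return "2 to less than 3"
    if hours < 4:
        return "3 to less than 4"
    return "4 or more"


# Boundary values such as exactly 2 or exactly 3 hours now belong to one box only.
for hours in (0, 0.5, 1, 2, 3, 4, 10):
    print(hours, "->", homework_box(hours))
```

Because each test is checked in order and the last line catches everything else, no value can match two boxes or none.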
6. You are investigating what people of your age do in their leisure time.
a. List some activities that you think should be included.
b. Write four questions you would ask in your investigation.
Each question should have several tick boxes to choose from that show the possible answers.
c. Ask your questions to a partner. Use their replies to help you decide whether you can improve your questions.
👀 Show answer
a. Examples: sport, gaming, reading, social media, music practice, volunteering.
b. Sample questions (with tick boxes):
• “How often do you play sport in a typical week?” ☐ $0$ times ☐ $1\!-\!2$ times ☐ $3\!-\!4$ times ☐ $5$ or more times
• “About how many hours do you spend gaming on a typical day?” ☐ less than $1$ ☐ $1$ to less than $2$ ☐ $2$ to less than $3$ ☐ $3$ or more
• “Which activities do you do most weekends? (Tick all that apply.)” ☐ sport ☐ homework ☐ social media ☐ meet friends ☐ other
• “How do you usually get to leisure activities?” ☐ walk ☐ cycle ☐ bus ☐ car ☐ other
c. Use feedback to refine wording, ensure options are exhaustive/non-overlapping, and add time frames/units where needed.
7. Work in pairs for this question. A teacher asks learners to estimate the number of sweets in a jar. She makes two predictions:
• The estimates of the boys will be too big.
• The estimates of the girls will be too small.
a. Explain how the teacher can test her predictions.
i. What type of data will the teacher need to collect?
ii. How can she collect the data?
iii. How can she analyse the data?
👀 Show answer
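i. She needs each learner's estimate, which is discrete data (a whole number of sweets), and whether the learner is a boy or a girl, which is categorical data. She also needs the actual number of sweets in the jar.
ii. She can collect the data by asking each boy and each girl to write down their estimate.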
iii. She can analyse the data by comparing the mean or median estimates of boys and girls with the actual number, and check if boys’ estimates are consistently higher and girls’ consistently lower.
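The comparison described in part iii might be sketched as follows. All numbers here are invented for illustration: the estimates, the group sizes and the value of `actual_sweets` are hypothetical, not data from the exercise.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only.
actual_sweets = 120
estimates = {
    "boys": [150, 135, 110, 160, 140],
    "girls": [100, 125, 95, 110, 90],
}

for group, values in estimates.items():
    # Count how many estimates in this group are bigger than the actual number.
    too_big = sum(1 for v in values if v > actual_sweets)
    print(
        group,
        "| mean:", mean(values),
        "| median:", median(values),
        "| estimates above actual:", too_big, "of", len(values),
    )
```

With real data, the teacher's predictions would be supported only if the boys' averages are above the actual number and the girls' averages are below it.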
8. Compare your answers to part a with the answers of another pair in your class. Can your answer be improved?
9. Adekunle is investigating the number of emails people receive at work. He makes the prediction:
• People get more emails on Mondays than on Fridays.
a. How can Adekunle collect data to test his prediction?
b. How can he analyse the results?
👀 Show answer
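a. He can ask a sample of people at work to record the number of emails they receive each Monday and each Friday over several weeks.
b. He can calculate and compare the mean or median number of emails for Mondays and Fridays. He might also display the data in bar charts or box plots to show differences and spread.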
10. Sofia and Zara throw two dice and add the scores to get the total. Sofia makes this prediction: “7 is the most likely total.” Zara makes this prediction: “All totals are equally likely.” They throw the two dice 100 times. Their results are shown in the table.

a. Explain why this is not a good way to record the results.
b. Show the frequencies for each number in a suitable table.
c. Show the results in a bar chart.
d. Is Sofia’s prediction correct? Give a reason for your answer.
e. Is Zara’s prediction correct? Give a reason for your answer.
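As a check on the two predictions, the possible totals for two dice can be listed and counted. The sketch below assumes two fair six-sided dice, so that all 36 ordered outcomes are equally likely; it shows the theoretical pattern only, and the results of 100 real throws will vary around it.

```python
from collections import Counter
from itertools import product

# Assumption: two fair six-sided dice, giving 36 equally likely ordered outcomes.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Number of ways to make each total from 2 to 12.
for total in range(2, 13):
    print(total, ":", ways[total], "out of 36")

most_likely = max(ways, key=ways.get)
print("Most likely total:", most_likely)
```

Under this assumption a total of 7 can be made in 6 ways, more than any other total, while 2 and 12 can each be made in only 1 way, so the totals are not equally likely.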
🧠 Think like a Mathematician
Question: A healthy diet includes fruit and vegetables. Do people your age eat enough fruit and vegetables? You are going to collect data to investigate this question.
Instructions: Work individually on this investigation.
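Tasks:
a. Write down some predictions to test.
b. Plan how you will collect the data.
c. Decide how you will analyse the data.
d. Review your answers and look for ways to improve them.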
👀 Show Answer
Sample answers:
a. Example predictions to test:
- Fewer than $50\%$ of people my age meet the “$5$-a-day” guideline.
- Students who bring food from home eat more portions of fruit/vegetables than those who usually buy food at school.
- Average portions are higher on weekdays than at weekends.
b. Data collection plan:
- Define a “portion” clearly (e.g., one medium fruit, $80\text{ g}$ veg).
- Use an anonymous one-page questionnaire or a $3$-day food diary (for example, two weekdays and one weekend day) to record daily portions.
- Sample a fair group in the same age range; avoid identifying information and obtain consent where needed.
c. Analysis ideas:
- Compute for each person the number of portions per day; find mean, median, and the percentage meeting $\ge 5$ portions.
- Create bar charts comparing groups (home vs. school food; weekday vs. weekend).
- Comment on trends; optionally, compare two groups by the difference between their percentages (or with a confidence interval, if this has been taught).
d. Self-review to improve answers:
- Are predictions specific and testable?
- Is the sampling method fair and large enough?
- Are portion definitions, time frame, and anonymity clear?
- Do chosen graphs/statistics answer each prediction directly?
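The statistics named in part c could be calculated along these lines. The `portions` list is hypothetical survey data invented for illustration (one average daily value per person), and `guideline` is the “$5$-a-day” figure used in the sample predictions.

```python
from statistics import mean, median

# Hypothetical survey results: average portions per day, one value per person.
portions = [2, 5, 3, 6, 4, 1, 5, 3, 2, 7]
guideline = 5  # portions per day

# Count the people who meet or exceed the guideline, then convert to a percentage.
meeting = sum(1 for p in portions if p >= guideline)
percent_meeting = 100 * meeting / len(portions)

print("Mean portions per day:", mean(portions))
print("Median portions per day:", median(portions))
print("Percentage meeting the guideline:", percent_meeting)
```

The percentage can then be compared directly with the first sample prediction (fewer than $50\%$ meet the guideline).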