
Conducting an investigation


🎯 In this topic you will

  • Learn how to collect data to investigate statistical questions.
 

🧠 Key Words

  • continuous data
  • categorical data
  • data
  • discrete data
  • prediction
  • statistical question
Show Definitions
  • continuous data: Data that can take any value within a range, often measured rather than counted (e.g., height, temperature, time).
  • categorical data: Data that can be placed into groups or categories based on qualities or characteristics (e.g., eye colour, type of fruit).
  • data: Information collected from observations, measurements, or surveys that can be analysed.
  • discrete data: Data that can only take certain separate values, often whole numbers (e.g., number of students, goals scored).
  • prediction: An estimate of a future outcome based on existing data or patterns.
  • statistical question: A question that can be answered by collecting data and analysing the variation in the responses.
 

🔍 Examples of Statistical Questions

Look at these examples of statistical questions:

  1. How many brothers do the learners in your class have?
  2. What is the average mass of a baby born in your country?
  3. What sports do learners in your school like to watch?

To answer a statistical question you need to collect data. The number of brothers you have, the mass of a baby, and a sport you watch are all examples of different types of data.

📊 Types of Data

The type of data needed to answer Question 1 is discrete data. The values can be only 0, 1, 2, … Discrete data can take particular values only.

The type of data needed to answer Question 2 is continuous data. Masses, lengths and times are all examples of continuous data. They are measurements, so they can take any value within a range.

The type of data needed to answer Question 3 is categorical data. The data are words, not numbers.

📥 Collecting Data

There are several ways to collect data. You can:

  • use a questionnaire
  • carry out measurements
  • make observations
  • interview people
 
📘 Worked example

Explain what method you would use to collect data to test each of these predictions. For each case, describe what type of data it is.

a. 11-year-old girls can run a distance of 50 metres faster than 11-year-old boys.

b. Most teachers in my school wear glasses.

c. Plants in the sun grow taller than plants in the shade.

Answer:

a. Choose some girls and boys to run 50 metres and time each one. This is continuous data.

b. Observe each teacher. You can also interview the teachers to ask if they wear glasses for some activities, such as driving or reading. This is categorical data.

c. You could do an experiment. Plant some seeds in the sun and plant some of the same type of seeds in the shade. When they grow, measure the height of each plant. This is continuous data.

a. Collecting times gives numerical values that can vary continuously, so this is continuous data.

b. Wearing glasses is a yes/no category, so this is categorical data.

c. Measuring plant height produces numerical values that vary continuously, so this is continuous data.

 

EXERCISES

1. Choose the correct term (categorical, discrete or continuous) to describe the data in each of the following.

a. The mass of a book

b. The colour of a book

c. The number of pages in a book

👀 Show answer
a. Continuous data
b. Categorical data
c. Discrete data

2. Here are some facts about a person. Write down the type of data for each fact.

a. Age, in years

b. Shoe size

c. Height

d. Time taken to travel to school

e. Favourite subject

👀 Show answer
a. Discrete data
b. Discrete data
c. Continuous data
d. Continuous data
e. Categorical data
 


3. Liling is comparing different models of cars. She is collecting data about cars. Give some examples of data about cars that are:

a. categorical data

b. discrete data

c. continuous data

👀 Show answer

a. Examples (categorical): body type (sedan/SUV/hatchback), fuel type (petrol/diesel/electric), colour.

b. Examples (discrete): number of doors (e.g., $2$, $4$ or $5$), number of seats, number of cylinders.

c. Examples (continuous): fuel economy in $\text{L}/100\text{ km}$, mass in $\text{kg}$, top speed in $\text{km h}^{-1}$.

4. Here is a question from a questionnaire. The questionnaire is given to people who stayed at a hotel.

How clean was your room? Circle one number. 1 2 3 4 5

a. What is missing from the question?
This table shows some people’s replies to this question.

Score $1$ $2$ $3$ $4$ $5$
Frequency $2$ $4$ $9$ $17$ $21$

b. How many people replied?

c. What was the modal score?

👀 Show answer

a. Missing a labelled scale (e.g., $1=$ “very dirty”, $5=$ “very clean”) and a time frame (e.g., “during your stay / last night”).

b. Total replies $=2+4+9+17+21=53$.

c. Modal score $=5$ (highest frequency $21$).
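The arithmetic in parts b and c can be checked with a short Python sketch. The frequencies are copied from the table in the question; the variable names are illustrative choices and do not come from the original exercise.

```python
# Replies to "How clean was your room?" stored as score -> frequency,
# copied from the table in the question.
frequencies = {1: 2, 2: 4, 3: 9, 4: 17, 5: 21}

# Part b: the number of people who replied is the sum of the frequencies.
total_replies = sum(frequencies.values())

# Part c: the modal score is the score with the highest frequency.
modal_score = max(frequencies, key=frequencies.get)

print(total_replies)  # 53
print(modal_score)    # 5
```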

5. Here is a question from a questionnaire.
How many hours of homework do you do? Tick one box.

Between $1$ and $2$ hours ☐
Between $2$ and $3$ hours ☐
More than $3$ hours ☐

a. Write down two things that are wrong with this question.

b. Write a better question.

👀 Show answer

a. Issues: no time period is stated (per day or per week?); the first two intervals overlap at $2$ hours, so it is unclear which box to tick; there is no option for less than $1$ hour.

b. Example improved item: “In a typical week, how many hours of homework do you do? Tick one box.”
Options in hours (non-overlapping): less than $1$ ☐, $1$ to less than $2$ ☐, $2$ to less than $3$ ☐, $3$ to less than $4$ ☐, $4$ or more ☐.
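One way to check that a set of options is non-overlapping and covers every case is to confirm that any number of hours matches exactly one box. The sketch below does this for the improved options in part b; the function name `homework_box` and the label wording are hypothetical, chosen only for illustration.

```python
def homework_box(hours):
    """Return the label of the single tick box that matches `hours`."""
    if hours < 0:
        raise ValueError("hours cannot be negative")
    # Each test is reached only if the earlier ones failed,
    # so every value lands in exactly one box.
    if hours < 1:
        return "less than 1"
    if hours < 2:
        return "1 to less than 2"
    if hours < 3:
        return "2 to less than 3"
    if hours < 4:
        return "3 to less than 4"
    return "4 or more"

print(homework_box(0.5))  # less than 1
print(homework_box(2))    # 2 to less than 3
```

With the original options, a reply of exactly $2$ hours would fit two boxes and a reply of half an hour would fit none.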

6. You are investigating what people of your age do in their leisure time.

a. List some activities that you think should be included.

b. Write four questions you would ask in your investigation.
Each question should have several tick boxes to choose from that show the possible answers.

c. Ask your questions to a partner. Use their replies to help you decide whether you can improve your questions.

👀 Show answer

a. Examples: sport, gaming, reading, social media, music practice, volunteering.

b. Sample questions (with tick boxes):
• “How often do you play sport in a typical week?” ☐ $0$ times ☐ $1$–$2$ ☐ $3$–$4$ ☐ $\ge 5$
• “About how many hours do you spend gaming per day?” ☐ less than $1$ ☐ $1$ to less than $2$ ☐ $2$ to less than $3$ ☐ $3$ or more
• “Which activities do you do most weekends? (tick all that apply)” ☐ sport ☐ homework ☐ social media ☐ meet friends ☐ other
• “How do you usually get to leisure activities?” ☐ walk ☐ cycle ☐ bus ☐ car ☐ other.

c. Use feedback to refine wording, ensure options are exhaustive/non-overlapping, and add time frames/units where needed.

 


7. Work in pairs for this question. A teacher asks learners to estimate the number of sweets in a jar. She makes two predictions:

• The estimates of the boys will be too big.

• The estimates of the girls will be too small.

a. Explain how the teacher can test her predictions.

i. What type of data will the teacher need to collect?

ii. How can she collect the data?

iii. How can she analyse the data?

👀 Show answer
i. The teacher needs to collect numerical estimate data from both boys and girls, along with the actual number of sweets.

ii. She can collect the data by asking each boy and each girl to write down their estimate.

iii. She can analyse the data by comparing the mean or median estimates of boys and girls with the actual number, and check if boys’ estimates are consistently higher and girls’ consistently lower.
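The analysis in part iii could be sketched in Python as follows. All of the numbers are hypothetical and invented for illustration, as are the names `summarise`, `boys_estimates` and `girls_estimates`; real results would come from the teacher's class.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only.
actual_sweets = 120
boys_estimates = [150, 135, 110, 160, 140]
girls_estimates = [100, 115, 90, 125, 105]

def summarise(estimates, actual):
    """Return the mean, the median and the number of estimates above the actual count."""
    too_big = sum(1 for e in estimates if e > actual)
    return mean(estimates), median(estimates), too_big

boys_mean, boys_median, boys_too_big = summarise(boys_estimates, actual_sweets)
girls_mean, girls_median, girls_too_big = summarise(girls_estimates, actual_sweets)

print("Boys: ", boys_mean, boys_median, boys_too_big, "of", len(boys_estimates), "too big")
print("Girls:", girls_mean, girls_median, girls_too_big, "of", len(girls_estimates), "too big")
```

In this invented data set the boys' averages are above the actual number and the girls' averages are below it, which would support both predictions. Counting how many individual estimates are too big shows whether the pattern is consistent or caused by a few extreme values.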

8. Compare your answers to Question 7 with the answers of another pair in your class. Can your answers be improved?

👀 Show answer
Answers will vary. An improved answer should give more detail about data collection (ensuring fairness, enough samples) and analysis (using graphs, averages, or spread).

9. Adekunle is investigating the number of emails people receive at work. He makes the prediction:

• People get more emails on Mondays than on Fridays.

a. How can Adekunle collect data to test his prediction?

b. How can he analyse the results?

👀 Show answer
a. Adekunle can ask a sample of workers to record the number of emails they receive on Monday and Friday, or he could access company email logs (with permission).

b. He can calculate and compare the mean/median number of emails for Monday and Friday. He might also display the data in bar charts or box plots to show differences and spread.
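If each worker records both a Monday count and a Friday count, the two days can be compared worker by worker as well as on average. The following sketch assumes that kind of paired record; the numbers and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical records, invented for illustration:
# one (Monday emails, Friday emails) pair per worker.
records = [(42, 30), (35, 38), (50, 41), (28, 25), (61, 47)]

monday = [m for m, _ in records]
friday = [f for _, f in records]

mean_monday = mean(monday)
mean_friday = mean(friday)

# Count the workers who received more emails on Monday than on Friday.
more_on_monday = sum(1 for m, f in records if m > f)

print(mean_monday, mean_friday)
print(more_on_monday, "of", len(records), "workers got more emails on Monday")
```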
 


10. Sofia and Zara throw two dice and add the scores to get the total. Sofia makes this prediction: “7 is the most likely total.” Zara makes this prediction: “All totals are equally likely.” They throw the two dice 100 times. Their results are shown in the table.

a. Explain why this is not a good way to record the results.

👀 Show answer
Writing every total in a grid makes it hard to see frequencies or patterns. It is difficult to count and compare results quickly.

b. Show the frequencies for each number in a suitable table.

👀 Show answer
A frequency table should be drawn with totals from $2$ to $12$ in one column and their counts (from the data) in the other column. For example: $2: 3$, $3: 6$, $4: 9$, $5: 12$, $6: 17$, $7: 21$, $8: 15$, $9: 10$, $10: 5$, $11: 2$, $12: 0$. (Values depend on counting from the grid.)
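Tallying a list of totals into a frequency table can be sketched with Python's `collections.Counter`. The list `totals` below is a short hypothetical sample, not the 100 results from the question, which are not reproduced here.

```python
from collections import Counter

# A short hypothetical list of recorded totals, for illustration only.
totals = [7, 4, 9, 7, 6, 2, 8, 7, 11, 6, 5, 7]

# Counter tallies how many times each total appears.
counts = Counter(totals)

# Print one row per possible total, including totals that never occurred.
for total in range(2, 13):
    print(total, counts[total])
```

Looping over every possible total from $2$ to $12$, rather than only the totals that appeared, keeps rows with a frequency of $0$ in the table.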

c. Show the results in a bar chart.

👀 Show answer
A bar chart should be drawn with totals $2$–$12$ on the horizontal axis and frequencies on the vertical axis. The bar for $7$ should be the tallest, showing it occurs most often.

d. Is Sofia’s prediction correct? Give a reason for your answer.

👀 Show answer
Yes. The bar chart and frequency table show that total $7$ occurs most often, so Sofia’s prediction is correct.

e. Is Zara’s prediction correct? Give a reason for your answer.

👀 Show answer
No. The results show that some totals occur much more often than others. For example, $7$ appears far more often than $2$ or $12$. This means the totals are not equally likely.
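The experimental results can be compared with what is expected in theory. Assuming two fair six-sided dice, there are $36$ equally likely outcomes, and the sketch below counts how many of them give each total. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Count the number of ways each total can occur with two fair six-sided dice.
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))

# Each of the 36 outcomes is equally likely, so probability = ways / 36.
probabilities = {total: Fraction(n, 36) for total, n in ways.items()}

most_likely = max(ways, key=ways.get)

print(most_likely, ways[most_likely])  # 7 6
print(ways[2], ways[12])               # 1 1
```

A total of $7$ can be made in $6$ ways, while $2$ and $12$ can each be made in only $1$ way. This agrees with Sofia's prediction and explains why the totals are not equally likely.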
 

🧠 Think like a Mathematician

Question: A healthy diet includes fruit and vegetables. Do people your age eat enough fruit and vegetables? You are going to collect data to investigate this question.

Instructions: Work individually for this investigation.

Tasks:

a. Write down three predictions to test.
b. Explain how you can collect data to test your predictions.
c. Describe how you can analyse your data.
d. Improve your answers to parts a, b and c by reviewing them critically.
👀 Show Answer

Sample answers:

a. Example predictions to test:

  • Fewer than $50\%$ of people my age meet the “$5$-a-day” guideline.
  • Students who bring food from home eat more portions of fruit/vegetables than those who usually buy food at school.
  • Average portions are higher on weekdays than at weekends.

b. Data collection plan:

  • Define a “portion” clearly (e.g., one medium fruit, $80\text{ g}$ veg).
  • Use an anonymous one-page questionnaire or a $3$-day food diary to record daily portions (Mon–Fri and weekend).
  • Sample a fair group in the same age range; avoid identifying information and obtain consent where needed.

c. Analysis ideas:

  • Compute for each person the number of portions per day; find mean, median, and the percentage meeting $\ge 5$ portions.
  • Create bar charts comparing groups (home vs. school food; weekday vs. weekend).
  • Comment on trends; if you have been taught how, compare the two proportions, for example by finding the difference between them.
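The summary statistics listed above could be computed as in this sketch. The list `portions` holds hypothetical average daily portions for ten people, invented for illustration; the threshold of $5$ comes from the “$5$-a-day” guideline in the first prediction.

```python
from statistics import mean, median

# Hypothetical average daily portions for ten people, invented for illustration.
portions = [2, 5, 3, 6, 4, 1, 5, 3, 2, 7]

# Count the people who meet the guideline of at least 5 portions a day.
meeting_guideline = sum(1 for p in portions if p >= 5)
percentage_meeting = 100 * meeting_guideline / len(portions)

print(mean(portions), median(portions))
print(percentage_meeting)  # 40.0
```

In this invented sample, $40\%$ of people meet the guideline, which would support the prediction that fewer than $50\%$ do. A real investigation would need a larger and fairly chosen sample.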

d. Self-review to improve answers:

  • Are predictions specific and testable?
  • Is the sampling method fair and large enough?
  • Are portion definitions, time frame, and anonymity clear?
  • Do chosen graphs/statistics answer each prediction directly?
 

📘 What we've learned

  • We learned how to turn a real-world question into testable predictions that can be investigated.
  • We practised designing fair methods for data collection, making sure results are reliable and unbiased.
  • We explored different ways of recording results, including frequency tables, bar charts, and summary statistics.
  • We applied reasoning to check whether predictions were supported by evidence.
  • We understood that improving an investigation often means reviewing predictions, refining methods, or repeating with better data.
