To answer questions in statistics, you must collect data.
First, you decide which data you need to collect. You need to know if it is discrete, continuous or categorical.
Next, you must decide how to collect the data. If you need to question people, you could use a questionnaire that they fill in by themselves. Alternatively, you could interview them and write down the answers.
Sometimes, you need to make observations. For example, you might record times or count vehicles. In this case, you need a sheet to record your observations.
You may not be able to interview or give a questionnaire to everyone. In this case, you need to take a sample. You must think carefully about the best way to choose your sample.
Whenever you collect data, you need to choose a method and explain why you think that method is the best one to use.
1. How would you collect data to answer these questions? State the type of data each time.
a. When a drawing pin is dropped, is it more likely to land point up or point down?
b. How many people visit a particular shop before $09{:}00$?
c. How many brothers and sisters do the members of your class have?
d. How long do learners spend doing homework each night?
e. What is the average number of words in a sentence in a book?
f. How long do learners take to get to school each day?
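a. Experiment: drop the drawing pin many times and tally how it lands. Categorical data (two outcomes: point up or point down).
b. Observation: count the people who enter the shop before $09{:}00$, ideally on several days. Discrete numerical data (number of people).
c. Questionnaire or interview: ask each member of the class. Discrete numerical data (count of siblings).
d. Questionnaire: ask a sample of learners to record how long they spend. Continuous data (time measured in minutes or hours).
e. Observation: count the words in a sample of sentences from the book. Discrete numerical data (each word count is a whole number, although the average need not be).
f. Questionnaire or interview: ask a sample of learners. Continuous data (journey time).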
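The tally for the drawing pin experiment in part a can also be kept on a computer. The following is a minimal Python sketch, not a required method: the list `drops` holds made-up illustrative outcomes, and the function names `tally` and `relative_frequencies` are chosen here for illustration.

```python
from collections import Counter

# Hypothetical record of 10 drops (illustrative values, not real results)
drops = ["up", "down", "up", "up", "down", "up", "down", "up", "up", "down"]

def tally(observations):
    """Count how many times each outcome was observed."""
    return Counter(observations)

def relative_frequencies(observations):
    """Return the proportion of all observations that fall in each outcome."""
    counts = tally(observations)
    total = len(observations)
    return {outcome: count / total for outcome, count in counts.items()}

counts = tally(drops)
proportions = relative_frequencies(drops)
print(counts["up"], counts["down"])  # number of "up" and "down" results
print(proportions)
```

With only 10 drops the proportions are a rough guide; a real experiment would record many more drops before deciding which outcome is more likely.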
2. Ahmad uses a gym. He asks these questions.
He interviews a sample of people at the gym.
a. What data does he need to collect?
b. What type of data is it?
c. Here is his data collection sheet.
What is wrong with this data collection sheet?
| Name | How often do you visit the gym? | Do you prefer to visit in the morning or the evening? |
|---|---|---|
d. Design a better data collection sheet.
e. He asks the first people who come into the gym each morning for a week. Why is this not a good way to choose his sample? Describe a better way.
a. Data on frequency of gym visits, preferred time of day, and gender of gym users.
b. Mixture of categorical (gender, preference) and discrete numerical (frequency) data.
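c. The sheet is too vague: the question “How often do you visit the gym?” gives no answer categories (such as daily, weekly or monthly), so the answers will be hard to record and compare. There is also no column for gender.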
d. A better sheet would include: Name (optional), Gender, How many times per week do you visit? (0–1, 2–3, 4+), Preferred time (morning/afternoon/evening).
e. Asking the first people each morning gives a biased sample, because it includes only early-morning visitors. A better way is to take a random sample throughout the day and across several days, so that all gym users are represented.
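One way to choose people at random, as suggested in part e, is to number everyone who could be asked and let a computer pick. The sketch below assumes Ahmad has access to a list of 200 gym members numbered 1 to 200; that list, the sample size of 20 and the function name `simple_random_sample` are all assumptions made for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size different items; every item has the same chance of being chosen."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return rng.sample(population, sample_size)

# Hypothetical membership list: members numbered 1 to 200 (an assumption)
members = list(range(1, 201))

sample = simple_random_sample(members, 20, seed=1)
print(sorted(sample))  # the member numbers Ahmad would interview
```

If no membership list is available, the same idea could be applied to time slots instead, choosing random times on random days at which to interview whoever is present.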
3. A cinema manager asks these questions.
a. What data is required?
b. Describe two ways to collect the data.
c. The manager decides to give a questionnaire to a sample of customers. She gives it to all the customers on one night. Why is this not a good way to choose the sample?
d. Describe how the manager can get a representative sample.
e. Compare your answer to part d with another group’s answer. Can you improve your answer? Can you improve theirs?
a. Data on frequency of visits, age groups, and film preferences.
b. Two ways: (i) Questionnaire distributed over several days; (ii) Online survey targeting different age groups.
c. One night’s audience may not represent all cinema-goers (e.g., only certain films or age groups that evening).
d. Use stratified random sampling by age group and gender, ensuring representation across film types and times.
e. An improved answer might check that the sample is large enough and that it covers weekends, weekdays and different film genres.
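The stratified sampling mentioned in part d can be sketched in Python. This is an illustration under stated assumptions, not the manager's actual method: it assumes a list of customers with a known age group is available, and the names `customers`, `age_group` and `stratified_sample`, the group sizes and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(people, key, fraction, seed=None):
    """Randomly choose the same fraction of people from each group (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[person[key]].append(person)  # sort people into their groups
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one person per group
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical customer list: 40 under 18, 60 aged 18-40, 20 over 40
customers = (
    [{"id": i, "age_group": "under 18"} for i in range(40)]
    + [{"id": i, "age_group": "18-40"} for i in range(40, 100)]
    + [{"id": i, "age_group": "over 40"} for i in range(100, 120)]
)

sample = stratified_sample(customers, "age_group", 0.1, seed=0)
print(len(sample))
```

Because each age group contributes in proportion to its size, no group is left out or over-represented in the sample.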
4. Xavier has a simple puzzle for children. He asks these questions:
a. What data must Xavier collect? What type of data is it?
b. Xavier gives the puzzle to a sample of children. Design a data collection sheet for Xavier.
a. Data required: time taken to solve the puzzle (continuous), age (discrete/categorical by group), and gender (categorical).
b. Example data collection sheet:
| Name | Age group | Gender | Time to solve (seconds) |
|---|---|---|---|
5. Sofia surveys cars using a busy road. She wants to answer these questions:
a. What data does Sofia need to collect?
b. What type of data is this?
c. Design a data collection sheet for Sofia.
a. Sofia needs to collect data on the gender of drivers and the number of people in each car.
b. The gender is categorical data; the number of people in a car is discrete numerical data.
c. Example data collection sheet:
| Car number | Driver gender | Number of people in car |
|---|---|---|
6. Anders is comparing two books, X and Y. He thinks book X is harder to read than book Y.
a. What things make a book hard or easy to read?
b. What statistical questions can you ask to compare how easy it is to read each book?
c. What data can you collect to answer your questions?
d. How would you collect data to answer your questions?
e. Choose a book and use it to test your data collection method. Does it give you the data you need? Can you improve your method?
f. Compare your answers with those of another group. If you have chosen different approaches, which do you prefer?
a. Things that affect readability: vocabulary difficulty, sentence length, font size, layout, and familiarity with the topic.
b. Example statistical questions: “What is the average number of words per sentence?” “How many unfamiliar words are there on a typical page?”
c. Data collected could include: average sentence length, number of difficult words, time taken by students to read a passage.
d. Collect data by sampling passages from both books, giving them to different groups of learners, and recording results systematically.
e. A pilot test may show that some questions are unclear or the sample size is too small. Improvements might include clearer definitions of “difficult words” or testing more passages.
f. Comparing approaches with another group may highlight strengths (e.g., wider vocabulary analysis) or weaknesses (e.g., limited sample). You can then decide which method is more effective.
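Counting words per sentence, one of the measures suggested in parts b and c, can be automated for a typed passage. The following Python sketch is a rough illustration: the function names and the short `passage` are made up here, and sentences are split naively at full stops, exclamation marks and question marks, so abbreviations such as "Dr." would be miscounted.

```python
import re

def sentence_lengths(text):
    """Split text into sentences at ., ! or ? and count the words in each one."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def mean_sentence_length(text):
    """Return the mean number of words per sentence."""
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("the text contains no sentences")
    return sum(lengths) / len(lengths)

# Made-up passage for illustration (not taken from either book)
passage = "The cat sat on the mat. It was warm. Why did it move?"

print(sentence_lengths(passage))      # word count for each sentence
print(mean_sentence_length(passage))  # mean words per sentence
```

Running the same function on sampled passages from book X and book Y would give two means to compare, although sentence length is only one of the factors listed in part a.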