
Data collection


🎯 In this topic you will

  • Select a method to collect data to answer a number of linked questions.
  • Consider the different types of data.
  • Consider different sampling methods.
 

📊 Collecting Data in Statistics

To answer questions in statistics, you must collect data.

🔢 Types of Data

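First, decide which data you need to collect and what type it is: discrete (counted values, such as a number of brothers and sisters), continuous (measured values, such as a journey time) or categorical (non-numerical groups, such as a favourite type of film).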
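As a rough illustration of the three types, the sketch below guesses the type of a recorded value from how it is stored in Python. The function name `data_type` and the sample record are made up for this example, and the rule is only a rule of thumb: a whole number could still be a rounded measurement.

```python
def data_type(value):
    """Guess the statistical type of one recorded value from its Python type."""
    if isinstance(value, (str, bool)):
        return "categorical"   # a label or group, such as "comedy"
    if isinstance(value, int):
        return "discrete"      # a count, such as 3 siblings
    if isinstance(value, float):
        return "continuous"    # a measurement, such as 12.5 minutes
    raise TypeError(f"unsupported value: {value!r}")

# Hypothetical answers from one learner.
record = {"siblings": 3, "journey_minutes": 12.5, "favourite_film_type": "comedy"}
for question, answer in record.items():
    print(question, "->", data_type(answer))
```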

📝 Collecting Responses

Next, you must decide how to collect the data. If you need to question people, you could use a questionnaire that they fill in by themselves. Alternatively, you could interview them and write down the answers.

👀 Making Observations

Sometimes, you need to make observations. For example, you might record times or count vehicles. In this case, you need a sheet to record your observations.
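A recording sheet for counts is essentially a tally chart. The sketch below shows one possible way to build one in Python; the list of observed vehicles is invented for illustration.

```python
from collections import Counter

# Hypothetical observations: one entry per vehicle seen passing a point.
observations = ["car", "bus", "car", "bicycle", "car", "lorry", "bus", "car"]

def tally(items):
    """Return a dict mapping each category to the number of times it was seen."""
    return dict(Counter(items))

counts = tally(observations)
for vehicle, count in sorted(counts.items()):
    # One row of the recording sheet: a tally mark for every observation.
    print(f"{vehicle:<8} {'|' * count} ({count})")
```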

📉 Using Samples

You may not be able to interview or give a questionnaire to everyone. In this case, you need to take a sample. You must think carefully about the best way to choose your sample.
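One common way to choose a sample is a simple random sample, in which every person is equally likely to be picked. The following is a minimal sketch, assuming a made-up register of 30 learners; the function name `simple_random_sample` is chosen for this example only.

```python
import random

def simple_random_sample(population, size, seed=None):
    """Pick `size` different members, each member being equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable; omit it for a fresh draw
    return rng.sample(population, size)

# Hypothetical register of 30 learners, numbered 1 to 30.
register = [f"Learner {n}" for n in range(1, 31)]
sample = simple_random_sample(register, 5, seed=1)
print(sample)
```

Because the computer makes the choice, the person running the survey cannot favour friends or people who are easy to reach.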

✅ Choosing the Best Method

Whenever you collect data, you need to choose a method and explain why you think that method is the best one to use.

 
📘 Worked example

The head teacher wants answers to the following questions:

  • Are learners happy with the length of lessons? Should they be longer or shorter?
  • Are learners happy with the length of their lunch break?

Your task is to investigate these questions using a sample of the learners in the school.

a. What data will you collect?
b. How will you choose your sample?
c. How will you collect the data?
Give reasons for your answers.

Answer:

a. You will need to collect data for: opinions about length of lessons; whether lessons should be longer, shorter or the same; opinions about length of lunch break. Because opinions may differ by gender and age, you should also collect data on learners’ age and gender.

b. If there are equal numbers of boys and girls in the school, you could choose 2 girls and 2 boys from each tutor group. Use the register to select names at random: for example, if random numbers give 3 and 6 for one tutor group, choose the third and sixth girl and the third and sixth boy on the register. Choosing at random stops your own preferences affecting who is picked, so the sample is more likely to be representative.

c. Individual interviews would be best because you can make sure that every learner in the sample gives an answer. Alternatively, you could run group discussions so that learners can share ideas. If time is limited, you could use a questionnaire instead.

This worked example shows how to plan a survey carefully: decide what data to collect, choose a sampling method that is fair, and use a practical method to collect the data. Random sampling increases representativeness, while interviews or questionnaires provide the actual responses.
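The sampling plan in part b of the worked example (2 girls and 2 boys chosen at random from each tutor group) could be carried out as in the sketch below. The tutor group names, the learner labels and the function name `sample_tutor_groups` are all hypothetical.

```python
import random

def sample_tutor_groups(registers, per_gender=2, seed=None):
    """From each tutor group, choose `per_gender` girls and `per_gender` boys at random."""
    rng = random.Random(seed)
    chosen = {}
    for group, register in registers.items():
        # Sample girls and boys separately so both are equally represented.
        chosen[group] = {
            gender: rng.sample(names, per_gender)
            for gender, names in register.items()
        }
    return chosen

# Hypothetical registers: two tutor groups, each with 12 girls and 12 boys.
registers = {
    group: {
        "girls": [f"{group} girl {n}" for n in range(1, 13)],
        "boys": [f"{group} boy {n}" for n in range(1, 13)],
    }
    for group in ("7A", "7B")
}

chosen = sample_tutor_groups(registers, per_gender=2, seed=3)
for group, picks in chosen.items():
    print(group, picks)
```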

 

EXERCISES

1. How would you collect data to answer these questions? State the type of data each time.

a. When a drawing pin is dropped, is it more likely to land point up or point down?

b. How many people visit a particular shop before 09:00?

c. How many brothers and sisters do the members of your class have?

d. How long do learners spend doing homework each night?

e. What is the average number of words in a sentence in a book?

f. How long do learners take to get to school each day?

Answer:

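a. Do an experiment: drop the pin many times and tally how it lands. Categorical data (two outcomes: point up or point down).

b. Observe the shop and count the people who enter before 09:00, ideally on several days. Discrete numerical data (number of people).

c. Ask each member of the class, for example with a short questionnaire. Discrete numerical data (count of siblings).

d. Give a questionnaire to a sample of learners, or ask them to record their times. Continuous data (time measured in minutes or hours).

e. Choose a sample of sentences from the book and count the words in each one. Discrete numerical data (each word count is a whole number, although the average may not be).

f. Ask a sample of learners to time their journeys. Continuous data (journey time).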

2. Ahmad uses a gym. He asks these questions.

  • Do people visit this gym every week?
  • Do people come at a particular time of day?
  • Is there a difference between the habits of men and women?

He interviews a sample of people at the gym.

a. What data does he need to collect?

b. What type of data is it?

c. Here is his data collection sheet.
What is wrong with this data collection sheet?

Name | How often do you visit the gym? | Do you prefer to visit in the morning or the evening?
     |                                 |
     |                                 |

d. Design a better data collection sheet.

e. He asks the first people who come into the gym each morning for a week. Why is this not a good way to choose his sample? Describe a better way.

Answer:

a. Data on frequency of gym visits, preferred time of day, and gender of gym users.

b. Mixture of categorical (gender, preference) and discrete numerical (frequency) data.

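c. The sheet is too vague: “How often do you visit the gym?” has no clear categories (for example, daily, weekly or monthly), the question about time of day offers only morning or evening, and there is no column for gender.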

d. A better sheet would include: Name (optional), Gender, How many times per week do you visit? (0–1, 2–3, 4+), Preferred time (morning/afternoon/evening).

e. Asking the first people each morning is a biased sample (morning visitors only). A better way is to take a random sample throughout the day and across several days to represent all gym users.

 


3. A cinema manager asks these questions.

  • How often do people visit the cinema?
  • Do younger people visit more often than older people?
  • What type of film do people like?

a. What data is required?

b. Describe two ways to collect the data.

c. The manager decides to give a questionnaire to a sample of customers. She gives it to all the customers on one night. Why is this not a good way to choose the sample?

d. Describe how the manager can get a representative sample.

e. Compare your answer to part d with another group’s answer. Can you improve your answer? Can you improve theirs?

Answer:

a. Data on frequency of visits, age groups, and film preferences.

b. Two ways: (i) Questionnaire distributed over several days; (ii) Online survey targeting different age groups.

c. One night’s audience may not represent all cinema-goers (e.g., only certain films or age groups that evening).

d. Use stratified random sampling by age group and gender, ensuring representation across film types and times.

e. Improved answers would include checking that the sample size is large enough and covers weekends, weekdays, and different film genres.

4. Xavier has a simple puzzle for children. He asks these questions:

  • How long does it take to solve the puzzle?
  • Can girls solve it more quickly than boys?
  • Can older children solve it more quickly than younger ones?

a. What data must Xavier collect? What type of data is it?

b. Xavier gives the puzzle to a sample of children. Design a data collection sheet for Xavier.

Answer:

a. Data required: time taken to solve the puzzle (continuous), age (discrete/categorical by group), and gender (categorical).

b. Example data sheet:

Name | Age group | Gender | Time to solve (seconds)
     |           |        |
     |           |        |
 


5. Sofia surveys cars using a busy road.

She wants to answer these questions:

  • What percentage of drivers are male?
  • What percentage of cars carry only one person?
  • What is the average number of people in a car?

a. What data does Sofia need to collect?

b. What type of data is this?

c. Design a data collection sheet for Sofia.

Answer:

a. Sofia needs to collect data on the gender of drivers and the number of people in each car.

b. The gender is categorical data; the number of people in a car is discrete numerical data.

c. Example data collection sheet:

Car number | Driver gender | Number of people in car
           |               |
           |               |
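Once a sheet like this is filled in, Sofia's three questions reduce to simple arithmetic. The sketch below shows one way to do the calculations; the eight rows of data are invented, and the function name `summarise` is an assumption for this example.

```python
# Hypothetical rows from the sheet: (driver gender, number of people in the car).
cars = [("M", 1), ("F", 2), ("M", 1), ("F", 1), ("M", 4), ("F", 3), ("M", 1), ("M", 3)]

def summarise(rows):
    """Answer the three survey questions from the recorded rows."""
    total = len(rows)
    male = sum(1 for gender, _ in rows if gender == "M")
    single = sum(1 for _, people in rows if people == 1)
    people_total = sum(people for _, people in rows)
    return {
        "percent_male_drivers": 100 * male / total,
        "percent_single_occupancy": 100 * single / total,
        "mean_people_per_car": people_total / total,
    }

summary = summarise(cars)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these made-up rows, 5 of the 8 drivers are male (62.5%), 4 of the 8 cars carry one person (50%), and the 16 people recorded give a mean of 2 people per car.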

6. Anders is comparing two books, X and Y. He thinks book X is harder to read than book Y.

a. What things make a book hard or easy to read?

b. What statistical questions can you ask to compare how easy it is to read each book?

c. What data can you collect to answer your questions?

d. How would you collect data to answer your questions?

e. Choose a book and use it to test your data collection method. Does it give you the data you need? Can you improve your method?

f. Compare your answers with those of another group. If you have chosen different approaches, which do you prefer?

Answer:

a. Things that affect readability: vocabulary difficulty, sentence length, font size, layout, and familiarity with the topic.

b. Example statistical questions: “What is the average number of words per sentence?” “How many unfamiliar words are in a typical page?”

c. Data collected could include: average sentence length, number of difficult words, time taken by students to read a passage.

d. Collect data by sampling passages from both books, giving them to different groups of learners, and recording results systematically.

e. A pilot test may show that some questions are unclear or the sample size is too small. Improvements might include clearer definitions of “difficult words” or testing more passages.

f. Comparing approaches with another group may highlight strengths (e.g., wider vocabulary analysis) or weaknesses (e.g., limited sample). You can then decide which method is more effective.
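One of the measures suggested above, the average number of words per sentence, can be counted by hand or with a short program. The following is a minimal sketch, assuming that sentences end at a full stop, exclamation mark or question mark; the two passages are invented stand-ins for samples from books X and Y, and the function names are made up for this example.

```python
import re

def words_per_sentence(text):
    """Return the number of words in each sentence of `text`.

    Sentences are split at '.', '!' or '?'. This is a simplification:
    abbreviations such as 'e.g.' would be split incorrectly.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Hypothetical passages standing in for samples from books X and Y.
passage_x = "The committee deliberated at length before reaching a verdict. Nobody objected."
passage_y = "The dog ran. It was fast. We laughed."

for label, passage in (("X", passage_x), ("Y", passage_y)):
    counts = words_per_sentence(passage)
    print(label, counts, round(mean(counts), 2))
```

A real comparison would need several passages chosen at random from each book, since one short passage may not be typical.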

 

📘 What we've learned

  • Data can be classified as discrete, continuous, or categorical, depending on its nature.
  • Different methods of data collection include questionnaires, interviews, and observation sheets.
  • When it is not possible to collect data from an entire population, we take a sample instead.
  • A sample should be carefully chosen to be representative and avoid bias.
  • Whenever data is collected, the method must be explained and justified as the best approach for the investigation.
