Bias

🎯 In this topic you will

  • Learn about sources of bias in data collection.
  • Learn about ways to choose an unbiased sample.
  • Learn how to identify wrong or misleading information.
 

🧠 Key Words

  • bias
  • misleading
  • bias: A systematic error in the way data is collected or interpreted, which makes the results unrepresentative or unfair.
  • misleading: Information or data that gives the wrong impression or hides the truth, leading to incorrect conclusions.
 

📊 Reliability of Results

The reliability of the results of a statistical investigation depends on the quality of the data collected. Data from a sample that is not representative of the whole population might not give a valid outcome.

⚖️ Biased Samples

A sample that does not represent the whole population is biased. There are different possible sources of bias.

 
📘 Worked example

An investigation is carried out to test the prediction that people in a town are in favour of building a new library.

A survey is carried out on people using a supermarket between 09:00 and 12:00 one Wednesday and Thursday.

a. Explain why this will give a biased sample.

b. Suggest a way to improve the investigation.

Answer:

a. A survey at that time and in that place will include few people who are at work during the day. The sample will be biased if people who work during the day are underrepresented. It could include people who do not use the present library and who will not have an opinion.

b. It would be better to do the survey in different places and at different times. The survey could be carried out at the present library. This will give a variety of people who actually use the library, especially if you speak to people at different times of day and on different days.

This example shows how surveys can become biased if the sample does not represent the entire population. To reduce bias, surveys should be carried out at different times and places, and should include people who are directly affected by the issue.

 
📘 Worked example

A company employs 187 men and 362 women.
You want to choose a representative sample of 40 men and women.

a. How many men and women should you choose?

b. List three other factors to consider when choosing a representative sample.

Answer:

a. The company has $187 + 362 = 549$ employees.
The percentage who are men = $\dfrac{187}{549} \times 100\% = 34.1\%$
34.1% of the sample should be men.
$34.1\%$ of $40 = 0.341 \times 40 = 13.64$, which is $14$ to the nearest whole number.
The sample should have $14$ men and $40 - 14 = 26$ women.

b. Other possible factors are, for example, age, job and salary.

To make a representative sample, calculate the proportion of men and women in the population, then apply this proportion to the sample size. Additional factors such as age, job role, and salary should also be considered to ensure the sample reflects the wider workforce.
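
The proportional calculation above can be sketched in Python. This is a minimal illustration, not part of the original lesson: the function name `proportional_allocation` is hypothetical, and the largest-remainder rounding rule is an assumption chosen so that the whole-number counts always add up to the sample size.

```python
from math import floor


def proportional_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to their sizes.

    Uses largest-remainder rounding (an assumption, not the lesson's method)
    so that the whole-number counts always sum to sample_size.
    """
    total = sum(group_sizes.values())
    # Exact (fractional) share of the sample for each group.
    exact = {g: n * sample_size / total for g, n in group_sizes.items()}
    # Start from the rounded-down counts.
    counts = {g: floor(x) for g, x in exact.items()}
    # Hand out the remaining places to the largest fractional parts.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


print(proportional_allocation({"men": 187, "women": 362}, 40))
# {'men': 14, 'women': 26}
```

The same function gives $17$ girls and $13$ boys for a sample of $30$ from $200$ girls and $150$ boys.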

 

🧠 PROBLEM-SOLVING Strategy

Avoiding Bias in Data Collection

Use these steps to design surveys and collect data that fairly represent the population.

  1. Define your target population: Decide clearly who you want to study (e.g., all students in a college, all customers of a supermarket).
  2. Choose a representative sample: Match the proportions of key groups (e.g., with $200$ girls and $150$ boys, a sample of $30$ should include $\tfrac{200}{350}\times30\approx17$ girls and $30-17=13$ boys).
  3. Use fair timing and location: Collect data at different times and places to avoid overrepresenting certain groups (e.g., not only morning gym users or weekday shoppers).
  4. Avoid leading or biased questions: Reword questions so they do not push people toward one answer (e.g., instead of “Do you agree…?”, ask neutrally “What is your opinion…?”).
  5. Check adverts and surveys: Look for missing information, small sample sizes, or misleading percentages (e.g., “85% of 142 customers” may not represent all customers).
  6. Design clear data sheets: Ensure response options cover all possibilities and do not force unfair choices (e.g., include “other” or “no opinion”).
  7. Interpret results carefully: Ask if the sample size is large enough and whether the method could have excluded important groups.

| Group | Number in population | Proportion | Number in sample of $40$ |
|---|---|---|---|
| Men | $187$ | $34.1\%$ | $14$ |
| Women | $362$ | $65.9\%$ | $26$ |
 

EXERCISES

1. In a college there are $200$ girls and $150$ boys. You want to choose a representative sample of $30$ students. How many girls and boys should you choose?

Answer:
Total students = $200 + 150 = 350$. Number of girls = $\dfrac{200}{350} \times 30 = 17.14 \approx 17$. Number of boys = $\dfrac{150}{350} \times 30 = 12.86 \approx 13$. So the representative sample should include $17$ girls and $13$ boys.

2. Look at this advert.

a. What is the purpose of the advert?

b. List two possible sources of bias.

Answer:

a. The purpose is to persuade customers to buy the shampoo by showing a high satisfaction rate.

b. Possible sources of bias:

  • The sample may be too small or not representative of all users.
  • Only satisfied customers may have responded.
  • The survey might have been conducted or sponsored by the company.

3. You are doing a statistical investigation. You need to find the opinions of a large sample of people.

a. Give two advantages of using social media.

b. Give two disadvantages of using social media.

Answer:

a. Advantages:

  • Easy to reach a very large number of people quickly.
  • Low cost compared to face-to-face or phone surveys.

b. Disadvantages:

  • The sample may not be representative (not everyone uses social media).
  • Responses may be unreliable or influenced by peer pressure.

 


4. This table shows the numbers of students in a college.

| Age | Male | Female | Total |
|---|---|---|---|
| $16$ | $50$ | $75$ | $125$ |
| $17$ | $42$ | $92$ | $134$ |
| Total | $92$ | $167$ | $259$ |

 

 

You want a representative sample of $40$ students.

a. How many students in your sample should be $16$-year-old males?

b. How many students in your sample should be females?

The graph shows the data in the table.

c. Explain why the graph is misleading.

d. Draw an improved version of the graph.

Answer:

a. Total students = $259$. Number of $16$-year-old males = $\tfrac{50}{259} \times 40 \approx 7.7$, so choose $8$ students.

b. Number of females = $\tfrac{167}{259} \times 40 \approx 25.8$, so choose $26$ students.

c. The graph is misleading because the scales of the bars are inconsistent and the categories are not presented clearly for comparison.

d. An improved graph would use the same scale for all categories and separate bars for male and female students, making totals easy to compare.
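
As a rough check on parts a and b, the sketch below applies the same proportion to every cell of the table. It is an illustration only: the variable names are hypothetical, and simple rounding is assumed, which happens to add up to $40$ for these numbers but will not always do so.

```python
# Student counts from the table: (age, sex) -> number of students.
students = {
    (16, "male"): 50,
    (16, "female"): 75,
    (17, "male"): 42,
    (17, "female"): 92,
}
sample_size = 40
total = sum(students.values())  # 259

# Proportional share of the sample for each cell, rounded to a whole number.
sample = {cell: round(n * sample_size / total) for cell, n in students.items()}

# Simple rounding does not always add up to the sample size, so check it.
assert sum(sample.values()) == sample_size

females = sum(n for (age, sex), n in sample.items() if sex == "female")
print(sample[(16, "male")], females)  # 8 26
```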

5. A statistician wants to investigate people’s attitudes towards a plan for a new housing development. The statistician gives out $350$ questionnaires and receives $105$ replies.

a. Work out the percentage of replies.

b. How might the low percentage of replies cause bias?

Answer:

a. $\tfrac{105}{350} \times 100\% = 30\%$ replies.

b. Only $30\%$ of the people asked replied, so their views may not represent the opinions of the whole population. People with strong opinions are more likely to reply, which introduces bias.
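
The percentage in part a can be checked with a short Python sketch. The function name `response_rate` is hypothetical and is used here only for illustration.

```python
def response_rate(replies, sent):
    """Return the percentage of questionnaires that were returned."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100 * replies / sent


print(response_rate(105, 350))  # 30.0
```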

6. A sample of people were given two versions of a drink, the original recipe and a new recipe. They were asked, ‘Do you prefer the new recipe?’ $85\%$ said, ‘yes’.

a. Why might this result be biased?

b. How could you arrange the tasting and questioning to avoid bias?

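Answer:

a. The question ‘Do you prefer the new recipe?’ is leading because it suggests the answer ‘yes’. The result may also be biased if the sample is too small or not representative, or if people knew which drink was the new recipe.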

b. Use a larger, more representative sample and ensure the tasting is blind (participants don’t know which version they are drinking). Ask a neutral question.
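
One possible way to set up a blind tasting is sketched below. It is an assumption made for illustration, not the lesson's prescribed method: the function name `assign_tasting_order`, the cup labels "A" and "B", and the participant names are all hypothetical. Each person gets the two recipes in a random order, and could then be asked a neutral question such as "Which cup do you prefer, A or B?".

```python
import random


def assign_tasting_order(participants, seed=None):
    """Randomly decide which recipe each participant tastes first.

    Cups are labelled only "A" and "B", so tasters cannot tell
    which recipe is the original and which is new.
    """
    rng = random.Random(seed)
    plan = {}
    for person in participants:
        order = ["original", "new"]
        rng.shuffle(order)
        # Cup A is tasted first, cup B second.
        plan[person] = {"A": order[0], "B": order[1]}
    return plan


plan = assign_tasting_order(["P1", "P2", "P3", "P4"], seed=1)
for person, cups in plan.items():
    print(person, cups)
```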

7. Here are questions from surveys that will give biased results. For each question

i. explain why it will give a biased result

ii. rewrite the question in a better way.

a. Do you agree that global warming is caused by humans?

b. Do you think entry to this exhibition should be free?

c. Are you overweight?

d. Do you think you take enough exercise?

Answer:

a. Biased: assumes agreement. Better: “What are your views on the causes of global warming?”

b. Biased: suggests a desirable answer. Better: “What do you think about the entry fee for this exhibition?”

c. Biased: too personal and judgemental. Better: “What is your weight range?”

d. Biased: ‘enough’ is subjective, and people may give the answer they think is expected. Better: “How often do you exercise each week?”

8. Customers who have stayed at a hotel are asked to complete an online survey. The hotel wants to know if the customers felt they received good service and value for money. How could the results from this survey be biased?

Answer:
Only customers who stayed at the hotel and chose to respond are included. This excludes non-responders and those who didn’t stay, so the sample may not reflect the opinions of all potential customers.
 


9. You are planning to do a survey of customers in a supermarket or shopping mall. You will do the survey on a Sunday. You will ask a sample of customers a small number of questions. You want equal numbers of men and women. You want $25\%$ of your sample to be under $30$ and the rest to be aged $30$ or over. You want to ask $120$ people altogether.

Describe how you could carry out this survey. In particular, describe how you will choose your sample and when you will do your survey.

Answer:
To get a sample of $120$ people that meets the quotas:

  • Select $60$ men and $60$ women.
  • Ensure $25\%$ are under $30$: $0.25 \times 120 = 30$ people.
  • Ensure $75\%$ are $30$ or over: $0.75 \times 120 = 90$ people.

Approach people systematically at different times of the day to avoid bias, for example every fifth customer leaving the store. Keep a tally for each group and stop surveying a group once its quota is full. This meets the age and gender quotas while keeping some randomness in the selection.
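
The quotas and the "every fifth customer" rule can be sketched in Python. This is only an illustration under stated assumptions: the helper `every_kth` and the numbering of customers from $1$ to $30$ are hypothetical.

```python
def every_kth(items, k):
    """Return every k-th item (the k-th, 2k-th, ...) from a sequence."""
    return list(items)[k - 1 :: k]


# Quotas from the question: 120 people, equal men and women, 25% under 30.
total = 120
quotas = {
    "men": total // 2,
    "women": total // 2,
    "under 30": total * 25 // 100,
    "30 or over": total - total * 25 // 100,
}
print(quotas)
# {'men': 60, 'women': 60, 'under 30': 30, '30 or over': 90}

# Customers numbered in the order they leave the store.
print(every_kth(range(1, 31), 5))  # [5, 10, 15, 20, 25, 30]
```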

10. Marcus wants to know if more men or women use a gym on Monday evening and on Friday evening. He looks at the first $30$ visitors on a Monday evening and on a Friday evening. He records the results in a table and draws a diagram to illustrate the data as shown.

| Day | Men | Women | Total |
|---|---|---|---|
| Monday | $18$ | $12$ | $30$ |
| Friday | $13$ | $17$ | $30$ |
Marcus says: More men than women use the gym on a Monday evening. More women than men use the gym on a Friday evening.

a. Are Marcus’ conclusions valid? Give a reason for your answer.

b. Explain why Marcus’ diagram is misleading.

c. Draw an improved version of the diagram.

Answer:

a. Marcus’ conclusions are valid for his sample: Monday shows more men ($18$ vs $12$ women), Friday shows more women ($17$ vs $13$ men). But he only counted the first $30$ visitors on one Monday and one Friday, so the sample is small and may not represent all gym users.

b. The diagram is misleading because the scales are inconsistent and the bar positions exaggerate differences. Bars should start at zero with equal widths and spacing.

c. An improved diagram would use a grouped bar chart with consistent scales, starting from zero, showing men and women side by side for each day.
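
A grouped bar chart of this kind can be sketched as plain text in Python. This is a rough illustration rather than a finished diagram: the function name `grouped_bar_lines` is hypothetical, and one `#` is assumed to stand for one visitor so that every bar starts at zero and uses the same scale.

```python
# Marcus' counts from the table: day -> {group: number of visitors}.
visitors = {
    "Monday": {"Men": 18, "Women": 12},
    "Friday": {"Men": 13, "Women": 17},
}


def grouped_bar_lines(data, symbol="#"):
    """Return text bars, one per group per day, all starting at zero.

    Each symbol stands for one visitor, so every bar uses the same scale.
    """
    lines = []
    for day, groups in data.items():
        for group, count in groups.items():
            lines.append(f"{day:<7} {group:<6} {symbol * count} {count}")
    return lines


print("\n".join(grouped_bar_lines(visitors)))
```

Because bar length is proportional to the count, the differences between men and women appear at their true size.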

 

📘 What we've learned

  • A biased sample does not fairly represent the whole population and can lead to unreliable conclusions.
  • Bias can come from how the sample is chosen (e.g., only at certain times, places, or groups of people).
  • Surveys and adverts can also be misleading if questions are worded unfairly or results are presented with distorted graphs or percentages.
  • To reduce bias, use a representative sample, gather data at different times and places, and ask neutral, clear questions.
  • Always check whether data or a graph might be misleading before drawing conclusions.
