The reliability of the results of a statistical investigation depends on the quality of the data collected. Data from a sample that is not representative of the whole population might not give a valid outcome.
A sample that does not represent the whole population is biased. There are different possible sources of bias.
1. In a college there are $200$ girls and $150$ boys. You want to choose a representative sample of $30$ students. How many girls and boys should you choose?
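A proportional allocation like the one in question 1 can be checked with a short Python sketch. The function name `allocate` and the largest-remainder rounding rule are assumptions made for illustration, not part of the exercise; the rule keeps the rounded group sizes adding up to the sample size.

```python
def allocate(groups, sample_size):
    """Share sample_size among groups in proportion to their sizes.

    Largest-remainder rounding (an assumed rule) is used so that the
    parts always sum exactly to sample_size.
    """
    total = sum(groups.values())
    # Whole-number part of each group's exact share.
    result = {name: size * sample_size // total for name, size in groups.items()}
    # What is left over after taking the whole-number part.
    remainders = {name: size * sample_size % total for name, size in groups.items()}
    leftover = sample_size - sum(result.values())
    # Give the remaining places to the groups with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        result[name] += 1
    return result


college = {"girls": 200, "boys": 150}
print(allocate(college, 30))  # {'girls': 17, 'boys': 13}
```

The exact shares are about $17.1$ girls and $12.9$ boys, so rounding to $17$ and $13$ keeps the sample at $30$.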
2. Look at this advert.
a. What is the purpose of the advert?
b. List two possible sources of bias.
a. The purpose is to persuade customers to buy the shampoo by showing a high satisfaction rate.
b. Possible sources of bias: – The sample may be too small or not representative of all users. – Only satisfied customers may have responded. – The survey may have been conducted or sponsored by the company.
3. You are doing a statistical investigation. You need to find the opinions of a large sample of people.
a. Give two advantages of using social media.
b. Give two disadvantages of using social media.
a. Advantages: – Easy to reach a very large number of people quickly. – Low cost compared to face-to-face or phone surveys.
b. Disadvantages: – The sample may not be representative (not everyone uses social media). – Responses may be unreliable or influenced by peer pressure.
4. This table shows the numbers of students in a college.
| Age | Male | Female | Total |
|---|---|---|---|
| $16$ | $50$ | $75$ | $125$ |
| $17$ | $42$ | $92$ | $134$ |
| Total | $92$ | $167$ | $259$ |

You want a representative sample of $40$ students.
a. How many students in your sample should be $16$-year-old males?
b. How many students in your sample should be females?
The graph shows the data in the table.
c. Explain why the graph is misleading.
d. Draw an improved version of the graph.
a. Total students = $259$. Number of $16$-year-old males in the sample = $\tfrac{50}{259} \times 40 \approx 7.7$, so choose $8$ students.
b. Number of females in the sample = $\tfrac{167}{259} \times 40 \approx 25.8$, so choose $26$ students.
c. The graph is misleading because the scales of the bars are inconsistent and the categories are not presented clearly for comparison.
d. An improved graph would use the same scale for all categories and separate bars for male and female students, making totals easy to compare.
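The sample sizes in parts a and b can be checked for every group in the table at once. This is a minimal sketch; the dictionary layout and variable names are assumptions for illustration, and each share is simply rounded to the nearest whole student.

```python
# Counts from the table: (age, sex) -> number of students.
counts = {
    (16, "male"): 50,
    (16, "female"): 75,
    (17, "male"): 42,
    (17, "female"): 92,
}
sample_size = 40
population = sum(counts.values())  # 259 students in total

# Each group's share of the sample, rounded to a whole number of students.
sample = {group: round(n * sample_size / population) for group, n in counts.items()}

# Add up the two female groups.
females = sum(k for (age, sex), k in sample.items() if sex == "female")

print(sample[(16, "male")])   # 8
print(females)                # 26
print(sum(sample.values()))   # 40
```

Here the rounded group sizes happen to add up to exactly $40$. With other data, simple rounding can leave the total one too high or too low, so the check in the last line is worth keeping.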
5. A statistician wants to investigate people’s attitudes towards a plan for a new housing development. The statistician gives out $350$ questionnaires and receives $105$ replies.
a. Work out the percentage of replies.
b. How might the low percentage of replies cause bias?
a. $\tfrac{105}{350} \times 100\% = 30\%$ replies.
b. The $30\%$ who replied may not represent the opinions of the whole population. People with strong opinions are more likely to reply, so their views may be over-represented.
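The response rate in part a, together with the number of people whose views remain unknown, can be worked out in a few lines. The variable names are illustrative assumptions.

```python
sent = 350       # questionnaires given out
returned = 105   # replies received

response_rate = returned / sent * 100   # percentage who replied
non_respondents = sent - returned       # people whose views are unknown

print(f"{response_rate:.0f}% replied; {non_respondents} did not")
```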
6. A sample of people were given two versions of a drink, the original recipe and a new recipe. They were asked, ‘Do you prefer the new recipe?’ $85\%$ said, ‘yes’.
a. Why might this result be biased?
b. How could you arrange the tasting and questioning to avoid bias?
a. The question is leading: “Do you prefer the new recipe?” suggests the answer “yes”. The result may also be biased if the sample is too small or not representative.
b. Use a larger, more representative sample and ensure the tasting is blind (participants don’t know which version they are drinking). Ask a neutral question.
7. Here are questions from surveys that will give biased results. For each question:
i. explain why it will give a biased result
ii. rewrite the question in a better way.
a. Do you agree that global warming is caused by humans?
b. Do you think entry to this exhibition should be free?
c. Are you overweight?
d. Do you think you take enough exercise?
a. Biased: the wording “Do you agree that …” is leading and invites the answer “yes”. Better: “What are your views on the causes of global warming?”
b. Biased: suggests a desirable answer. Better: “What do you think about the entry fee for this exhibition?”
c. Biased: too personal and judgemental. Better: “What is your weight range?”
d. Biased: “enough” is subjective, and people may give the answer they think is expected. Better: “How often do you exercise each week?”
8. Customers who have stayed at a hotel are asked to complete an online survey. The hotel wants to know if the customers felt they received good service and value for money. How could the results from this survey be biased?
9. You are planning to do a survey of customers in a supermarket or shopping mall. You will do the survey on a Sunday. You will ask a sample of customers a small number of questions. You want equal numbers of men and women. You want $25\%$ of your sample to be under $30$ and the rest to be aged $30$ or over. You want to ask $120$ people altogether.
Describe how you could carry out this survey. In particular, describe how you will choose your sample and when you will do your survey.
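One possible way to plan the sample in question 9 is to set a quota for each combination of sex and age group. The sketch below assumes the $25\%$ / $75\%$ age split applies equally to men and women, which the question does not state; the names `quotas` and `try_add` are hypothetical.

```python
total = 120
sex_share = {"men": 0.5, "women": 0.5}
age_share = {"under 30": 0.25, "30 or over": 0.75}

# Quota for each (sex, age group) cell, assuming the age split is the
# same for men and women.
quotas = {
    (sex, age): round(total * s * a)
    for sex, s in sex_share.items()
    for age, a in age_share.items()
}


def try_add(tally, sex, age_group):
    """Count a respondent only if the quota for their cell is not yet full."""
    key = (sex, age_group)
    if tally.get(key, 0) < quotas[key]:
        tally[key] = tally.get(key, 0) + 1
        return True
    return False


print(quotas)
```

Under this assumption the quotas are $15$ men and $15$ women under $30$, and $45$ men and $45$ women aged $30$ or over. Quotas fix the make-up of the sample but not when or where people are asked, so the timing of the survey still needs to be planned separately.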
10. Marcus wants to know if more men or women use a gym on Monday evening and on Friday evening. He looks at the first $30$ visitors on a Monday evening and on a Friday evening. He records the results in a table and draws a diagram to illustrate the data as shown.
| Day | Men | Women | Total |
|---|---|---|---|
| Monday | $18$ | $12$ | $30$ |
| Friday | $13$ | $17$ | $30$ |

Marcus says: More men than women use the gym on a Monday evening. More women than men use the gym on a Friday evening.
a. Are Marcus’ conclusions valid? Give a reason for your answer.
b. Explain why Marcus’ diagram is misleading.
c. Draw an improved version of the diagram.
a. Marcus’ conclusions are valid for his sample: Monday shows more men ($18$ vs $12$ women), Friday shows more women ($17$ vs $13$ men). However, the first $30$ visitors on one evening are a small sample and may not represent all gym users, so the conclusions may not hold in general.
b. The diagram is misleading because the scales are inconsistent and the bar positions exaggerate differences. Bars should start at zero with equal widths and spacing.
c. An improved diagram would use a grouped bar chart with consistent scales, starting from zero, showing men and women side by side for each day.
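As a rough illustration of part c, the sketch below prints a plain-text grouped bar chart from Marcus’ table. It is not a replacement for a properly drawn chart; the function name `bar_chart` and the layout are assumptions. Every bar starts at zero and uses one `#` per visitor, so the bar lengths are directly comparable.

```python
# Marcus' data: day -> group -> number of visitors.
data = {
    "Monday": {"Men": 18, "Women": 12},
    "Friday": {"Men": 13, "Women": 17},
}


def bar_chart(data):
    """Return a text chart with men and women side by side for each day."""
    lines = []
    for day, groups in data.items():
        lines.append(day)
        for label, count in groups.items():
            # One '#' per visitor, so every bar starts at zero
            # and all bars share the same scale.
            lines.append(f"  {label:<6}|{'#' * count} {count}")
    return "\n".join(lines)


print(bar_chart(data))
```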