GamaTrain

Interquartile range: The difference between the upper and lower quartiles
Anna Kowalski
2025-12-07

Interquartile Range: The Measure of Middle-Spread

Understanding how data is spread around the median, and why the IQR is more resistant to extreme values than the range.
The Interquartile Range (IQR) is a fundamental measure in descriptive statistics that describes the spread (dispersion) of the middle 50% of a dataset. Unlike the range, which is sensitive to extreme values (outliers), the IQR gives a robust view of where the bulk of the values lie. It is calculated as the difference between the upper quartile (Q3, the 75th percentile) and the lower quartile (Q1, the 25th percentile). Related key concepts include quartiles, the five-number summary, box plots, and outlier detection. Mastering the IQR gives students a powerful tool for analyzing and comparing real-world data distributions, from test scores to scientific measurements.

Understanding Quartiles: The Foundation of IQR

To grasp the Interquartile Range, we must first understand quartiles. Imagine you have sorted a list of numbers from smallest to largest. Quartiles are three values that divide this sorted list into four parts, each containing about 25% of the data.

  • First Quartile (Q1): The median of the lower half of the data (not including the overall median if the number of data points is odd). About 25% of the data values are less than or equal to Q1. It is also called the lower quartile.
  • Second Quartile (Q2): The median of the entire dataset. About 50% of the data values are less than or equal to Q2.
  • Third Quartile (Q3): The median of the upper half of the data (again not including the overall median if the number of data points is odd). About 75% of the data values are less than or equal to Q3. It is also called the upper quartile.

The Interquartile Range (IQR) is simply the range[1] between Q1 and Q3. The formula is:

IQR Formula: $ IQR = Q_3 - Q_1 $

For example, consider the sorted dataset of 9 student test scores: 62, 67, 71, 74, 77, 82, 85, 89, 93.

  • The median (Q2) is the 5th number: 77.
  • Lower half (excluding median): 62, 67, 71, 74. Its median (Q1): (67+71)/2 = 69.
  • Upper half (excluding median): 82, 85, 89, 93. Its median (Q3): (85+89)/2 = 87.
  • IQR = Q3 - Q1 = 87 - 69 = 18.

This IQR of 18 points tells us that the middle 50% of students scored between 69 and 87, a span of 18 points.
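The worked example can be checked with a few lines of Python. This is a minimal sketch that assumes the median-of-halves convention described above and uses the standard-library `statistics.median`; the helper name `quartiles` is our own, not a library function, and it needs at least two values (the median of an empty half raises an error).

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    For an odd number of values, the overall median is left out of
    both halves, as in the worked example.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values before the median position
    upper = data[(n + 1) // 2 :]    # values after it (skips the middle value when n is odd)
    return median(lower), median(data), median(upper)


scores = [62, 67, 71, 74, 77, 82, 85, 89, 93]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3, q3 - q1)  # 69.0 77 87.0 18.0
```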

Calculating IQR: A Step-by-Step Guide for Different Datasets

Let's break down the calculation process for datasets with an odd and an even number of values.

Step-by-Step Guide:
1. Order the data from least to greatest.
2. Find the Median (Q2) of the whole dataset.
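3. Find Q1: The median of the lower half, i.e., the values that come before the median's position in the ordered list (for an odd count, leave the median itself out).
4. Find Q3: The median of the upper half, i.e., the values that come after the median's position.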
5. Calculate IQR: Subtract Q1 from Q3.
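The five steps translate almost line for line into code. The sketch below writes the median out by hand so that each step stays visible; `median_of` and `iqr_by_steps` are illustrative names, not a library API, and the function assumes a list of at least two numbers.

```python
def median_of(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]  # odd count: the single middle value
    # even count: average of the two middle values
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def iqr_by_steps(values):
    """Return (Q1, Q2, Q3, IQR) following the five steps."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)                 # Step 1: order the data
    n = len(data)
    q2 = median_of(data)                  # Step 2: median of the whole dataset
    q1 = median_of(data[: n // 2])        # Step 3: median of the values before the median position
    q3 = median_of(data[(n + 1) // 2 :])  # Step 4: median of the values after it
    return q1, q2, q3, q3 - q1            # Step 5: IQR = Q3 - Q1


print(iqr_by_steps([3, 7, 8, 12, 13, 14, 18, 21, 23, 26, 30]))  # (8, 14, 23, 15)
print(iqr_by_steps([5, 9, 10, 11, 13, 15, 17, 21]))              # (9.5, 12.0, 16.0, 6.5)
```

Both calls use the data from Example 1 and Example 2, and the results match the hand calculations.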

Example 1: Odd Number of Data Points (n=11)
Data: 3, 7, 8, 12, 13, 14, 18, 21, 23, 26, 30

  • Median (Q2): The 6th value = 14.
  • Lower half (values below 14): 3, 7, 8, 12, 13. Q1 = median of this half = 8.
  • Upper half (values above 14): 18, 21, 23, 26, 30. Q3 = median of this half = 23.
  • IQR = $ 23 - 8 = 15 $.

Example 2: Even Number of Data Points (n=8)
Data: 5, 9, 10, 11, 13, 15, 17, 21

  • Median (Q2): Average of the 4th and 5th values = $ (11 + 13) / 2 = 12 $.
  • Lower half (first 4 numbers): 5, 9, 10, 11. Q1 = median of this half = $ (9 + 10) / 2 = 9.5 $.
  • Upper half (last 4 numbers): 13, 15, 17, 21. Q3 = median of this half = $ (15 + 17) / 2 = 16 $.
  • IQR = $ 16 - 9.5 = 6.5 $.
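Quartile conventions are not universal, and calculators and software often interpolate between data points instead of taking medians of halves. As one illustration, Python's `statistics.quantiles` (Python 3.8+) offers `method="exclusive"` (the default) and `method="inclusive"`. The sketch below simply prints both for the two examples; it is a comparison, not the method used in this article.

```python
from statistics import quantiles

example1 = [3, 7, 8, 12, 13, 14, 18, 21, 23, 26, 30]
example2 = [5, 9, 10, 11, 13, 15, 17, 21]

for name, data in [("Example 1", example1), ("Example 2", example2)]:
    for method in ("exclusive", "inclusive"):
        # n=4 asks for the three cut points Q1, Q2, Q3
        q1, q2, q3 = quantiles(data, n=4, method=method)
        print(f"{name} ({method}): Q1={q1}, Q3={q3}, IQR={q3 - q1}")
```

For Example 1 the exclusive method agrees with the hand calculation (Q1 = 8, Q3 = 23), but for Example 2 it gives Q1 = 9.25 and Q3 = 16.5 (IQR 7.25), and the inclusive method gives 9.75 and 15.5 (IQR 5.75), versus 6.5 by hand. It is worth checking which convention a tool uses before comparing answers.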

The Five-Number Summary and Box Plots

The IQR is a key part of the Five-Number Summary, which gives a quick overview of a dataset's distribution. The five numbers are:

  1. Minimum (Min)
  2. First Quartile (Q1)
  3. Median (Q2)
  4. Third Quartile (Q3)
  5. Maximum (Max)

This summary is visually represented by a box plot (or box-and-whisker plot). The box plot is a fantastic tool for comparing distributions across different groups.

| Statistic | Value from Example 1 | Description |
| --- | --- | --- |
| Minimum | 3 | The smallest data point. |
| Q1 | 8 | 25th percentile, start of the IQR box. |
| Median (Q2) | 14 | The middle value, line inside the box. |
| Q3 | 23 | 75th percentile, end of the IQR box. |
| Maximum | 30 | The largest data point. |
| IQR | 15 | Q3 - Q1; the length of the box in the plot. |

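In the box plot, the "box" stretches from Q1 to Q3, with a line inside marking the median. The "whiskers" typically extend to the minimum and maximum values. When the IQR is used to flag potential outliers[2], the whiskers instead stop at the most extreme values that are not flagged, and the flagged values are drawn as individual points.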
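The five-number summary behind a box plot is easy to compute once quartiles are available. This is a minimal sketch assuming the same median-of-halves convention; the function name `five_number_summary` and its dictionary keys are our own choices, not a standard API.

```python
from statistics import median


def five_number_summary(values):
    """Return Min, Q1, Median, Q3, Max (plus the IQR) as a dict."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])          # lower half, median excluded when n is odd
    q3 = median(data[(n + 1) // 2 :])    # upper half, median excluded when n is odd
    return {
        "min": data[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
    }


print(five_number_summary([3, 7, 8, 12, 13, 14, 18, 21, 23, 26, 30]))
# {'min': 3, 'q1': 8, 'median': 14, 'q3': 23, 'max': 30, 'iqr': 15}
```

The output reproduces the Example 1 column of the table.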

A Practical Application: Detecting Outliers with the IQR Rule

One of the most important uses of the IQR is to flag potential outliers in data. Outliers are values that are unusually far from the main cluster of data. They can be caused by measurement errors, data entry mistakes, or genuine rare events. The IQR provides a simple, objective rule to identify them.

Outlier Detection Rule (the 1.5 × IQR rule):
• A data point is considered a mild outlier if it is below $ Q_1 - 1.5 \times IQR $ or above $ Q_3 + 1.5 \times IQR $, but still within $ Q_1 - 3 \times IQR $ and $ Q_3 + 3 \times IQR $.
• A data point is considered an extreme outlier if it is below $ Q_1 - 3 \times IQR $ or above $ Q_3 + 3 \times IQR $.
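The two sets of limits translate directly into a small classifier. In this sketch, `classify` is a hypothetical helper that takes Q1 and Q3 as inputs (computed with whatever convention you use) and labels a value as "extreme", "mild", or "not an outlier", checking the 3 × IQR limits first so that an extreme outlier is not also reported as mild.

```python
def classify(x, q1, q3):
    """Label x using the 1.5 x IQR (inner) and 3 x IQR (outer) limits."""
    spread = q3 - q1
    inner_low, inner_high = q1 - 1.5 * spread, q3 + 1.5 * spread
    outer_low, outer_high = q1 - 3 * spread, q3 + 3 * spread
    if x < outer_low or x > outer_high:
        return "extreme"
    if x < inner_low or x > inner_high:
        return "mild"
    return "not an outlier"


# Limits for Example 1 (Q1 = 8, Q3 = 23, IQR = 15):
# inner limits -14.5 and 45.5, outer limits -37 and 68.
for value in (30, 50, 80):  # 50 and 80 are hypothetical test values
    print(value, classify(value, 8, 23))
```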

Real-World Scenario: Science Lab Measurements
A class measures the boiling point of a liquid (in °C) ten times: 101.2, 101.3, 101.5, 101.5, 101.6, 101.7, 101.8, 101.9, 102.0, 104.5. The value 104.5 seems high. Is it an outlier?

  1. Ordered Data: 101.2, 101.3, 101.5, 101.5, 101.6, 101.7, 101.8, 101.9, 102.0, 104.5
  2. Q2 (Median) = $ (101.6 + 101.7) / 2 = 101.65 $
  3. Q1 = median of first half (101.2, 101.3, 101.5, 101.5, 101.6) = 101.5
  4. Q3 = median of second half (101.7, 101.8, 101.9, 102.0, 104.5) = 101.9
  5. IQR = $ 101.9 - 101.5 = 0.4 $
  6. Lower Bound = $ Q_1 - 1.5 \times IQR = 101.5 - (1.5 \times 0.4) = 101.5 - 0.6 = 100.9 $
  7. Upper Bound = $ Q_3 + 1.5 \times IQR = 101.9 + (1.5 \times 0.4) = 101.9 + 0.6 = 102.5 $

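Any value below 100.9 or above 102.5 is a potential outlier. Our measurement 104.5 is far above 102.5, so the rule flags it as an outlier; it even exceeds the extreme-outlier limit $ Q_3 + 3 \times IQR = 101.9 + 1.2 = 103.1 $. The rule cannot say why the value is unusual, so the lab group should investigate this reading for possible error.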
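The lab calculation can be reproduced end to end. This is a minimal sketch assuming the median-of-halves convention; `readings` holds the ten measurements from the scenario, and the other variable names are illustrative. Because of floating-point arithmetic, the IQR and limits are only approximately 0.4, 100.9 and 102.5, so they are rounded for display.

```python
from statistics import median

readings = [101.2, 101.3, 101.5, 101.5, 101.6, 101.7, 101.8, 101.9, 102.0, 104.5]

data = sorted(readings)
n = len(data)
q1 = median(data[: n // 2])          # 101.5
q3 = median(data[(n + 1) // 2 :])    # 101.9
spread = q3 - q1                     # about 0.4 (floating point)

low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
flagged = [x for x in data if x < low or x > high]

print(f"IQR = {spread:.1f}, limits = [{low:.1f}, {high:.1f}]")
print("Potential outliers:", flagged)
```

This prints an IQR of 0.4, limits of 100.9 and 102.5, and flags only 104.5.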

IQR vs. Range: Choosing the Right Measure of Spread

Why use IQR when we already have the simple range (Max - Min)? The key difference is resistance to outliers.

Consider two neighborhoods' house prices (in thousands):
Neighborhood A: 180, 200, 220, 240, 260, 280, 300 (Range = 120, IQR = 80)
Neighborhood B: 180, 200, 220, 240, 260, 280, 1000 (Range = 820, IQR = 80)

The range for Neighborhood B is huge (820) because of one mansion priced at 1,000 (that is, 1,000,000). It misrepresents the spread of typical houses. In both neighborhoods Q1 = 200 and Q3 = 280, so the IQR is 80 for both, correctly showing that the middle 50% of houses in the two areas have similar price spreads. The IQR is a resistant statistic: it is barely affected, if at all, by a few extreme values.
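Another way to see the resistance property is to take the Example 1 data and replace its maximum, 30, with a hypothetical extreme value of 300. The sketch below assumes the median-of-halves convention; `spread_measures` is an illustrative helper name.

```python
from statistics import median


def spread_measures(values):
    """Return (range, IQR) using the median-of-halves convention for quartiles."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    return data[-1] - data[0], q3 - q1


original = [3, 7, 8, 12, 13, 14, 18, 21, 23, 26, 30]
with_extreme = original[:-1] + [300]   # hypothetical: replace the maximum 30 with 300

print(spread_measures(original))       # (27, 15)
print(spread_measures(with_extreme))   # (297, 15)
```

Only one value changed, yet the range grew elevenfold while the IQR stayed at 15.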

Important Questions

Q1: Why is the Interquartile Range more useful than the range for describing data spread?
The IQR is more useful because it focuses on the middle 50% of the data, ignoring extreme values (outliers). This makes it a resistant measure, giving a more reliable and consistent picture of the spread of the "typical" data. The range, which is simply Max - Min, can be drastically changed by a single very high or very low value, making it a less trustworthy measure of spread for many real-world datasets.
Q2: How do you find quartiles when the median falls on an existing data point in an odd-numbered dataset?
When the dataset has an odd number of values, the median (Q2) is one of the actual data points. To find Q1, you take the lower half of the data excluding the median itself. Similarly, for Q3, you take the upper half excluding the median. For example, in the dataset [2, 4, 6, 8, 10], the median is 6. The lower half (excluding 6) is [2, 4], so Q1 = 3. The upper half is [8, 10], so Q3 = 9.
Q3: Can the IQR be zero? What would that mean?
Yes, the IQR can be zero. This happens when Q1 and Q3 are the same value. It means that the middle 50% of your data are all identical. For instance, in the dataset [5, 10, 10, 10, 10, 10, 15], Q1 = 10 and Q3 = 10, so IQR = 0. This indicates no variability in the central portion of your data. In a box plot, the "box" would collapse to a single line at 10.
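Both small datasets from the questions above can be checked with the same median-of-halves approach. A minimal sketch; `quartiles` is an illustrative helper name, and the halves are taken by position in the sorted list.

```python
from statistics import median


def quartiles(values):
    """(Q1, Q2, Q3), leaving the median out of both halves when n is odd."""
    data = sorted(values)
    n = len(data)
    return median(data[: n // 2]), median(data), median(data[(n + 1) // 2 :])


print(quartiles([2, 4, 6, 8, 10]))              # (3.0, 6, 9.0)
print(quartiles([5, 10, 10, 10, 10, 10, 15]))   # (10, 10, 10), so IQR = 0
```

Splitting by position rather than by value matters in the second case: the only value strictly below the median 10 is 5, so taking "values below the median" would give Q1 = 5 instead of 10.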
Conclusion
The Interquartile Range is a powerful, intuitive concept that sits at the heart of exploratory data analysis. By measuring the spread of the middle 50% of a dataset, it provides a clear, outlier-resistant view of variability. From calculating it via quartiles to visualizing it in a box plot and using it to detect unusual values, the IQR is an indispensable tool for students and scientists alike. It moves beyond the simplistic range to offer a deeper understanding of what data is really telling us about the world, whether it's exam scores, house prices, or laboratory measurements. Mastering the IQR builds a strong foundation for all future statistical learning.

Footnotes

[1] Range: In statistics, the range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset.
[2] Outliers: Data points that differ significantly from other observations. They may be due to variability in the measurement or may indicate experimental errors.
