Question 1

What is the IQR method for detecting outliers?

Accepted Answer

The Interquartile Range (IQR) method uses the spread of the middle 50% of the data. First, calculate Q1 (the 25th percentile) and Q3 (the 75th percentile); then IQR = Q3 − Q1. The lower fence is Q1 − 1.5 × IQR and the upper fence is Q3 + 1.5 × IQR. Any value outside these fences is flagged as an outlier. The 1.5 multiplier (Tukey's rule) is the standard choice: for a normal distribution the fences sit about 2.7 standard deviations from the mean, so roughly 0.7% of the data is flagged. Some analyses use 3 × IQR as a stricter threshold that flags only extreme outliers. The IQR method is robust because it is based on quartiles rather than the mean and standard deviation, so a few extreme values barely move the fences.
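
As a rough illustration, the fences can be computed with Python's standard library alone. The function names and the sample data below are made up for this example. The quartiles use the "inclusive" interpolation method, which is only one of several conventions, so other tools may report slightly different fences for the same data.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" interpolates linearly between data points (an assumption;
    # other quartile conventions shift the fences slightly).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

def normal_fraction_flagged(k=1.5):
    """Fraction of a normal distribution lying outside the fences."""
    nd = statistics.NormalDist()      # standard normal: mean 0, std 1
    q3 = nd.inv_cdf(0.75)             # about 0.6745; Q1 is -q3 by symmetry
    upper = q3 + k * (2 * q3)         # IQR of the standard normal is 2 * q3
    return 2 * (1 - nd.cdf(upper))    # both tails

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 50]   # hypothetical sample
print(iqr_fences(data))               # (7.0, 21.0)
print(iqr_outliers(data))             # [50]
print(normal_fraction_flagged())      # roughly 0.007
```

Passing k=3 gives the stricter "extreme outlier" fences mentioned above.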

Question 2

What is the Z-score method and when should I use it?

Accepted Answer

The Z-score measures how many standard deviations a value lies from the mean: z = (x − mean) / std. Values with |z| > 3 are typically flagged as outliers (about 0.3% of a normal distribution). Lower, more sensitive thresholds of |z| > 2.5 (about 1.2%) or |z| > 2 (about 4.6%) can be used for smaller datasets, where |z| cannot get very large: with the sample standard deviation it never exceeds (n − 1) / √n, which is below 3 for n ≤ 10. The Z-score method assumes the data is approximately normally distributed. If the dataset itself contains outliers, they inflate the mean and standard deviation, making the Z-score less effective at detecting them. For contaminated datasets, use the Modified Z-score instead.
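
The sketch below shows the calculation and the masking effect on a small sample. It assumes the sample standard deviation (statistics.stdev); the answer above does not say which variant is meant, and the function names and data are invented for illustration.

```python
import statistics

def z_scores(values):
    """Z-score of each value: (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)    # sample std (assumption); needs >= 2 values
    return [(v - mean) / std for v in values]

def z_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

def two_sided_tail(threshold):
    """Fraction of a normal distribution with |z| above the threshold."""
    return 2 * (1 - statistics.NormalDist().cdf(threshold))

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 50]   # hypothetical sample
print(round(z_scores(data)[-1], 2))   # 2.79: the value 50 inflates the std
print(z_outliers(data))               # []   (nothing exceeds |z| > 3)
print(z_outliers(data, 2.5))          # [50]
```

Here the value 50 is obviously unusual, yet it stays under the |z| > 3 threshold because it pulls the mean and standard deviation toward itself.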

Question 3

What is the Modified Z-score and why is it more robust?

Accepted Answer

The Modified Z-score uses the median and MAD (Median Absolute Deviation) instead of the mean and standard deviation: M = 0.6745 × (x − median) / MAD, where MAD is the median of |x − median|. The factor 0.6745 makes the result comparable to standard Z-scores for normal distributions (MAD ≈ 0.6745 × std for a normal distribution). Values with |M| > 3.5 are flagged. Because it uses the median and MAD, it is highly resistant to the influence of existing outliers: a few extreme values barely move either statistic. One caveat is that MAD is 0 when more than half of the values are identical, in which case the score is undefined. This method is recommended by Iglewicz and Hoaglin and is especially useful for small to medium datasets.
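
A minimal sketch of the formula, using only the standard library. The function names and data are hypothetical, and raising an error when MAD is 0 is just one possible way to handle that case.

```python
import statistics

def modified_z_scores(values):
    """M = 0.6745 * (x - median) / MAD for each value."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Happens when more than half of the values are identical.
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]

def modified_z_outliers(values, threshold=3.5):
    """Return the values whose |M| exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, m in zip(values, scores) if abs(m) > threshold]

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 50]   # hypothetical sample
print(round(modified_z_scores(data)[-1], 2))      # 11.97
print(modified_z_outliers(data))                  # [50]
```

On this sample the median is 14.5 and the MAD is 2.0, neither of which is noticeably affected by the value 50, so that value scores far above 3.5.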

Question 4

What is a consensus outlier and should I always remove outliers?

Accepted Answer

A consensus outlier is a value flagged by two or more detection methods simultaneously. Using multiple methods reduces false positives: a value that only one method flags might be a legitimate extreme value, while a value flagged by IQR, Z-score, and Modified Z-score is very likely a genuine anomaly or data error. You should not automatically remove outliers. First investigate their cause: measurement errors and data entry mistakes should be corrected or removed, but genuine extreme values in the domain (a rare large sale, a peak sensor reading) should be kept. Removing them blindly can distort analysis and hide important phenomena.
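
One way to sketch the voting idea is to run all three methods and count how many flag each value. This is an illustration under stated assumptions, not a description of how any particular tool works: the helper names are invented, the quartiles use the "inclusive" convention, the Z-score uses the sample standard deviation, and the code does not guard against a zero standard deviation or a zero MAD.

```python
import statistics

def flag_iqr(values, k=1.5):
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < lower or v > upper for v in values]

def flag_z(values, threshold=3.0):
    mean, std = statistics.fmean(values), statistics.stdev(values)
    return [abs(v - mean) / std > threshold for v in values]

def flag_modified_z(values, threshold=3.5):
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

def consensus_outliers(values, min_votes=2):
    """Return (value, votes) for values flagged by at least min_votes methods."""
    per_method = (flag_iqr(values), flag_z(values), flag_modified_z(values))
    votes = [sum(flags) for flags in zip(*per_method)]   # True counts as 1
    return [(v, n) for v, n in zip(values, votes) if n >= min_votes]

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 50]   # hypothetical sample
print(consensus_outliers(data))                   # [(50, 2)]
print(consensus_outliers(data, min_votes=3))      # []
```

In this sample the value 50 is flagged by IQR and the Modified Z-score but not by the plain Z-score, so it counts as a consensus outlier under a two-vote rule. The output is a list of candidates to investigate, not a list of values to delete.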

Question 5

How do box plots visualize outlier detection?

Accepted Answer

A box plot (box-and-whisker plot) summarizes the data with Q1, the median, and Q3, plus whiskers and individually plotted outliers. The box spans Q1 to Q3 (the IQR), with a line at the median. The lower whisker extends from Q1 down to the lowest value that is still within 1.5 × IQR of Q1, and the upper whisker extends from Q3 up to the highest value within 1.5 × IQR of Q3, so the whiskers reach the minimum and maximum only when there are no outliers. Points beyond the whiskers are plotted individually as outlier dots. The strip plot below the box shows every data point along the same axis. Consensus outliers are colored red so you can immediately see their magnitude and position relative to the bulk of the data.
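
The numbers behind such a plot can be sketched without any plotting library. The function name and dictionary keys below are made up for this example, and the "inclusive" quartile convention is an assumption, so a charting tool may draw slightly different box edges.

```python
import statistics

def box_plot_stats(values, k=1.5):
    """Compute box edges, whisker ends, and outlier points for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at actual data points, not at the fences themselves.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 50]   # hypothetical sample
stats = box_plot_stats(data)
print(stats["whisker_low"], stats["whisker_high"])  # 10 18
print(stats["outliers"])                            # [50]
```

Note that the upper whisker stops at 18, the largest value inside the fence of 21.0, while 50 is drawn as a separate dot.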

Method	Based on	Threshold	Assumes normality	Robust to outliers
IQR	Q1, Q3, IQR	Below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR	No	Yes
Z-Score	Mean, Std Dev	|z| > 3 (or 2.5 or 2)	Yes	No
Modified Z-Score	Median, MAD	|M| > 3.5	Approximately	Yes
