Simpson's paradox

Simpson's paradox is a paradox from statistics. It is named after Edward H. Simpson, a British statistician who described it in 1951.^[1] The effect had been noticed earlier: the statistician Karl Pearson described a very similar effect in 1899,^[2] and Udny Yule described it in 1903.^[3] For this reason, it is sometimes called the Yule–Simpson effect. The paradox is that a trend which appears in each of several groups can disappear or reverse when the groups are combined into one larger group. In other words, the statistical scores may change depending on whether the groups are looked at one by one or together. This case often occurs in social sciences and medical statistics.^[4] It may confuse people if frequency data is used to explain a causal relationship.^[5] Other names for the paradox include reversal paradox and amalgamation paradox.^[6]

Example: Kidney stone treatment

This is a real-life example from a medical study^[7] comparing the success rates of two treatments for kidney stones.^[8]

The table shows the numbers of successes and failures, and the matching percentages, for treatments of small and large kidney stones. Treatment A includes all open surgical procedures, and Treatment B is percutaneous nephrolithotomy:

	Treatment A		Treatment B
	success	failure	success	failure
Small stones	Group 1		Group 2
number of patients	81	6	234	36
percentage	93%	7%	87%	13%
Large stones	Group 3		Group 4
number of patients	192	71	55	25
percentage	73%	27%	69%	31%
Both	Group 1+3		Group 2+4
number of patients	273	77	289	61
percentage	78%	22%	83%	17%

The paradoxical conclusion is that Treatment A is more effective when used on small stones, and also when used on large stones, yet Treatment B appears more effective when both sizes are considered together. In this example, it was not known that the size of the kidney stone influenced the result, so the combined comparison ignores it. A variable like this is called a hidden variable (or lurking variable) in statistics.
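The reversal can be checked directly from the counts in the table. The following Python sketch is only an illustration. The names counts, success_rate and pooled_rate are made up for this example and are not taken from the study.

```python
from fractions import Fraction

# (successes, failures) copied from the table above.
counts = {
    "A": {"small": (81, 6), "large": (192, 71)},
    "B": {"small": (234, 36), "large": (55, 25)},
}


def success_rate(successes, failures):
    """Return successes / total as an exact fraction."""
    return Fraction(successes, successes + failures)


def pooled_rate(treatment):
    """Success rate of one treatment when stone size is ignored."""
    successes = sum(s for s, _ in counts[treatment].values())
    failures = sum(f for _, f in counts[treatment].values())
    return success_rate(successes, failures)


# Compare the two treatments within each stone size.
for size in ("small", "large"):
    rate_a = success_rate(*counts["A"][size])
    rate_b = success_rate(*counts["B"][size])
    print(f"{size} stones: A = {float(rate_a):.1%}, B = {float(rate_b):.1%}")

# Compare the two treatments after combining both stone sizes.
print(
    f"both sizes: A = {float(pooled_rate('A')):.1%}, "
    f"B = {float(pooled_rate('B')):.1%}"
)
```

Within each stone size the rate for A is higher than the rate for B, but the combined rate for A is lower than the combined rate for B. Exact fractions are used so that the comparison does not depend on rounding.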

Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together:

The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give the severe cases (large stones) the better treatment (A), and the milder cases (small stones) the inferior treatment (B). Therefore, the totals are dominated by Groups 3 and 2, and not by the two much smaller Groups 1 and 4.
The lurking variable has a large effect on the ratios. This means the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones using Treatment A (Group 3) does worse than the group with small stones, even though the latter used the inferior Treatment B (Group 2).
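Both effects can be seen by writing each combined success rate as a weighted average of the two group rates, where the weight is the share of that treatment's patients in each group. The sketch below is a hedged illustration of this idea. The helper names (weights, rates, weighted_rate) and the 50/50 mix of stone sizes at the end are assumptions made for the example and are not part of the study.

```python
from fractions import Fraction

# (successes, total patients) per group, from the table above.
groups = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def weights(treatment):
    """Share of the treatment's patients that falls in each stone size."""
    total = sum(n for _, n in groups[treatment].values())
    return {size: Fraction(n, total) for size, (_, n) in groups[treatment].items()}


def rates(treatment):
    """Success rate of the treatment within each stone size."""
    return {size: Fraction(s, n) for size, (s, n) in groups[treatment].items()}


def weighted_rate(rate_by_size, weight_by_size):
    """Weighted average of the per-size success rates."""
    return sum(rate_by_size[size] * weight_by_size[size] for size in rate_by_size)


for treatment in ("A", "B"):
    w = weights(treatment)
    shares = ", ".join(f"{size} {float(share):.0%}" for size, share in w.items())
    combined = weighted_rate(rates(treatment), w)
    print(f"Treatment {treatment}: patient mix = {shares}; combined rate = {float(combined):.1%}")

# Hypothetical check: give both treatments the same mix of stone sizes.
even_mix = {"small": Fraction(1, 2), "large": Fraction(1, 2)}
for treatment in ("A", "B"):
    rate = weighted_rate(rates(treatment), even_mix)
    print(f"Treatment {treatment} with an even mix: {float(rate):.1%}")
```

With the real patient mix, about three quarters of the Treatment A patients had large stones, while about three quarters of the Treatment B patients had small stones, so the combined rates mostly compare Group 3 with Group 2. When both treatments are given the same mix of stone sizes, Treatment A comes out ahead, as it does in each group.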

References

[1] Simpson, Edward H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Ser. B. 13: 238–241.
[2] Pearson, Karl; Lee, A.; Bramley-Moore, L. (1899). "Genetic (reproductive) selection: Inheritance of fertility in man". Philosophical Transactions of the Royal Society, Ser. A. 173: 534–539.
[3] Yule, G. U. (1903). "Notes on the Theory of Association of Attributes in Statistics". Biometrika. 2 (2): 121–134. doi:10.1093/biomet/2.2.121.
[4] Wagner, Clifford H. (February 1982). "Simpson's Paradox in Real Life". The American Statistician. 36 (1): 46–48. doi:10.2307/2684093. JSTOR 2684093.
[5] Pearl, Judea. Causality: Models, Reasoning, and Inference. Cambridge University Press (2000, 2nd edition 2009). ISBN 0-521-77362-8.
[6] Good, I. J.; Mittal, Y. (June 1987). "The Amalgamation and Geometry of Two-by-Two Contingency Tables". The Annals of Statistics. 15 (2): 694–711. doi:10.1214/aos/1176350369. ISSN 0090-5364. JSTOR 2241334.
[7] Charig, C. R.; Webb, D. R.; Payne, S. R.; Wickham, J. E. (29 March 1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed). 292 (6524): 879–882. doi:10.1136/bmj.292.6524.879. PMC 1339981. PMID 3083922.
[8] Julious, Steven A.; Mullee, Mark A. (3 December 1994). "Confounding and Simpson's paradox". BMJ. 309 (6967): 1480–1481. doi:10.1136/bmj.309.6967.1480. PMC 2541623. PMID 7804052.
