Regression toward the mean
Regression toward the mean means that, following an extreme random event, the next random event is likely to be less extreme. It was first described by Francis Galton. He found that the offspring of tall parents tended to be shorter than their parents, and that the offspring of short parents tended to be taller than their parents. Galton stated that processes that did not follow regression toward the mean would quickly go out of control.
In 1886, Galton published a paper called "Regression towards mediocrity in hereditary stature". In the paper, he observed that extreme characteristics (e.g., height) in parents are not passed on completely to their offspring. Rather, the characteristics in the offspring regress toward a mediocre point. Today, this point is called the mean. By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect. Galton wrote that “the average regression of the offspring is a constant fraction of their respective mid-parental deviations”. This means that the difference between a child and its parents for some characteristic is proportional to the parents' deviation from typical people in the population. If the parents are each two inches taller than the averages for men and women, then, on average, the child will be shorter than its parents by some factor times two inches. Today, this factor is known to equal one minus the regression coefficient. For height, Galton estimated the coefficient to be about two thirds: on average, a child's height deviates from the population average by about two thirds of the parents' deviation. In the example above, the child would be expected to be about 1.3 inches taller than average, which is about 0.7 inches less than its parents' deviation.
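Galton's rule can be written as a short calculation. The sketch below is only an illustration: the function names `expected_child_deviation` and `expected_regression` are made up for this example, and the default coefficient of 2/3 is Galton's estimate for height as described above.

```python
def expected_child_deviation(mid_parent_deviation, coefficient=2 / 3):
    """Expected deviation of a child from the population average.

    mid_parent_deviation: how far the parents' average lies from the
        population average (for example, in inches).
    coefficient: the regression coefficient; 2/3 is Galton's estimate
        for height (an assumption for any other trait).
    """
    return coefficient * mid_parent_deviation


def expected_regression(mid_parent_deviation, coefficient=2 / 3):
    """Expected amount by which the child falls back toward the average."""
    return (1 - coefficient) * mid_parent_deviation


# Parents who are each two inches taller than the average for their sex.
deviation = 2.0
print(expected_child_deviation(deviation))  # about 1.33 inches above average
print(expected_regression(deviation))       # about 0.67 inches closer to average
```

The two results always add up to the parents' deviation, because the child keeps a fraction of it (the coefficient) and loses the rest (one minus the coefficient).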
Galton used the term regression to describe an observable fact in the inheritance of multi-factorial quantitative genetic traits: namely that the offspring of parents who lie at the tails of the distribution will tend to lie closer to the centre, the mean, of the distribution. He quantified this trend, and in doing so invented linear regression analysis. This is the starting point for much of modern statistical modelling. Since then, the term "regression" has taken on different meanings, and it may be used by modern statisticians to describe phenomena of sampling bias which have little to do with Galton's original observations in the field of genetics.
Galton's explanation for the regression phenomenon he observed is now known to be incorrect. He stated: “A child inherits partly from his parents, partly from his ancestors. Speaking generally, the further his genealogy goes back, the more numerous and varied will his ancestry become, until they cease to differ from any equally numerous sample taken at haphazard from the race at large.” This is wrong because a child receives its genetic makeup exclusively from its parents. There is no generation-skipping in genetic material: any genetic material from ancestors earlier than the parents must have passed through the parents. The phenomenon is better understood if we assume that the inherited trait (e.g., height) is controlled by a large number of recessive genes. Exceptionally tall individuals must be homozygous for height-increasing mutations at a large proportion of these loci. But the loci that carry these mutations are not necessarily the same in two tall individuals. If these individuals mate, their offspring will, on average, be homozygous for "tall" mutations at fewer loci than either of their parents. In addition, height is not entirely genetically determined. It is also subject to environmental influences during development, which makes the offspring of exceptional parents even more likely to be closer to the average than their parents.
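The recessive-gene explanation can be tried out in a small simulation. This is a rough sketch under assumptions chosen only for illustration: 100 loci, a "tall" allele with frequency 0.5 at every locus, a height score equal to the number of loci that are homozygous for the "tall" allele, and no environmental influence at all. None of these numbers comes from Galton or from real genetic data.

```python
import random

random.seed(1)

LOCI = 100        # number of height loci (assumed)
TALL_FREQ = 0.5   # frequency of the recessive "tall" allele (assumed)


def random_person():
    # Each locus holds two alleles; True stands for the "tall" allele.
    return [(random.random() < TALL_FREQ, random.random() < TALL_FREQ)
            for _ in range(LOCI)]


def height_score(person):
    # Recessive model: a locus adds height only if both alleles are "tall".
    return sum(1 for a, b in person if a and b)


def child_of(parent_a, parent_b):
    # The child gets one randomly chosen allele per locus from each parent.
    return [(random.choice(locus_a), random.choice(locus_b))
            for locus_a, locus_b in zip(parent_a, parent_b)]


def mean(values):
    return sum(values) / len(values)


population = [random_person() for _ in range(2000)]
population_mean = mean([height_score(p) for p in population])

# Take the 100 tallest people and pair them up at random.
tallest = sorted(population, key=height_score, reverse=True)[:100]
random.shuffle(tallest)
pairs = list(zip(tallest[0::2], tallest[1::2]))

parent_mean = mean([height_score(p) for p in tallest])
children = [child_of(a, b) for a, b in pairs for _ in range(4)]
child_mean = mean([height_score(c) for c in children])

print("population:", population_mean)
print("tall parents:", parent_mean)
print("their children:", child_mean)
```

Under these assumptions, the children of the tallest parents should score above the population average but below their parents, because two tall parents are often homozygous at different loci.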
The population-genetic phenomenon of regression to the mean is best thought of as the combination of a binomially distributed process of inheritance and normally distributed environmental influences. In sharp contrast, the term "regression to the mean" is now often used to describe completely different phenomena, in which an initial sampling bias may disappear as new, repeated, or larger samples display sample means that are closer to the true underlying population mean.
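The sampling sense of the term can also be shown with a short simulation. The sketch below assumes a normally distributed population with a mean of 170 and a standard deviation of 10 (made-up values, for example heights in centimetres). It picks the most extreme of many small samples and then compares it with a repeated sample and a much larger sample from the same population.

```python
import random

random.seed(0)

TRUE_MEAN = 170.0   # true population mean (assumed)
TRUE_SD = 10.0      # population standard deviation (assumed)


def sample_mean(size):
    """Mean of `size` independent draws from the population."""
    total = sum(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(size))
    return total / size


# Draw many small samples and keep the one whose mean is most extreme.
small_means = [sample_mean(5) for _ in range(200)]
extreme_mean = max(small_means, key=lambda m: abs(m - TRUE_MEAN))

# Follow up with a new sample of the same size and with a much larger one.
repeat_mean = sample_mean(5)
large_mean = sample_mean(5000)

results = [
    ("extreme first sample", extreme_mean),
    ("repeated small sample", repeat_mean),
    ("larger sample", large_mean),
]
for label, value in results:
    distance = abs(value - TRUE_MEAN)
    print(label, round(value, 1), "distance from true mean:", round(distance, 1))
```

The extreme sample was chosen because it was extreme, so its distance from the true mean is mostly luck. A repeated sample is likely to be closer to the true mean, and a much larger sample almost always is.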
- Leonard Mlodinow, The Drunkard's Walk: How Randomness Rules Our Lives (New York: Pantheon Books, 2008), p. 161
- Mitchell H. Gail, Encyclopedia of Epidemiologic Methods (Chichester; New York: Wiley, 2000), p. 110
- Francis Galton, "Regression towards mediocrity in hereditary stature", The Journal of the Anthropological Institute of Great Britain and Ireland 15 (1886), pp. 246–263. doi:10.2307/2841583