Regression to the mean in tests
Regression to the mean is the statistical tendency for an extreme measurement to be followed by one that is closer to the average. In experimentation it explains why a page picked because it converted unusually well often 'declines' afterward, and why early test readings tend to overstate effects. Recognising it prevents crediting a change for what was only a return to normal. This page explains the mechanism.
Why extremes don't repeat
Any measurement mixes a stable signal with random noise. An unusually high reading is more likely to have caught a positive noise burst, so the next reading — with fresh, independent noise — tends to land closer to the true average. The extreme regresses toward the mean without anything causal happening.
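The mechanism can be reproduced with a small simulation. The sketch below is illustrative only: it assumes every page shares the same true conversion rate and that noise is Gaussian, and the names (simulate_two_readings, top_decile_followup) and parameter values are invented for the example rather than taken from any tool.

```python
import random
import statistics

def simulate_two_readings(n_pages=2000, true_rate=0.05, noise_sd=0.01, seed=0):
    """Two independent noisy readings of the same stable signal per page.

    Assumption: every page has the same true rate; only the noise differs.
    """
    rng = random.Random(seed)
    first = [true_rate + rng.gauss(0, noise_sd) for _ in range(n_pages)]
    second = [true_rate + rng.gauss(0, noise_sd) for _ in range(n_pages)]
    return first, second

def top_decile_followup(first, second):
    """Mean first and second reading for pages in the top 10% on the first reading."""
    cutoff = sorted(first)[int(len(first) * 0.9)]
    picked = [i for i, value in enumerate(first) if value >= cutoff]
    mean_first = statistics.mean(first[i] for i in picked)
    mean_second = statistics.mean(second[i] for i in picked)
    return mean_first, mean_second

first, second = simulate_two_readings()
top_first, top_second = top_decile_followup(first, second)
print(f"top decile, first reading:  {top_first:.4f}")
print(f"same pages, second reading: {top_second:.4f}")
```

Nothing changed between the two readings, yet the pages selected for their high first reading come back close to the true rate on the second.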
How it fools experimenters
Selecting a target because it is extreme is the trap. Pick the worst-converting page, change it, and it 'improves' partly because its next reading was likely to be closer to average anyway. Early in a test, a variant can look spectacular for the same reason: small samples produce extreme estimates that shrink as data accumulates.
- Extreme readings catch noise that won't recur
- Selecting on an extreme makes some apparent change likely, even with no intervention
- Small samples exaggerate effect estimates
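The small-sample point can be sketched the same way. The example below assumes 20 variants that are truly identical (all convert at 5%) and reports the best-looking observed rate at two sample sizes. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import random

def observed_rate(rng, n_visitors, true_rate):
    """Observed conversion rate from n_visitors simulated visitors."""
    conversions = sum(rng.random() < true_rate for _ in range(n_visitors))
    return conversions / n_visitors

def best_looking_rate(n_visitors, n_variants=20, true_rate=0.05, seed=1):
    """Highest observed rate among variants that all share the same true rate."""
    rng = random.Random(seed)
    return max(observed_rate(rng, n_visitors, true_rate) for _ in range(n_variants))

early = best_looking_rate(n_visitors=200)      # small sample per variant
late = best_looking_rate(n_visitors=20_000)    # large sample per variant
print(f"best-looking variant at 200 visitors:    {early:.4f}")
print(f"best-looking variant at 20,000 visitors: {late:.4f}")
```

With few visitors the 'winner' sits well above the true 5% purely by chance. With many visitors the best observed rate stays close to it.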
Guarding against it
A control group is the defence: if both the treated and untreated groups drift back toward average, that shared drift is regression, not your change, and only the difference between the two groups can be credited to the change. This is exactly why a randomised A/B comparison beats before-and-after on a single hand-picked page.
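A minimal simulation shows how the control separates regression from a real effect. It assumes pages have slightly different true rates, readings carry Gaussian noise (a simplification, so values are not bounded like real rates), and the worst 10% of pages are split at random into treated and control halves. The function name and all parameter values are hypothetical.

```python
import random
import statistics

def worst_page_experiment(n_pages=4000, true_lift=0.005, noise_sd=0.01, seed=2):
    """Compare a naive before/after change with a control-adjusted change."""
    rng = random.Random(seed)
    true_rates = [rng.gauss(0.05, 0.005) for _ in range(n_pages)]
    before = [rate + rng.gauss(0, noise_sd) for rate in true_rates]

    # Select the worst 10% on the first reading, then randomise into two arms.
    worst = sorted(range(n_pages), key=lambda i: before[i])[: n_pages // 10]
    rng.shuffle(worst)
    half = len(worst) // 2
    treated, control = worst[:half], worst[half:]

    # Second reading: fresh noise for everyone, plus the lift for treated pages.
    after = {i: true_rates[i] + rng.gauss(0, noise_sd) for i in worst}
    for i in treated:
        after[i] += true_lift

    def mean_change(group):
        return statistics.mean(after[i] - before[i] for i in group)

    naive = mean_change(treated)            # before/after on treated pages only
    regression_only = mean_change(control)  # change with no intervention at all
    return naive, regression_only, naive - regression_only

naive, regression_only, adjusted = worst_page_experiment()
print(f"before/after on treated pages: {naive:+.4f}")
print(f"change in untreated control:   {regression_only:+.4f}")
print(f"treated minus control:         {adjusted:+.4f}")
```

The before/after figure bundles regression with the real lift, while the treated-minus-control figure lands near the lift that was actually simulated.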
How it appears in analytics and logs
A standout metric that 'normalises' later may simply be regressing, not responding to anything you did. Early extreme test readings usually shrink with more data.
Diagnostic use case
Be sceptical when you optimise the worst- or best-performing page or variant: some of the subsequent change is regression to the mean, not your intervention.
What WebmasterID can help detect
WebmasterID's consistent first-party time series let you watch a metric over multiple periods, so you can tell a real shift from a reading that is merely regressing toward its average.
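One way to make that multi-period check concrete is sketched below. It does not use any WebmasterID API. It assumes you have exported a metric as a plain list of per-period values, and the function name, the example numbers, and the rough threshold of 2 are all hypothetical.

```python
import math
import statistics

def followup_gap(baseline, followup):
    """How far the follow-up mean sits from the baseline mean, in standard errors.

    Values near 0 suggest the metric has returned to its usual level.
    Large values suggest a shift that has persisted.
    """
    standard_error = statistics.stdev(baseline) / math.sqrt(len(followup))
    return (statistics.mean(followup) - statistics.mean(baseline)) / standard_error

# Made-up weekly conversion rates (%), before a standout week of 6.4.
baseline = [5.0, 4.8, 5.1, 4.9, 5.2, 5.0]

gap_regressed = followup_gap(baseline, [5.1, 4.9, 5.0])  # weeks after the spike
gap_sustained = followup_gap(baseline, [6.2, 6.5, 6.3])  # alternative outcome

for label, gap in [("back to normal", gap_regressed), ("still elevated", gap_sustained)]:
    verdict = "likely regression" if abs(gap) < 2 else "possible real shift"
    print(f"{label}: gap = {gap:.1f} standard errors -> {verdict}")
```

This is a rough screen rather than a formal test. It only asks whether later periods stay at the standout level or fall back into the usual range.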
Common mistakes
- Crediting a fix for a metric that was going to regress anyway.
- Trusting a variant's spectacular early-sample reading.
- Using before/after on a hand-picked extreme instead of a control.
Privacy and accuracy notes
Regression to the mean is a property of aggregate measurements. Detecting it needs no personal data, only repeated readings of the same metric.
Related pages
- Control and variant in experiments
In an experiment the control is the existing version that acts as the baseline, and the variant is the version carrying the one change you are testing. Comparing the two only yields a clean answer when assignment is random and the variant differs from the control in exactly one way. Multiple variants are possible but each must be isolated.
- Novelty and primacy effects
Novelty and primacy effects are transient behavioural responses to change that distort early experiment readings. Novelty effect: a new design draws clicks just because it is new, and the lift fades. Primacy effect: regular users resist a change they are accustomed to, so a good variant looks worse at first. Both mean the first days of a test may not reflect the steady state.
- Sample size in experiments
Sample size is the number of subjects per arm an experiment needs to detect a chosen effect with acceptable error rates. It is computed in advance from the baseline rate, the minimum effect worth detecting, and the false-positive and false-negative rates you accept. Too small and you miss real effects; running until 'it looks good' inflates false positives.
- How long to run an A/B test
An A/B test runs until it has collected the sample size its design requires — derived from the baseline rate, the minimum detectable effect, and the chosen power. Duration also has to span full business cycles (weekday/weekend) to avoid day-of-week bias. Stopping the moment a result looks significant inflates false positives. This page explains how duration is set honestly.
Sources and verification notes
- Wikipedia: Regression toward the mean. Mechanism and selection-on-extremes trap.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.