WebmasterID

Simpson’s paradox in experiments

Simpson's paradox occurs when an effect that holds within every subgroup reverses or vanishes once the subgroups are pooled. In experiments it appears when the traffic mix differs between arms, so the aggregate reflects composition rather than the change being tested. It is a vivid reason to check segments and to confirm that arms are comparable. This page explains how it arises and how to avoid being fooled.

Partially verified

How the reversal happens

Simpson's paradox arises from a lurking variable that influences both group membership and the outcome. If one arm happens to draw more high-converting traffic (say more returning visitors), it can look better overall even if the change itself helped no one — the aggregate reflects who was in each arm, not what the change did.
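A small worked example may make the mechanism concrete. The counts below are invented for illustration and are not measured data: both arms convert at exactly the same rate within each segment, but the hypothetical variant arm happens to receive far more returning visitors, so its pooled rate looks much higher.

```python
# Hypothetical counts per arm and segment: (visitors, conversions).
# The segment names and numbers are assumptions made up for this sketch.
arms = {
    "control": {"new": (800, 40), "returning": (200, 40)},
    "variant": {"new": (200, 10), "returning": (800, 160)},
}


def pooled_rate(segments):
    """Conversion rate after adding all segments of one arm together."""
    visitors = sum(v for v, _ in segments.values())
    conversions = sum(c for _, c in segments.values())
    return conversions / visitors


# Per-segment conversion rate for each arm.
segment_rates = {
    arm: {seg: conv / vis for seg, (vis, conv) in segs.items()}
    for arm, segs in arms.items()
}
# Pooled conversion rate for each arm.
pooled = {arm: pooled_rate(segs) for arm, segs in arms.items()}

for arm in arms:
    print(arm, segment_rates[arm], "pooled:", round(pooled[arm], 3))
```

In this sketch each segment converts identically in both arms (5% for new, 20% for returning), yet the pooled rates are 8% for control and 17% for the variant. The whole gap comes from who was in each arm.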

Why randomisation usually prevents it

Proper random assignment makes the traffic mix the same in each arm in expectation, leaving only chance differences that shrink as the sample grows, so composition is very unlikely to drive the result. The paradox tends to surface when assignment is broken (biased redirects, or bot filtering that hits arms unevenly) or when segments are sliced post hoc. Broken assignment often shows up as a sample ratio mismatch as well.
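A sample ratio mismatch check can be sketched as a chi-square goodness-of-fit test on the arm sizes. The function name, the default 50/50 split, and the example counts are assumptions for illustration. The p-value uses the fact that, with one degree of freedom, the chi-square tail probability equals erfc(sqrt(chi2 / 2)), so only the standard library is needed.

```python
import math


def srm_p_value(control_n, variant_n, expected_control_share=0.5):
    """Chi-square test that two arm sizes match the intended split.

    Hypothetical helper for this sketch; only valid for two arms
    (one degree of freedom).
    """
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (variant_n - expected_variant) ** 2 / expected_variant
    )
    # Tail probability of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Invented counts: a small gap versus a larger one under a 50/50 design.
print(srm_p_value(5000, 5050))  # large p-value: consistent with chance
print(srm_p_value(5000, 5400))  # very small p-value: assignment looks broken
```

A very small p-value suggests the split itself is off and that assignment should be investigated before any result is read. The cutoff to act on is a team convention and is not specified by this page.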

Reading segments without being fooled

The cure is not to pick whichever number you like. Check whether the arms are balanced; if they are, trust the overall result. If they are not, the imbalance itself is the bug to fix before drawing any conclusion — neither the pooled nor the sliced number is trustworthy until arms are comparable.
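One rough way to check balance is to compare each segment's share of visitors across arms. The sketch below is a minimal illustration with hypothetical helper names and invented counts; it reports the largest share gap and does not replace a formal statistical test.

```python
def segment_shares(visitors_by_segment):
    """Fraction of an arm's visitors that falls in each segment."""
    total = sum(visitors_by_segment.values())
    return {seg: n / total for seg, n in visitors_by_segment.items()}


def max_share_gap(control, variant):
    """Largest absolute difference in segment share between two arms."""
    c = segment_shares(control)
    v = segment_shares(variant)
    # A segment missing from one arm counts as a share of zero there.
    return max(abs(c.get(seg, 0.0) - v.get(seg, 0.0)) for seg in set(c) | set(v))


# Invented visitor counts per segment.
balanced = max_share_gap(
    {"new": 500, "returning": 500}, {"new": 510, "returning": 490}
)
skewed = max_share_gap(
    {"new": 800, "returning": 200}, {"new": 200, "returning": 800}
)
print(round(balanced, 3), round(skewed, 3))
```

How large a gap counts as an imbalance depends on sample size, so any threshold applied to this number is an assumption to be set per experiment.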

How it appears in analytics and logs

A variant that wins in every segment yet loses overall signals that the arms have different audience mixes — the aggregate is a weighting artefact, not the true effect.

Diagnostic use case

When an overall result conflicts with consistent per-segment results, check whether the traffic mix differs between arms before trusting either number.
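The conflict described here can be flagged mechanically from aggregate counts. The following is a sketch under assumptions: the function name and data layout are hypothetical, both arms are assumed to report the same segments, and every segment is assumed to have at least one visitor.

```python
def sign(x):
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_simpson_reversal(control, variant):
    """True if the variant's lift has the same direction in every segment
    but the opposite direction once segments are pooled.

    Each argument maps segment name -> (visitors, conversions).
    """
    segment_signs = {
        sign(variant[seg][1] / variant[seg][0] - control[seg][1] / control[seg][0])
        for seg in control
    }
    if len(segment_signs) != 1 or 0 in segment_signs:
        return False  # segments disagree with each other, or show no lift

    def pooled(arm):
        return sum(c for _, c in arm.values()) / sum(v for v, _ in arm.values())

    pooled_sign = sign(pooled(variant) - pooled(control))
    return pooled_sign == -next(iter(segment_signs))


# Invented counts: the variant wins in both segments yet loses overall.
control = {"new": (200, 10), "returning": (800, 160)}
variant = {"new": (800, 48), "returning": (200, 44)}
print(is_simpson_reversal(control, variant))
```

A True result is a prompt to inspect the traffic mix in each arm, not a verdict on which number to believe.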

What WebmasterID can help detect

WebmasterID's segmentation over first-party events lets you compare results within and across segments, so a composition-driven reversal is visible rather than hidden in the total.

Common mistakes

Privacy and accuracy notes

Detecting the paradox uses aggregate counts per segment, not individual records. No personal identifiers are required to spot it.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.