Conversion & funnels

Pitfalls of segmenting test results

Segmenting experiment results by device, country, or traffic source is useful, but slicing a non-significant test until some segment 'wins' is a recipe for false positives. Each extra segment is another comparison, and with enough slices a spurious hit becomes all but certain. Legitimate segment analysis is either pre-planned or corrected for multiplicity. This page separates honest segmentation from data dredging.

Partially verified

Every slice is a new test

When an overall result is flat, splitting it into segments multiplies the number of comparisons. Each comparison carries its own false-positive risk, so with enough segments ordinary noise is very likely to push at least one across the significance line. Reporting that one as 'the test worked for mobile users' is data dredging, not a finding.
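To put a number on this, the sketch below computes the chance of at least one false positive when every segment is tested at the same threshold. It assumes the segments are independent and that no real effect exists anywhere, which is a simplification. The function name `familywise_error_rate` and the 0.05 threshold are illustrative choices, not part of any particular tool.

```python
def familywise_error_rate(num_segments: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent comparisons.

    Assumes every segment is tested at the same alpha and that the true
    effect is zero in all of them.
    """
    return 1.0 - (1.0 - alpha) ** num_segments


for k in (1, 5, 10, 20):
    print(f"{k:>2} segments -> {familywise_error_rate(k):.1%}")
# prints:
#  1 segments -> 5.0%
#  5 segments -> 22.6%
# 10 segments -> 40.1%
# 20 segments -> 64.2%
```

Under these assumptions, a flat test sliced twenty ways is more likely than not to show at least one 'significant' segment by chance alone.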

Honest segmentation

Segment analysis is legitimate when the segments are specified before the data is seen, or when you correct for the number of comparisons made. A pre-registered hypothesis that a change helps mobile specifically is testable; a hunt through twenty segments for any winner is not. The difference is whether the segment was a prediction or a discovery.

Pre-register the segments you will examine
Correct for multiplicity when slicing many ways
Retest post-hoc segment wins before believing them
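One way to apply the multiplicity correction mentioned above is the Holm-Bonferroni step-down procedure, sketched here. This is one common choice among several, not a method this page prescribes. The segment names and p-values are made up for illustration, and `holm_bonferroni` is a hypothetical helper rather than a function from any analytics product.

```python
def holm_bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Return which segments stay significant after Holm-Bonferroni correction."""
    # Sort segments from smallest to largest p-value.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    significant = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on.
        if p <= alpha / (m - rank):
            significant[name] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return significant


# Hypothetical per-segment p-values from a test that was flat overall.
segments = {"mobile": 0.03, "desktop": 0.40, "tablet": 0.62, "organic": 0.21, "paid": 0.75}
survivors = [name for name, ok in holm_bonferroni(segments).items() if ok]
print(survivors)  # prints [] because 0.03 does not clear 0.05 / 5 = 0.01
```

In this made-up example the mobile segment looks like a win at the usual 0.05 threshold, but it does not survive once the five comparisons are accounted for.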

Spurious segments waste roadmap

Acting on a false segment win is costly twice: you ship something that doesn't help, and you may build a personalisation strategy on a mirage. Treating surprising segment results as hypotheses for a fresh, powered test is the discipline that keeps segmentation trustworthy.
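A 'powered' retest means deciding the sample size before the test starts. The sketch below uses the standard normal-approximation formula for comparing two conversion rates, assuming a two-sided test and equal traffic to each arm. The conversion rates are hypothetical, and `users_per_arm` is an illustrative name; a dedicated power calculator may give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(baseline_rate: float, expected_rate: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed in each arm to detect the given difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical retest: the segment converted at 4% and the post-hoc
# result suggested the variant lifts it to 5%.
print(users_per_arm(0.04, 0.05))
```

If the segment cannot supply that many users in a reasonable time, the post-hoc result is better treated as unconfirmed than acted on.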

How it appears in analytics and logs

A 'win' that appears only in one unplanned segment of an otherwise flat test is most likely a multiple-comparisons artefact, not a real subgroup effect.

Diagnostic use case

Treat post-hoc segment wins as hypotheses to retest, not conclusions; pre-register the segments you care about or apply a multiple-comparisons correction.

What WebmasterID can help detect

WebmasterID's first-party segments let you analyse subgroups you defined in advance, keeping segment analysis a planned step rather than an after-the-fact fishing expedition.

Common mistakes

Slicing a flat test until some segment crosses significance.
Reporting an unplanned segment win as a real effect.
Building personalisation on an unreplicated subgroup result.

Privacy and accuracy notes

Segmenting uses coarse first-party dimensions over aggregate counts. It needs no individual-level identifiers to be useful.


Sources and verification notes

Wikipedia, "Data dredging": post-hoc subgroup hunting inflates false positives.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.