Conversion & funnels

Multiple comparisons correction

When you run many tests at once (multiple variants, multiple metrics, many segments), the chance that at least one shows a false positive grows with the number of comparisons. Multiple-comparisons corrections counter this: the Bonferroni method controls the family-wise error rate by dividing α across tests, while the Benjamini-Hochberg procedure controls the false discovery rate. Both trade some statistical power for fewer false 'wins'.

Verified against primary sources

Why many tests inflate false positives

With one test at α = 0.05 there is a 5% chance of a false positive when no real effect exists. Run 20 independent comparisons at the same α and the chance that at least one is a false positive rises to 1 − 0.95^20 ≈ 64%. This is the multiple-comparisons problem: the more questions you ask of the same data, the more likely pure noise produces a 'winner' somewhere.
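The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the comparisons are independent and that no real effect exists in any of them; the function name `family_wise_error_rate` is illustrative, not part of any library.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


# Show how the family-wise error rate grows as comparisons are added.
for m in (1, 5, 20, 100):
    rate = family_wise_error_rate(0.05, m)
    print(f"m={m:>3}  chance of at least one false positive = {rate:.1%}")
```

Correlated comparisons (for example, overlapping segments) inflate the rate less than this formula suggests, so treat the result as an upper-end illustration rather than an exact figure for real experiments.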

Family-wise error grows with the number of comparisons
Comes from multiple arms, metrics, or segment slices
An uncorrected p-value overstates confidence in any single finding

Two correction families

Bonferroni controls the family-wise error rate by testing each comparison at α/m for m comparisons. It is simple and conservative, and it can sacrifice a lot of power when m is large. Benjamini-Hochberg instead controls the false discovery rate (the expected fraction of declared 'discoveries' that are false). It is less strict and keeps more power, which suits cases where some false positives are tolerable. Pre-register how many comparisons you will make so the correction is honest.

Deciding up front which primary and guardrail metrics count keeps m small.
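The two corrections can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: the function names and the example p-values are made up for demonstration, and the Benjamini-Hochberg version shown is the standard step-up procedure, which assumes independent (or positively dependent) tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at or below alpha / m.
    Controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * alpha,
    then reject the k smallest p-values. Controls the false discovery rate."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank  # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from eight comparisons in one experiment.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print("Bonferroni rejections:        ", sum(bonferroni(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_values)))
```

On these example values Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg also keeps the second smallest, which illustrates the power difference described above. Without any correction, five of the eight would have passed at α = 0.05.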

How it appears in analytics and logs

An uncorrected 'significant' result picked from many comparisons is much more likely to be a false positive than its nominal p-value suggests.

Diagnostic use case

Apply a correction whenever a single experiment evaluates several arms, metrics, or segments, so the overall false-positive rate stays at the level you intended.

What WebmasterID can help detect

WebmasterID supplies the per-arm and per-segment conversion counts; applying a correction across them is your analysis choice.
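As one possible way to go from conversion counts to a corrected decision, the sketch below runs a pooled two-proportion z-test for each variant against control and then applies a Bonferroni threshold. Everything here is an assumption for illustration: the counts are invented, the choice of z-test is ours rather than something the source prescribes, and no WebmasterID API or export format is implied. The counts are simply typed in by hand.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test.
    Assumes both groups have visitors and the pooled rate is not 0 or 1."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical (conversions, visitors) per arm; not real data.
control = (200, 4000)
variants = {"B": (245, 4000), "C": (212, 4000), "D": (260, 4000)}

alpha = 0.05
threshold = alpha / len(variants)  # Bonferroni: one comparison per variant

p_values = {
    name: two_proportion_p_value(control[0], control[1], conv, n)
    for name, (conv, n) in variants.items()
}

for name, p in p_values.items():
    uncorrected = "yes" if p <= alpha else "no"
    corrected = "yes" if p <= threshold else "no"
    print(f"variant {name}: p={p:.4f}  significant uncorrected: {uncorrected}"
          f"  after Bonferroni: {corrected}")
```

With these made-up counts, variant B clears the uncorrected 0.05 bar but not the Bonferroni threshold, which is the kind of result that would be reported as a 'win' if the number of comparisons were ignored.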

Common mistakes

Slicing into many segments and reporting the one that 'won' without correction.
Adding metrics mid-test without accounting for the extra comparisons.
Using Bonferroni on hundreds of tests and killing all power.

Privacy and accuracy notes

Corrections operate on aggregate test statistics, not individuals. No personal data is required.


Sources and verification notes

NIST/SEMATECH e-Handbook — multiple comparisons

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.