How long to run an A/B test

An A/B test should run until it has collected the sample size its design requires, which is derived from the baseline conversion rate, the minimum detectable effect, the significance level, and the chosen power. The duration should also span full business cycles (weekdays and weekends) to avoid day-of-week bias. Stopping the moment a result looks significant inflates false positives. This page explains how to set a duration that keeps the result trustworthy.

Partially verified

Duration follows from sample size

You do not pick a duration; you pick a sample size and let traffic determine how long collecting it takes. The sample size comes from the baseline conversion rate, the minimum detectable effect you care about, and your power and significance targets. Daily eligible traffic then converts that sample into a number of days.
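As a rough illustration of that chain, the sketch below uses the standard normal-approximation formula for a two-sided test comparing two proportions. The function names, the default 5% significance level and 80% power, and the even split between two arms are assumptions made for the example, not values prescribed by this page.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm (two-sided two-proportion z-test).

    baseline     -- current conversion rate, e.g. 0.05 for 5%
    relative_mde -- smallest relative lift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance target
    z_beta = NormalDist().inv_cdf(power)           # power target
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_needed(n_per_arm, daily_eligible_visitors, arms=2):
    """Days of traffic required to fill every arm, assuming an even split."""
    return ceil(n_per_arm * arms / daily_eligible_visitors)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline=0.05, relative_mde=0.10)
    print(n, "visitors per arm;", days_needed(n, 4000), "days at 4000 eligible/day")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why small effects on low-traffic pages lead to very long tests.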

Cover full business cycles

Even after hitting the sample target, run across complete weekly cycles. Conversion behaviour differs by day of week and by pay cycle; a test that runs Tuesday to Thursday can be biased by who shows up midweek. Whole-week multiples reduce day-of-week confounding.
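The whole-week rule amounts to rounding the sample-driven estimate up to the next multiple of seven days. A minimal sketch, assuming a seven-day cycle and a hypothetical helper name:

```python
from math import ceil


def planned_duration_days(days_for_sample, cycle_days=7):
    """Round a sample-driven duration up to whole cycles (at least one)."""
    cycles = max(1, ceil(days_for_sample / cycle_days))
    return cycles * cycle_days


if __name__ == "__main__":
    for days in (2, 7, 9, 15):
        print(days, "->", planned_duration_days(days))
```

If pay-cycle effects matter for the audience, a longer `cycle_days` value could be passed instead; that choice is an assumption to validate against your own traffic.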

Sample size sets the floor on data needed
Run whole weeks to absorb day-of-week effects
Do not stop the instant a peek looks significant

The early-stopping trap

Repeatedly checking significance and stopping at the first 'win' is the peeking problem: every extra look is another chance for random noise to cross the threshold, so the false-positive rate rises well above the nominal level, and it rises further the more often you look. If you need the option to stop early, use a method designed for it, such as sequential testing or a Bayesian approach, rather than reading a fixed-horizon test continuously.
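The inflation can be seen in a small simulation of A/A tests, where both arms share the same true conversion rate and every 'win' is therefore a false positive. The sketch below is illustrative only: the batch size, number of looks, conversion rate, and seed are arbitrary assumptions, and it uses a plain pooled two-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b / n - conv_a / n) / se


def simulate_aa(n_sims=1000, looks=10, batch=100, rate=0.10, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, rate when read once at the end)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peek_hits = fixed_hits = 0
    for _sim in range(n_sims):
        conv_a = conv_b = n = 0
        crossed = False
        z = 0.0
        for _look in range(looks):
            # Each look adds one batch of visitors per arm, same true rate.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            z = z_stat(conv_a, conv_b, n)
            if abs(z) > z_crit:
                crossed = True  # a peeker would have stopped here
        peek_hits += crossed
        fixed_hits += abs(z) > z_crit  # only the final look counts
    return peek_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    peeking, fixed = simulate_aa()
    print(f"stop at first significant look: {peeking:.3f}")
    print(f"single read at planned horizon: {fixed:.3f}")
```

In runs like this, the single read at the planned horizon should stay near the nominal 5%, while stopping at the first significant look lands well above it.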

How it appears in analytics and logs

A test that 'reached significance' on day two was most likely stopped on an early peek, so its result carries a much higher false-positive risk than the nominal threshold implies. Tests that short also miss weekly seasonality.

Diagnostic use case

Compute the required sample size first, then divide by daily eligible traffic to estimate duration; run at least one full weekly cycle and resist stopping on an early peek.

What WebmasterID can help detect

WebmasterID's first-party traffic counts let you estimate eligible daily volume realistically, so the duration you plan reflects the audience you actually have.
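One way to turn daily counts into a planning figure is to average over whole weeks only, so that no weekday is over-represented. The sketch below assumes the counts are already available as a plain list of daily eligible-visitor totals, oldest first; how they are exported from WebmasterID is not covered here, and the function name is hypothetical.

```python
def average_daily_eligible(daily_counts, cycle_days=7):
    """Mean daily eligible visitors over the most recent whole weeks."""
    whole = len(daily_counts) // cycle_days * cycle_days
    if whole == 0:
        raise ValueError("need at least one full week of daily counts")
    recent = daily_counts[-whole:]  # drop the oldest partial week
    return sum(recent) / len(recent)


if __name__ == "__main__":
    # Hypothetical counts: three stray days, then one full week.
    counts = [900, 950, 920, 1200, 1150, 1100, 1180, 1250, 700, 650]
    print(average_daily_eligible(counts))
```

Only visitors who would actually enter the test count as eligible; using sitewide totals would understate the duration.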

Common mistakes

Stopping as soon as p dips below the threshold.
Running only a few days and missing weekly seasonality.
Picking a duration before computing the needed sample size.

Privacy and accuracy notes

Duration is a function of aggregate counts and traffic. Planning it needs no personal data — only the baseline rate and traffic volume.

Sources and verification notes

Wikipedia: Sequential analysis. Why fixed-horizon tests should not be peeked.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.