A/B testing fundamentals
An A/B test randomly assigns visitors to a control (A) or a variant (B), shows each group one version, and compares a pre-chosen metric. Random assignment is what lets you attribute a difference to the change rather than to who happened to see it. The discipline is in deciding the metric and sample size before you start, not after you peek at the numbers.
What this means
In an A/B test you randomly split incoming visitors into two groups. Group A sees the current version (the control); group B sees a single changed version (the variant). You then compare one metric you chose beforehand. Random assignment makes the two groups comparable on average, so a difference in the metric can be attributed to the change.
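As a minimal sketch of the random split described above, the snippet below buckets visitors by hashing an identifier. The function name `assign_bucket`, the experiment label, and the `visitor-N` IDs are hypothetical and used only for illustration. It assumes you already have some anonymous first-party visitor identifier, and it does not represent any particular product's API.

```python
import hashlib


def assign_bucket(visitor_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Return "A" (control) or "B" (variant) for a visitor.

    Hashing the experiment name together with the visitor ID gives a stable,
    effectively random split: the same visitor always lands in the same
    bucket, and different experiments split independently of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    position = int.from_bytes(digest[:4], "big") / 2**32
    return "B" if position < variant_share else "A"


if __name__ == "__main__":
    # Hypothetical visitor IDs and experiment name, for illustration only.
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_bucket(f"visitor-{i}", "checkout-button-colour")] += 1
    print(counts)  # expect a split close to 50/50
```

Hashing instead of drawing a fresh random number on every page view keeps a returning visitor in the same group without storing anything about who they are.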
Designing it honestly
Pick the metric and the sample size before launching, and change one thing at a time so a result has a clear cause. Let the test run to its planned size rather than stopping the moment it looks good: every extra look at the running numbers is another chance for random noise to cross the significance threshold, so early stopping inflates false positives. Documenting the hypothesis up front keeps you from rationalising whatever the data happens to show.
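To illustrate why early stopping inflates false positives, here is a small simulation sketch. It runs A/A tests, in which both arms share the same true conversion rate, so any "significant" result is a false positive. All names and parameter values (`simulate_aa_tests`, 10 looks, a 10% conversion rate) are assumptions chosen for the example, not figures from any real experiment.

```python
import math
import random
from statistics import NormalDist


def z_statistic(conv_a: int, conv_b: int, n: int) -> float:
    """Two-proportion z statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if std_err == 0 else (conv_b / n - conv_a / n) / std_err


def simulate_aa_tests(n_tests=500, n_per_arm=1000, looks=10,
                      rate=0.10, alpha=0.05, seed=1):
    """Return (false-positive rate when peeking, false-positive rate at planned size)."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    step = n_per_arm // looks
    peeking_hits = 0  # crossed the threshold at ANY look
    planned_hits = 0  # crossed the threshold at the final, planned look
    for _test in range(n_tests):
        conv_a = conv_b = 0
        crossed_at_any_look = False
        z = 0.0
        for look in range(1, looks + 1):
            # Both arms draw from the same true rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _visitor in range(step))
            conv_b += sum(rng.random() < rate for _visitor in range(step))
            z = z_statistic(conv_a, conv_b, look * step)
            if abs(z) > critical:
                crossed_at_any_look = True
        peeking_hits += crossed_at_any_look
        planned_hits += abs(z) > critical
    return peeking_hits / n_tests, planned_hits / n_tests


if __name__ == "__main__":
    peeking, planned = simulate_aa_tests()
    print(f"false positives when stopping at the first 'win': {peeking:.1%}")
    print(f"false positives when waiting for the planned size: {planned:.1%}")
```

The planned-size rate should sit near the chosen alpha, while the stop-at-first-win rate is typically several times higher, even though nothing differs between the arms.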
A/B testing answers 'did this change move the metric', not 'is this the best possible design'. It is a measurement tool, not a substitute for judgement about what to try.
- Random assignment makes groups comparable
- Change one variable so the cause is clear
- Fix the metric and sample size before launch
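Fixing the sample size before launch can be sketched with the standard normal-approximation formula for comparing two proportions. The function name `sample_size_per_arm` and the default values (5% two-sided false-positive rate, 80% power) are illustrative assumptions. The lift is expressed as an absolute difference in conversion rate.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, min_detectable_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH arm to detect an absolute lift in a conversion rate.

    Uses the normal approximation for a two-sided two-proportion test.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # false-positive threshold
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - false-negative rate
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline, smallest lift worth detecting is 1 point.
    print(sample_size_per_arm(baseline_rate=0.05, min_detectable_lift=0.01))
```

With these hypothetical inputs the answer is on the order of eight thousand visitors per arm. Halving the detectable lift roughly quadruples the requirement, which is why the minimum effect worth detecting has to be decided up front.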
How it appears in analytics and logs
A difference between A and B is only trustworthy if assignment was random, the variants differed by one thing, and the metric and stopping rule were set before launch. Otherwise the 'winner' may be noise or bias.
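One common way to check whether an observed difference could plausibly be noise is a two-proportion z-test on the aggregate counts. The sketch below is illustrative only: the function name and the example counts are made up, and it assumes the assignment, single-change, and stopping-rule conditions above already hold, since no statistic can repair a biased split.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conversions_a: int, visitors_a: int,
                           conversions_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates.

    The p-value is the probability of seeing a difference at least this large
    if the true rates were equal. It is not the probability that B is better.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0  # identical all-or-nothing outcomes: no evidence of a difference
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Made-up aggregate counts: 5.0% vs 6.0% conversion on 8,000 visitors each.
    p = two_proportion_p_value(400, 8000, 480, 8000)
    print(f"p-value: {p:.4f}")
```

Only aggregate counts per arm go in, so the comparison needs no per-visitor data.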
Diagnostic use case
Run an A/B test when you want a causal read on a single change — split traffic randomly, fix one metric, and decide the sample size in advance.
What WebmasterID can help detect
WebmasterID measures the conversion events that an experiment compares as first-party events, so you can read variant outcomes without cross-site tracking.
Common mistakes
- Stopping the test the moment it looks significant.
- Changing several things at once so no cause is isolable.
- Choosing the winning metric after seeing the data.
Privacy and accuracy notes
An experiment needs only a random bucket assignment and aggregate metric counts, not personal identity. WebmasterID reads outcomes from first-party events.
Related pages
- Statistical significance and p-values
A result is 'statistically significant' when it would be unlikely if there were really no effect. The p-value is the probability of seeing data at least as extreme as yours assuming the null hypothesis is true — it is not the probability the variant is better, and not a measure of how big the effect is. Significance and practical importance are different questions.
- Control and variant in experiments
In an experiment the control is the existing version that acts as the baseline, and the variant is the version carrying the one change you are testing. Comparing the two only yields a clean answer when assignment is random and the variant differs from the control in exactly one way. Multiple variants are possible but each must be isolated.
- Sample size in experiments
Sample size is the number of subjects per arm an experiment needs to detect a chosen effect with acceptable error rates. It is computed in advance from the baseline rate, the minimum effect worth detecting, and the false-positive and false-negative rates you accept. Too small and you miss real effects; running until 'it looks good' inflates false positives.
- Event Explorer
Compare conversion events across variants.
Sources and verification notes
- Google: Optimize / experiments concepts (A/B tests). Optimize is sunset; the A/B-test concept documentation remains a primary reference.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.