Conversion & funnels

Feature flags and experiments

A feature flag is a runtime switch that turns functionality on or off for chosen users without a new deploy. Flags power gradual rollouts, kill switches, and — when the audience is split randomly and outcomes are measured — controlled experiments. Understanding the overlap keeps you from confusing a rollout (operational) with an experiment (measured comparison).

Verified against primary sources

What this means

A feature flag (feature toggle) decouples deploying code from releasing behaviour: the code ships dark and a flag decides who sees it at runtime. Flags serve several jobs — gradual percentage rollouts, instant kill switches, targeting specific segments, and experimentation. When the flag assigns users randomly and you compare a metric between the on and off groups, the flag is delivering an A/B test.
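One common way a percentage flag decides at runtime who sees the new behaviour is deterministic hash bucketing. The sketch below assumes that technique, which the source does not prescribe. The flag key "new-checkout", the function names, and the idea that rollout_percent is read from a flag service at request time are all hypothetical.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket from 0 to 99.

    Hashing the flag key together with the user id means a user lands in
    the same bucket on every request for this flag, while buckets for
    different flags are effectively uncorrelated.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer and reduce it to 0..99.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """Runtime check: is the flagged behaviour on for this user?"""
    return bucket(flag_key, user_id) < rollout_percent


def render_checkout(user_id: str, rollout_percent: int) -> str:
    # Hypothetical call site: both code paths are already deployed,
    # and the flag picks one at request time.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new checkout"
    return "old checkout"
```

Because a user's bucket never changes, raising rollout_percent from 10 to 50 keeps everyone who already had the feature and adds more users, with no redeploy. Setting it to 0 acts as a kill switch.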

Rollout versus experiment

A rollout and an experiment can use identical flag plumbing but answer different questions. A rollout asks 'can we safely turn this on for everyone?' and ramps the percentage while watching for breakage. An experiment asks 'does this change the metric versus not having it?' and requires random assignment, a control group, a pre-declared metric, and enough sample for a valid comparison.
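"Enough sample" can be estimated before launch. The sketch below assumes the metric is a conversion rate and that the analysis will be a two-sided two-proportion z-test, using the standard normal-approximation formula. The baseline rate and the smallest lift worth detecting are hypothetical inputs you would choose yourself.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH arm to detect p_control -> p_treatment
    with a two-sided two-proportion z-test (normal approximation)."""
    if p_control == p_treatment:
        raise ValueError("the effect to detect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)
```

Because the effect size enters the denominator squared, halving the lift you want to detect roughly quadruples the users each arm needs.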

Conflating them is a common error: ramping a flag to 100% because nothing broke is not evidence the change improved anything. Only the measured comparison gives that.
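The measured comparison can be as simple as testing whether two conversion rates differ. The sketch below assumes a binary conversion metric and uses a pooled two-proportion z-test, which is one common choice; the source does not name a specific test.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(conv_control: int, n_control: int,
                          conv_treatment: int, n_treatment: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for treatment vs control conversion."""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For example, 400 conversions out of 4,000 in control against 480 out of 4,000 in treatment gives z of about 2.86 and a two-sided p-value of about 0.004. A flag that simply reached 100% without incident produces no such number.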

Flag = runtime switch, decoupled from deploy
Rollout: ramp safely; experiment: measure vs control
Experiment needs random assignment and a metric

How it appears in analytics and logs

A flag that is rolled out to a growing percentage is operational delivery. The same flag with random assignment and a measured outcome against a held-back group is a controlled experiment — the difference is the analysis, not the switch.
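The analysis step that turns flagged traffic into an experiment can be sketched as grouping outcomes by the variant each user saw. This assumes, hypothetically, that every logged event carries the user id, the flag variant, and an event name, and that "purchase" is the pre-declared goal event.

```python
from collections import defaultdict


def conversion_by_variant(events, goal="purchase"):
    """Per-variant conversion rate from a flat event log.

    Each event is a dict with hypothetical keys 'user_id', 'variant' and
    'name'. A user counts once as exposed, and once as converted if any of
    their events matches the goal event.
    """
    exposed = defaultdict(set)
    converted = defaultdict(set)
    for event in events:
        exposed[event["variant"]].add(event["user_id"])
        if event["name"] == goal:
            converted[event["variant"]].add(event["user_id"])
    return {
        variant: len(converted[variant]) / len(users)
        for variant, users in exposed.items()
    }
```

Counting distinct users rather than raw events keeps a single user who converts twice from inflating their cohort's rate.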

Diagnostic use case

Use flags to ship safely and to deliver experiment variants, but only call it an experiment when assignment is random and a metric is compared with a control.

What WebmasterID can help detect

WebmasterID measures the first-party events that show what each flagged cohort did: the data needed to evaluate an experiment built on flags.

Common mistakes

Treating a successful rollout as proof a change worked.
Skipping random assignment when a flag delivers a variant.
Leaving stale flags in place, muddying later analysis.
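Stale flags can be found mechanically. The sketch below assumes a hypothetical flag record with a rollout percentage and a last-changed date, and treats a flag as stale once it has sat fully on or fully off for an arbitrary 30 days; real flag services store this differently.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class FlagState:
    # Hypothetical record shape; not any particular flag service's schema.
    key: str
    rollout_percent: int  # 0..100
    last_changed: date


def stale_flags(flags: List[FlagState], today: date,
                max_age_days: int = 30) -> List[str]:
    """Keys of flags that are fully on or fully off and unchanged
    for at least max_age_days (an arbitrary, tunable threshold)."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        f.key for f in flags
        if f.rollout_percent in (0, 100) and f.last_changed <= cutoff
    ]
```

A flag that is still ramping is skipped even if it is old, since it may be part of a live rollout or experiment.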

Privacy and accuracy notes

Flag assignment and experiment analysis rely on aggregate cohorts, not personal profiling. This page is educational.


Sources and verification notes

Martin Fowler — Feature Toggles (Flags)

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.