Novelty and primacy effects
Novelty and primacy effects are transient behavioural responses to change that distort early experiment readings. Novelty effect: a new design draws clicks just because it is new, and the lift fades. Primacy effect: regular users resist a change to something they are accustomed to, so a good variant looks worse at first. Both mean the first days of a test may not reflect the steady state.
What this means
Novelty effect is the tendency of users to engage with something just because it is unfamiliar — a redesigned button gets extra clicks at launch that have nothing to do with it being better. Primacy effect is the opposite for habituated users: people who know the old flow are briefly slower or more reluctant with the new one, depressing its early numbers. Both are about adjustment to change, not the change's true merit.
How to avoid being fooled
Because these effects fade, the cure is mostly patience and segmentation. Let the experiment run past the adjustment window so the trend can stabilise, and look at the metric over time rather than as a single aggregate. Splitting results by new versus returning visitors helps: new users never knew the old version, so their behaviour is not affected by primacy, whereas a novelty bump shows up as a lift that shrinks as the run goes on.
If the effect persists after the trend flattens and holds across user segments, it is more likely to be real.
- Novelty: newness draws clicks that fade
- Primacy: habituated users resist change at first
- Segment new vs returning and watch the trend stabilise
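The segment-and-trend reading described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the row layout (day, segment, arm, visitors, conversions), the arm names "control" and "variant", the function name daily_lift_by_segment, and the sample numbers are all assumptions made up for the example.

```python
from collections import defaultdict


def daily_lift_by_segment(rows):
    """Return {(day, segment): relative lift of variant over control}.

    rows: iterable of (day, segment, arm, visitors, conversions) tuples,
    where arm is "control" or "variant" (a hypothetical export format).
    """
    # (day, segment, arm) -> [visitors, conversions]
    totals = defaultdict(lambda: [0, 0])
    for day, segment, arm, visitors, conversions in rows:
        totals[(day, segment, arm)][0] += visitors
        totals[(day, segment, arm)][1] += conversions

    lifts = {}
    for day, segment in sorted({(d, s) for d, s, _arm in totals}):
        control = totals.get((day, segment, "control"))
        variant = totals.get((day, segment, "variant"))
        # Skip cells where a relative lift cannot be computed.
        if not control or not variant:
            continue
        if control[0] == 0 or control[1] == 0 or variant[0] == 0:
            continue
        control_rate = control[1] / control[0]
        variant_rate = variant[1] / variant[0]
        lifts[(day, segment)] = variant_rate / control_rate - 1
    return lifts


# Illustrative, made-up data: returning users show a lift that shrinks,
# new users show a lift that stays flat.
rows = [
    (1, "returning", "control", 1000, 100),
    (1, "returning", "variant", 1000, 130),
    (2, "returning", "control", 1000, 100),
    (2, "returning", "variant", 1000, 110),
    (1, "new", "control", 500, 40),
    (1, "new", "variant", 500, 44),
    (2, "new", "control", 500, 40),
    (2, "new", "variant", 500, 44),
]

lifts = daily_lift_by_segment(rows)
for (day, segment), lift in lifts.items():
    print(f"day {day} {segment:<9} lift {lift:+.1%}")
```

Reading the output per day and per segment, rather than as one blended number, is what makes a shrinking bump or a recovering dip visible.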
How it appears in analytics and logs
An early lift that decays over the test, or an early dip that recovers, is a classic sign of novelty or primacy. The steady-state portion of the test tells you the real effect better than day one does.
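One rough way to spot these shapes in a series of daily lifts is to compare the average of the first few days with the average of the last few. The sketch below is a heuristic only: the function name classify_trend, the window of 3 days, and the 0.02 tolerance are hypothetical choices for illustration, and the labels describe the shape of the curve, not its cause.

```python
from statistics import mean


def classify_trend(daily_lifts, window=3, tolerance=0.02):
    """Compare early and late average lift and label the shape.

    daily_lifts: list of relative lifts, one per day, in time order.
    window and tolerance are illustrative defaults, not recommendations.
    """
    if len(daily_lifts) < 2 * window:
        raise ValueError("need at least 2 * window days of data")
    early = mean(daily_lifts[:window])
    late = mean(daily_lifts[-window:])
    if early - late > tolerance:
        label = "decaying"    # early lift fades: consistent with novelty
    elif late - early > tolerance:
        label = "recovering"  # early dip recovers: consistent with primacy
    else:
        label = "stable"
    return early, late, label


# Made-up series for illustration.
novelty_like = [0.30, 0.22, 0.15, 0.08, 0.05, 0.05, 0.04, 0.05]
primacy_like = [-0.10, -0.06, -0.02, 0.03, 0.05, 0.05, 0.06, 0.05]

print(classify_trend(novelty_like))
print(classify_trend(primacy_like))
```

The late-window average is the closer estimate of the steady state; the label is only a prompt to look at the over-time chart before calling the test.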
Diagnostic use case
Run experiments long enough, and segment new versus returning users, so that a temporary novelty bump or primacy dip is not mistaken for the durable effect.
What WebmasterID can help detect
WebmasterID measures first-party engagement over time and can distinguish new from returning visitors, which is how you separate a novelty spike from a durable change.
Common mistakes
- Calling a test after one or two days of novelty lift.
- Killing a good variant during the primacy dip.
- Reading only the aggregate, never the over-time trend.
Privacy and accuracy notes
Detecting these effects uses aggregate trends and new-vs-returning segments, not personal identification. This page is educational.
Related pages
- A/B testing fundamentals
An A/B test randomly assigns visitors to a control (A) or a variant (B), shows each group one version, and compares a pre-chosen metric. Random assignment is what lets you attribute a difference to the change rather than to who happened to see it. The discipline is in deciding the metric and sample size before you start, not after you peek at the numbers.
- The peeking problem in A/B tests
The peeking problem is checking an experiment over and over and stopping the moment it crosses significance. Because each look is another chance for noise to cross the threshold, repeated peeking inflates the false-positive rate well above the nominal level. The fixes are a pre-set sample size or a sequential method designed for continuous monitoring.
- Segmentation for conversion analysis
Segmentation divides visitors into groups — by source, device, geography, or behaviour — so you can compare conversion within comparable cohorts. A single blended conversion rate can hide that one segment converts well and another barely at all. The discipline is choosing segments that answer a question without slicing so finely that each group becomes noise.
- Sample size in experiments
Sample size is the number of subjects per arm an experiment needs to detect a chosen effect with acceptable error rates. It is computed in advance from the baseline rate, the minimum effect worth detecting, and the false-positive and false-negative rates you accept. Too small and you miss real effects; running until 'it looks good' inflates false positives.
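As an illustration of the sample-size calculation summarised in the last entry, the sketch below uses the common normal-approximation formula for comparing two proportions with a two-sided test. The function name sample_size_per_arm, the defaults (5% false-positive rate, 80% power), and the example rates are assumptions for the example; other formulas and tools give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided two-proportion test.

    baseline: control conversion rate, e.g. 0.10
    mde_abs:  minimum absolute difference worth detecting, e.g. 0.02
    alpha:    accepted false-positive rate
    power:    1 minus the accepted false-negative rate
    """
    p1 = baseline
    p2 = baseline + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must be in (0, 1) and mde_abs non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Example: 10% baseline, detect an absolute change of 2 percentage points.
n = sample_size_per_arm(0.10, 0.02)
print(n)
```

A sample size fixed this way also gives a natural minimum run length, which helps keep a test open past the novelty or primacy window.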
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.