Conversion & funnel reference: goals, funnels, and CRO
A reference to conversion and funnel analytics. Each page explains a concept — goals, funnels, experiment design, cohorts, retention, churn — how it is measured, the statistical and practical pitfalls, and how to act on it. No fabricated uplift numbers.
124 conversion topics documented · part of the Web Crawler & Traffic Intelligence Encyclopedia.
- Conversion rate: definition and denominators
Conversion rate is the share of some base that converted. The trap is the denominator: conversions per session, per user, and per unique visitor give different numbers and mean different things. Without stating the base, a conversion rate is ambiguous — and comparing rates with different bases is meaningless.
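A minimal Python sketch of how the base changes the number. The `Session` record and its fields are hypothetical stand-ins for whatever your analytics export provides. The point is only that the same four sessions produce one rate per session and a different rate per user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str     # hypothetical stable visitor identifier
    converted: bool  # True if this session contained at least one conversion


def per_session_rate(sessions: list[Session]) -> float:
    """Converting sessions divided by all sessions."""
    return sum(s.converted for s in sessions) / len(sessions)


def per_user_rate(sessions: list[Session]) -> float:
    """Users with at least one converting session divided by all distinct users."""
    users = {s.user_id for s in sessions}
    converters = {s.user_id for s in sessions if s.converted}
    return len(converters) / len(users)


# Two users, four sessions: u1 converts on a return visit, u2 never converts.
sessions = [
    Session("u1", False),
    Session("u1", True),
    Session("u2", False),
    Session("u2", False),
]
print(per_session_rate(sessions))  # 0.25
print(per_user_rate(sessions))     # 0.5
```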
- Funnel analysis: finding the leak
Funnel analysis follows visitors through an ordered set of steps (view → add to cart → checkout → purchase) and shows where they fall out. It turns a single conversion rate into a map of where the loss happens. The pitfalls are step definition, small-sample noise, and assuming a strict order where users actually skip around.
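A small sketch of a strictly ordered funnel, assuming you can export, per step, the set of user ids who performed it (the `reached` mapping and step names are hypothetical). Each row reports how many users survive to that step and the rate relative to the previous step.

```python
def strict_funnel(steps: list[str], reached: dict[str, set[str]]) -> list[tuple[str, int, float]]:
    """Return (step, users, rate from previous step) for an ordered funnel.

    `reached` maps each step name to the set of user ids who performed it.
    A user counts at a step only if they also performed every earlier step
    (the strict-order assumption).
    """
    rows = []
    survivors = None
    previous = None
    for step in steps:
        performed = reached.get(step, set())
        survivors = set(performed) if survivors is None else survivors & performed
        count = len(survivors)
        if previous is None:
            rate = 1.0
        else:
            rate = count / previous if previous else 0.0
        rows.append((step, count, rate))
        previous = count
    return rows


steps = ["view", "cart", "checkout", "purchase"]
reached = {
    "view": {"a", "b", "c", "d"},
    "cart": {"a", "b", "c"},
    "checkout": {"a", "b", "e"},  # "e" entered checkout directly, e.g. from a saved link
    "purchase": {"a"},
}
for step, count, rate in strict_funnel(steps, reached):
    print(f"{step:<9} {count:>2} users  {rate:.0%} of previous step")
```

User `e` vanishes from the checkout count because they skipped the earlier steps. A looser definition that counts anyone who performed a step would report 3 at checkout instead of 2, which is exactly the step-definition choice that has to be made explicit.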
- A/B testing fundamentals
An A/B test randomly assigns visitors to a control (A) or a variant (B), shows each group one version, and compares a pre-chosen metric. Random assignment is what lets you attribute a difference to the change rather than to who happened to see it. The discipline is in deciding the metric and sample size before you start, not after you peek at the numbers.
- Statistical significance and p-values
A result is 'statistically significant' when data at least this extreme would be unlikely if there were really no effect, which in practice means the p-value falls below a threshold fixed in advance (conventionally 0.05). The p-value is the probability of seeing data at least as extreme as yours assuming the null hypothesis is true; it is not the probability the variant is better, and not a measure of how big the effect is. Significance and practical importance are different questions.
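To make the definition concrete, here is a sketch of one common frequentist test for comparing two conversion rates, the pooled two-proportion z-test. It relies on the large-sample normal approximation, and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value).

    Uses the large-sample normal approximation; it is unreliable when
    conversion counts are very small.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under the null of no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For 10.0% versus 13.0% on 1,000 visitors each, z is about 2.1 and p about 0.035. That says data this extreme would be uncommon if there were no difference; it does not say B is better with 96.5% probability, and whether a 3-point lift matters is a separate business judgement.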
- Cohort analysis
A cohort is a group of users who share a starting event — the week they first visited, the month they signed up. Cohort analysis follows each cohort over time so you can compare like with like. It separates 'are users behaving differently' from 'is the mix of users changing', which a single blended average can hide.
- Retention rate
Retention rate measures how many users from a starting cohort come back in a later period. It depends entirely on definitions: what counts as 'returning', over what window, and which cohort. A 7-day and a 30-day retention rate answer different questions, and neither is comparable to a churn figure computed a different way.
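A sketch showing how one definitional choice changes the number. It assumes a hypothetical `activity` mapping from each cohort member to the day offsets on which they were active (day 0 is the first visit) and computes both exact-day and "on or after" retention from the same data.

```python
def retention_rate(activity: dict[str, set[int]], day: int, rolling: bool = False) -> float:
    """Share of a cohort counted as retained on `day`.

    `activity` maps each user id in the cohort to the set of day offsets on which
    they were active, where 0 is their first active day.
    rolling=False: retained only if active on exactly `day` ("day-N" retention).
    rolling=True: retained if active on `day` or any later day ("unbounded" retention).
    """
    retained = 0
    for days in activity.values():
        if rolling:
            retained += any(d >= day for d in days)
        else:
            retained += day in days
    return retained / len(activity)


cohort = {"a": {0, 7}, "b": {0, 9}, "c": {0}, "d": {0, 1, 30}}
print(retention_rate(cohort, 7))                # 0.25
print(retention_rate(cohort, 7, rolling=True))  # 0.75
```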
- Customer lifetime value (LTV)
Customer lifetime value (LTV or CLV) estimates the total revenue or margin a customer generates across their whole relationship. It is a forecast built on assumptions about retention, purchase frequency, and margin — not a measured number. Treated as fact it misleads; treated as a model with stated assumptions it guides acquisition spend.
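One of the simplest LTV models, sketched here to make the assumptions visible rather than as a recommended method: constant revenue per period, a constant per-period churn probability (so expected lifetime is 1 / churn), a gross-margin multiplier, and no discounting. All inputs are hypothetical.

```python
def simple_ltv(revenue_per_period: float, gross_margin: float, churn_rate: float) -> float:
    """Margin-based LTV under a constant-churn model.

    Assumptions (all of them estimates, not measurements):
    - each customer pays `revenue_per_period` every period until they churn;
    - each period they churn with the same probability `churn_rate`,
      so the expected lifetime is 1 / churn_rate periods;
    - no discounting of future cash.
    """
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in (0, 1]")
    return revenue_per_period * gross_margin / churn_rate


# Same customer economics, two churn assumptions.
print(round(simple_ltv(50, 0.8, 0.05), 2))  # 800.0
print(round(simple_ltv(50, 0.8, 0.04), 2))  # 1000.0
```

A one-point change in the assumed churn rate moves the estimate by a quarter, which is why the assumptions have to travel with the number.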
- Cart abandonment
Cart abandonment happens when a visitor adds items to a cart but does not complete the purchase. The rate is usually one minus (purchases ÷ carts created). It is a useful friction signal, but it overstates 'lost sales' because many adds are research, comparison, or saving for later — not abandoned intent.
- Checkout abandonment vs cart abandonment
Checkout abandonment is when a shopper begins the checkout flow but does not complete the purchase. It is a tighter signal than cart abandonment because it counts people who showed stronger intent by entering checkout. Separating the two locates friction precisely: the cart step versus the payment and shipping steps.
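Both abandonment rates use the same "one minus completion" form and differ only in the base. A minimal sketch with hypothetical counts for one period:

```python
def abandonment(carts_created: int, checkouts_started: int, purchases: int) -> dict[str, float]:
    """Cart and checkout abandonment for one period.

    Both use the 'one minus completion' form; they differ only in the base:
    every cart created versus every checkout started.
    """
    return {
        "cart_abandonment": 1 - purchases / carts_created,
        "checkout_abandonment": 1 - purchases / checkouts_started,
    }


rates = abandonment(carts_created=1000, checkouts_started=400, purchases=300)
print({k: round(v, 2) for k, v in rates.items()})
# {'cart_abandonment': 0.7, 'checkout_abandonment': 0.25}
```

Here 600 of the 700 lost carts never reached checkout, so most of the friction (or mere browsing) sits at the cart step rather than in payment and shipping.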
- Average order value (AOV)
Average order value (AOV) is total revenue divided by the number of orders. It is simple but easy to misread: a few large orders pull the mean upward, refunds and taxes change what 'revenue' means, and mixing currencies without conversion corrupts it. For skewed order sizes, the median order value is often more honest.
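A quick illustration of the skew problem using the standard library. The order values are hypothetical, assumed to be in one currency and already net of refunds.

```python
from statistics import mean, median

# Hypothetical order values in one currency, net of refunds; one unusually large order.
orders = [20.0, 25.0, 30.0, 35.0, 40.0, 900.0]

aov = mean(orders)             # total revenue / number of orders
median_order = median(orders)  # middle value, resistant to the outlier
print(aov, median_order)       # 175.0 32.5
```

Five of the six customers spent 40 or less, yet the mean reports 175; the median is much closer to a typical order.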
- Churn rate
Churn rate measures how many customers (or how much recurring revenue) you lose in a period. Like retention, it is defined by choices: the window, what counts as 'churned', and whether you count customers or revenue. Customer churn and revenue churn can diverge sharply, so the basis must be stated.
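A sketch of how customer churn and revenue churn diverge on the same data. It assumes a hypothetical map of each customer's monthly recurring revenue at the start of the period and ignores expansion, contraction and new customers.

```python
def churn_rates(start_mrr: dict[str, float], churned: set[str]) -> tuple[float, float]:
    """Return (customer churn, gross revenue churn) for one period.

    `start_mrr` maps each customer active at the start of the period to their
    monthly recurring revenue; `churned` holds the ids of those who left.
    Expansion, contraction and new customers are ignored in this sketch.
    """
    customer_churn = len(churned) / len(start_mrr)
    lost = sum(start_mrr[c] for c in churned)
    revenue_churn = lost / sum(start_mrr.values())
    return customer_churn, revenue_churn


mrr = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 1000.0}
print(churn_rates(mrr, {"a"}))  # one small customer leaves
print(churn_rates(mrr, {"d"}))  # the one large customer leaves
```

Both scenarios are 25% customer churn, but revenue churn is about 8% in the first and about 77% in the second.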
- Control and variant in experiments
In an experiment the control is the existing version that acts as the baseline, and the variant is the version carrying the one change you are testing. Comparing the two only yields a clean answer when assignment is random and the variant differs from the control in exactly one way. Multiple variants (A/B/n) are possible, but each must differ from the control in one isolated way so its effect can be attributed.
- Multivariate testing
Multivariate testing (MVT) changes several elements simultaneously and tests their combinations, so it can reveal interactions between elements that separate A/B tests miss. The cost is traffic: the number of combinations grows quickly, so each gets a thin slice of visitors. MVT is worth it only when you have ample traffic and genuinely suspect interactions.
- Sample size in experiments
Sample size is the number of subjects per arm an experiment needs to detect a chosen effect with acceptable error rates. It is computed in advance from the baseline rate, the minimum effect worth detecting, and the false-positive and false-negative rates you accept. Too small and you miss real effects; running until 'it looks good' inflates false positives.
- Minimum detectable effect (MDE)
The minimum detectable effect (MDE) is the smallest change in your metric that an experiment is set up to detect reliably. It is an input you choose, not an output: a smaller MDE demands more traffic. Setting the MDE to the smallest difference that would actually matter to the business keeps experiments honestly sized.
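The sample-size and MDE entries can be tied together with a sketch of one common normal-approximation formula for comparing two proportions (two-sided test, unpooled variances). The baseline and MDE values are hypothetical, and the MDE is taken as an absolute difference; other tools use slightly different formulas, so treat the output as approximate.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users per arm for a two-sided test of two proportions.

    baseline: control conversion rate, e.g. 0.10
    mde_abs:  smallest absolute lift worth detecting, e.g. 0.01 (10% -> 11%)
    Uses the common normal-approximation formula with unpooled variances.
    """
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls false positives
    z_beta = NormalDist().inv_cdf(power)           # controls false negatives
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / mde_abs ** 2
    return math.ceil(n)


print(sample_size_per_arm(0.10, 0.01))   # roughly 14,700 per arm
print(sample_size_per_arm(0.10, 0.005))  # halving the MDE roughly quadruples it
```

Because the MDE enters squared in the denominator, asking to detect half the effect costs roughly four times the traffic.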
- Confidence intervals for conversion metrics
A confidence interval gives a range of plausible values for a metric rather than a single point. A 95% confidence interval comes from a procedure that, over many repeated experiments, would capture the true value 95% of the time; any single computed interval either contains the true value or does not. Reporting an interval communicates uncertainty honestly: a conversion rate of 4% with a wide interval is a very different claim than the same 4% with a narrow one.
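A sketch of one well-behaved interval for a conversion rate, the Wilson score interval. It is chosen here because it behaves better than the simple normal (Wald) interval for small counts and rates near 0; the counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(conversions: int, visitors: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a conversion rate (binomial proportion)."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = conversions / visitors
    denom = 1 + z * z / visitors
    centre = (p + z * z / (2 * visitors)) / denom
    half_width = z * math.sqrt(p * (1 - p) / visitors + z * z / (4 * visitors ** 2)) / denom
    return centre - half_width, centre + half_width


# The same 4% point estimate, very different precision:
print(wilson_interval(4, 100))      # roughly (0.016, 0.098)
print(wilson_interval(400, 10000))  # roughly (0.036, 0.044)
```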
- The peeking problem in A/B tests
The peeking problem is checking an experiment over and over and stopping the moment it crosses significance. Because each look is another chance for noise to cross the threshold, repeated peeking inflates the false-positive rate well above the nominal level. The fixes are a pre-set sample size or a sequential method designed for continuous monitoring.
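A small simulation makes the inflation visible. It runs A/A tests (no real effect, so every significant result is a false positive) and compares stopping at the first significant look against testing once at the planned end. As a simplifying assumption, the z-statistic after k looks is modelled as a standardised running sum of independent N(0, 1) increments, the large-sample behaviour of a z-test on accumulating data.

```python
import math
import random


def false_positive_rates(n_experiments: int = 4000, looks: int = 10,
                         z_crit: float = 1.96, seed: int = 7) -> tuple[float, float]:
    """Simulate A/A tests analysed at `looks` equally spaced points.

    Returns (rate when stopping at the first significant look,
             rate when testing only once at the planned end).
    """
    rng = random.Random(seed)
    stopped_early = 0
    significant_at_end = 0
    for _ in range(n_experiments):
        running_sum = 0.0
        crossed = False
        z = 0.0
        for k in range(1, looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(k)  # z-statistic after k looks under the null
            if abs(z) > z_crit:
                crossed = True  # a peeker would stop here and declare a winner
        stopped_early += crossed
        significant_at_end += abs(z) > z_crit
    return stopped_early / n_experiments, significant_at_end / n_experiments


peeking, fixed = false_positive_rates()
print(f"stop at first significant look: {peeking:.1%}")
print(f"single look at planned end:     {fixed:.1%}")
```

With ten looks the peeking rate typically lands several times above the nominal 5%, while the single planned analysis stays close to it.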
- Micro and macro conversions
A macro conversion is a primary business goal — a purchase, a signup. A micro conversion is a smaller, intermediate action that signals progress toward it, like viewing a product or starting a form. Tracking both gives a richer picture of the funnel, but only the macro conversion should be treated as the headline success metric.
- Goal completion and key events
A goal completion is recorded when a visitor performs an action you have defined as valuable, such as a purchase or signup. In modern tools you mark an event as a key event (a conversion) and each qualifying occurrence is counted. The traps are over-counting repeated actions, double-counting across sessions, and defining the goal so loosely it stops meaning success.
- Segmentation for conversion analysis
Segmentation divides visitors into groups — by source, device, geography, or behaviour — so you can compare conversion within comparable cohorts. A single blended conversion rate can hide that one segment converts well and another barely at all. The discipline is choosing segments that answer a question without slicing so finely that each group becomes noise.
- North star metric
A north star metric is the one measure a team chooses to represent the core value it delivers, used to align decisions. Its value is focus: a single shared metric stops teams optimising in different directions. Its risk is tunnel vision — any single metric can be gamed, so it needs guardrail metrics around it and a clear link to real value.
- Guardrail metrics in experiments
Guardrail metrics are the secondary measures you monitor during an experiment to make sure a change that improves the primary metric does not quietly damage something important — load time, retention, refunds, support load. They turn 'did the target go up' into the fuller question 'did the target go up without breaking anything'.
- Bayesian A/B testing
Bayesian A/B testing treats the conversion rate of each arm as an unknown with a probability distribution. It combines a prior belief with observed data to produce a posterior, from which you can state things like 'the probability that B beats A is high' and quantify the expected loss of choosing wrong. It is an alternative framing to the frequentist p-value, with different assumptions rather than a guarantee of more truth.
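A Monte Carlo sketch of the Beta-Binomial version of this idea. It assumes a uniform Beta(1, 1) prior on each arm, so each posterior is Beta(1 + conversions, 1 + non-conversions), and estimates both the probability that B beats A and the expected loss of each choice. The prior and the counts are assumptions for illustration.

```python
import random


def bayesian_ab(conv_a: int, n_a: int, conv_b: int, n_b: int,
                draws: int = 20000, seed: int = 3) -> dict[str, float]:
    """Monte Carlo summary of a Beta-Binomial A/B comparison with Beta(1, 1) priors."""
    rng = random.Random(seed)
    b_wins = 0
    loss_if_choose_b = 0.0  # expected conversion rate given up by shipping B
    loss_if_choose_a = 0.0  # expected conversion rate given up by keeping A
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        b_wins += rate_b > rate_a
        loss_if_choose_b += max(rate_a - rate_b, 0.0)
        loss_if_choose_a += max(rate_b - rate_a, 0.0)
    return {
        "p_b_beats_a": b_wins / draws,
        "expected_loss_b": loss_if_choose_b / draws,
        "expected_loss_a": loss_if_choose_a / draws,
    }


print(bayesian_ab(100, 1000, 130, 1000))
```

The output depends on the prior; with this much data a uniform prior has little influence, but with small samples it matters and should be stated.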
- Frequentist vs Bayesian experiment analysis
Frequentist and Bayesian are two coherent ways to analyse the same experiment data. Frequentist methods ask how likely the observed data is under a null hypothesis and report p-values and confidence intervals. Bayesian methods combine a prior with the data to report posterior probabilities and credible intervals. Each has assumptions and failure modes; neither is universally 'correct'.
- P-value misconceptions
The p-value is one of the most misread numbers in experimentation. It is the probability of seeing data at least as extreme as observed if the null hypothesis were true. It is not the probability that the null is true, not the probability that the result is a fluke, and not a measure of effect size. The American Statistical Association's 2016 statement on p-values addresses exactly these misconceptions.
- Sample ratio mismatch (SRM)
Sample ratio mismatch (SRM) is when the observed allocation of users to experiment arms diverges from the planned ratio by more than chance allows — for example a 50/50 test that lands far from 50/50. It signals a bug in assignment, logging, or filtering, and a test with SRM should not be trusted regardless of how good the headline result looks.
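A standard way to check for SRM is a chi-square goodness-of-fit test of observed arm counts against the planned split. The sketch below handles two arms, where the statistic has one degree of freedom and its upper tail equals erfc(sqrt(chi2 / 2)). The counts are hypothetical, and the alert threshold (a strict value such as 0.001 is a common convention) is your choice.

```python
import math


def srm_p_value(n_a: int, n_b: int, planned_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm allocation.

    With two arms the statistic has 1 degree of freedom, whose upper tail
    equals erfc(sqrt(chi2 / 2)).
    """
    total = n_a + n_b
    expected_a = total * planned_share_a
    expected_b = total - expected_a
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(50_100, 49_900))  # consistent with 50/50
print(srm_p_value(50_000, 51_500))  # far too lopsided to be chance
```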
- Sequential testing for experiments
Sequential testing is a family of statistical methods designed for repeated looks at accumulating data. Naive peeking at a fixed-horizon test inflates the false-positive rate; sequential methods such as always-valid p-values and group sequential boundaries adjust for the multiple looks so you can monitor and stop early while keeping error control.
- Novelty and primacy effects
Novelty and primacy effects are transient behavioural responses to change that distort early experiment readings. Novelty effect: a new design draws clicks just because it is new, and the lift fades. Primacy effect: regular users resist a change they are accustomed to, so a good variant looks worse at first. Both mean the first days of a test may not reflect the steady state.
- Holdout groups
A holdout group is a randomly chosen set of users who are intentionally excluded from one or more shipped changes, so their behaviour serves as a long-run baseline. Where an A/B test measures one change briefly, a holdout measures the combined, sustained effect of everything launched, guarding against the slow accumulation of small regressions or overstated wins.
- Feature flags and experiments
A feature flag is a runtime switch that turns functionality on or off for chosen users without a new deploy. Flags power gradual rollouts, kill switches, and — when the audience is split randomly and outcomes are measured — controlled experiments. Understanding the overlap keeps you from confusing a rollout (operational) with an experiment (measured comparison).
- Path analysis
Path analysis (path exploration) visualises the real routes users take through a site as a branching tree of steps, rather than the single idealised funnel. Read forward from a starting point it shows where people actually go; read backward from a conversion or drop-off it shows what preceded it. It surfaces loops, detours, and unexpected entries a fixed funnel cannot.
- Drop-off analysis
Drop-off analysis measures, step by step, how many users fail to advance to the next stage of a funnel and where the largest losses occur. By isolating the single biggest leak it directs limited optimisation effort to the step with the most upside, instead of guessing or polishing stages that already convert well.
- Form analytics
Form analytics studies behaviour inside a form rather than just whether it was submitted. It tracks field-level signals such as time spent, corrections, validation errors, the field where users abandon, and completion rate. A page can have a known submit rate while form analytics reveals exactly which field is driving people away.
- Exit intent detection
Exit intent is a heuristic that predicts a visitor is about to leave the page, most often by detecting the mouse moving rapidly upward toward the address bar or close button. Sites use it to fire a final message such as an offer or reminder. It is a behavioural guess with clear limitations, especially on touch devices where there is no cursor to track.
- Session replay and privacy
Session replay reconstructs a visitor's interaction with a page — pointer movement, clicks, scrolls, input timing — into a playback. It can reveal usability friction a metric cannot, but it captures behaviour at a level that raises serious privacy duties: sensitive fields must be masked, consent may be required, and over-collection is a real risk. This page is educational, not legal advice.
- Heatmaps overview
A heatmap aggregates many users' interactions into a colour-coded overlay on a page: click maps show where people tap, scroll maps show how far down they read, and move maps show pointer movement. They are a quick qualitative read on attention and friction, but they aggregate away context and can mislead on responsive layouts and dynamic content.
- LTV-to-CAC ratio
The LTV-to-CAC ratio divides customer lifetime value by customer acquisition cost. It is a unit-economics gauge: a ratio comfortably above one suggests each customer returns more than they cost to win, while a ratio near or below one signals acquisition is not paying back. Both inputs are estimates, so the ratio is only as honest as the assumptions behind LTV and CAC.
- CAC payback period
The CAC payback period is the time required for the gross margin a customer generates to repay their acquisition cost. It complements the LTV-to-CAC ratio by adding the dimension of time: two businesses with the same ratio can have very different cash dynamics if one recovers its spend in months and the other in years.
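A sketch of both unit-economics gauges, using hypothetical inputs. The payback calculation assumes a constant gross margin per customer per month and ignores churn during the payback window, which flatters businesses whose customers leave early.

```python
def ltv_to_cac(ltv: float, cac: float) -> float:
    """Lifetime value divided by acquisition cost."""
    return ltv / cac


def cac_payback_months(cac: float, monthly_gross_margin: float) -> float:
    """Months of gross margin needed to recover acquisition cost.

    Simplification: assumes a constant margin per customer per month and ignores
    churn during the payback window.
    """
    return cac / monthly_gross_margin


# Two hypothetical businesses with the same ratio but different cash dynamics.
for name, ltv, cac, margin in [("fast", 1200.0, 300.0, 100.0), ("slow", 12000.0, 3000.0, 100.0)]:
    print(name, ltv_to_cac(ltv, cac), cac_payback_months(cac, margin))
# fast 4.0 3.0
# slow 4.0 30.0
```

Both show a 4:1 ratio, but one recovers its spend in a quarter and the other needs two and a half years.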
- Activation rate
Activation rate measures the proportion of new users who complete a milestone representing first meaningful value — not merely signing up. Defining that milestone honestly is the crux: a good activation event predicts later retention, while a vanity definition flatters the number without reflecting whether users actually got value.
- Aha moment
The aha moment is the instant a new user first understands why a product is worth using — the realisation of core value. Teams try to identify it empirically by finding the early behaviour most associated with users who go on to retain, then design onboarding to reach that behaviour quickly. Guessing the moment without evidence steers onboarding toward the wrong target.
- Onboarding funnel
The onboarding funnel is the ordered path a new user takes from signing up to reaching first value (activation). Measuring drop-off at each step shows precisely where new users stall — an unclear setup screen, a permission prompt, an empty state with nothing to do — so onboarding can be improved at the step that loses the most people.
- Pirate metrics (AARRR)
Pirate metrics, or AARRR, is a lifecycle framework introduced by Dave McClure that groups growth metrics into five stages: Acquisition, Activation, Retention, Referral, and Revenue. It gives teams a shared map of where users are and where they leak, so attention can move from raw traffic to the stage actually constraining growth.
- Segmenting conversion by user attributes
Conversion segmentation splits an overall conversion rate by meaningful attributes (device type, traffic source, geography, new versus returning) instead of reading a single blended figure. A flat overall rate frequently masks a strong segment and a failing one. Segmenting locates where conversion is actually won or lost, and, as Simpson's paradox shows, the segment-level picture can even reverse the aggregate story.
- Experiment roadmap and prioritization
An experiment roadmap is a prioritised backlog of test ideas, ordered so that limited testing capacity goes to the experiments most likely to teach or earn the most per unit of effort. Frameworks such as ICE (Impact, Confidence, Ease) and PIE (Potential, Importance, Ease) provide a structured score — useful for comparison, but built from subjective estimates that should not be mistaken for measured fact.
- Designing an experiment hypothesis
Before running an A/B test you write a hypothesis: a falsifiable statement linking a specific change to an expected effect on a named metric, for a defined audience, with a rationale. A good hypothesis fixes the success metric in advance, which prevents post-hoc metric shopping. This page covers the structure of a hypothesis and the reasoning behind it.
- Primary vs secondary metrics in tests
Every experiment should name a single primary metric that determines the decision, and a small set of secondary metrics that add context. The distinction matters statistically: testing many metrics inflates the chance one moves by luck, so the decision must rest on the pre-chosen primary. This page explains the roles and the multiple-comparisons risk.
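The multiple-comparisons risk can be quantified. Assuming m independent metrics, none truly affected, each tested at level alpha, the chance that at least one looks significant is 1 - (1 - alpha)^m. Real metrics are usually correlated, so the exact figure differs, but the direction holds.

```python
def family_wise_error(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent tests at level alpha
    when no metric is truly affected: 1 - (1 - alpha) ** comparisons."""
    return 1 - (1 - alpha) ** comparisons


for m in (1, 5, 20):
    print(m, round(family_wise_error(0.05, m), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```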
- Traffic allocation in experiments
Traffic allocation decides what fraction of eligible users enter an experiment and how that fraction divides among variants. A 50/50 split between two arms maximises statistical power for a fixed sample; ramping exposure limits blast radius. Allocation is a deliberate trade-off between speed, risk, and the number of variants. This page explains the levers.
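Many experimentation systems assign users deterministically by hashing a user id with an experiment-specific salt, so the same user always lands in the same arm. The sketch below shows that general pattern, not any specific tool's scheme; the experiment name and user ids are hypothetical. It uses separate hashes for enrollment and arm so that raising exposure during a ramp adds users without moving anyone already enrolled to the other arm.

```python
import hashlib
from typing import Optional


def unit_interval(user_id: str, salt: str) -> float:
    """Deterministically map (salt, user) to a pseudo-uniform number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def assign(user_id: str, experiment: str, exposure: float, variant_share: float = 0.5) -> Optional[str]:
    """Return None (not enrolled), 'control', or 'variant'.

    Enrollment and arm use independent hashes, so raising `exposure` during a
    ramp adds new users without moving anyone already enrolled to the other arm.
    """
    if unit_interval(user_id, experiment + ":exposure") >= exposure:
        return None
    if unit_interval(user_id, experiment + ":arm") < variant_share:
        return "variant"
    return "control"


users = [f"user-{i}" for i in range(10_000)]
at_10 = {u: assign(u, "checkout-test", exposure=0.10) for u in users}
at_50 = {u: assign(u, "checkout-test", exposure=0.50) for u in users}
enrolled = [u for u in users if at_10[u] is not None]
print(len(enrolled) / len(users))                   # close to 0.10
print(all(at_50[u] == at_10[u] for u in enrolled))  # True: no arm switching
```

A single hash rescaled to the enrolled slice would be simpler, but then changing `exposure` would silently reshuffle arms mid-test.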
- How long to run an A/B test
An A/B test runs until it has collected the sample size its design requires — derived from the baseline rate, the minimum detectable effect, and the chosen power. Duration also has to span full business cycles (weekday/weekend) to avoid day-of-week bias. Stopping the moment a result looks significant inflates false positives. This page explains how duration is set honestly.
- Ramp-up and staged rollout
Ramping is the practice of increasing a variant's exposure in stages — say 1%, then 5%, 20%, 50% — pausing at each step to check guardrail metrics for harm. It separates risk control (the ramp) from measurement (the experiment). A ramp limits blast radius but the early, small stages are not powered to measure the effect precisely. This page explains the trade-off.
- Multi-armed bandit testing
A multi-armed bandit is an adaptive allocation strategy that sends more traffic to variants that look better as data accumulates, instead of a fixed split. It minimises 'regret' — lost conversions from showing inferior options — but the moving allocation complicates classic inference. Bandits suit ongoing optimisation more than one-off learning. This page explains the trade-off honestly.
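One widely used bandit strategy is Thompson sampling. The simulation below is a sketch under stated assumptions: Bernoulli conversions, Beta(1, 1) priors, and hypothetical true rates. Each round it samples a plausible rate for every arm from its posterior and shows the arm with the highest draw.

```python
import random


def thompson_simulation(true_rates: list[float], rounds: int = 5000, seed: int = 11) -> list[int]:
    """Simulate Thompson sampling on Bernoulli arms; return how often each arm was shown.

    Each arm keeps a Beta(1 + successes, 1 + failures) posterior. Every round we
    draw one sample per arm and show the arm with the highest draw, so traffic
    drifts toward arms that look better while weaker arms still get occasional tries.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:  # simulated visitor converts
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


print(thompson_simulation([0.05, 0.10]))
```

The weaker arm ends up with a small share of traffic, which is the regret saving. The same adaptivity means each arm's sample size depends on its own early noise, so naively comparing final rates as if this were a fixed-split test is biased.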
- Contextual bandit optimisation
A contextual bandit extends the bandit idea by conditioning the choice of variant on context: features available at decision time, such as device or referrer. It learns a policy that maps context to the option likely to convert, allowing per-segment personalisation. It inherits the inference caveats of plain bandits and adds risks around which context features are used. This page covers both.
- Regression to the mean in tests
Regression to the mean is the statistical tendency for an extreme measurement to be closer to the average on the next observation. In experimentation it explains why a page picked because it converted unusually well often 'declines' afterward, and why early test readings overstate effects. Recognising it prevents crediting a change for a return to normal. This page explains the mechanism.
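A simulation sketch of the mechanism: every page has the same true conversion rate, so any page that looks best is best only by sampling noise. Observed rates use a normal approximation to the binomial, and all parameters are hypothetical. The "top" pages in period 1 drift back toward the true rate in period 2 without anything changing.

```python
import random
from statistics import mean


def regression_to_mean_demo(pages: int = 200, top: int = 10, true_rate: float = 0.05,
                            visitors: int = 1000, seed: int = 5) -> tuple[float, float]:
    """Every page has the SAME true conversion rate; only sampling noise differs.

    Returns (mean observed rate of the top pages in period 1,
             mean observed rate of those same pages in period 2).
    """
    rng = random.Random(seed)
    noise_sd = (true_rate * (1 - true_rate) / visitors) ** 0.5

    def observe() -> float:
        # normal approximation to a binomial rate measured on `visitors` visitors
        return true_rate + rng.gauss(0.0, noise_sd)

    period1 = [observe() for _ in range(pages)]
    period2 = [observe() for _ in range(pages)]
    best = sorted(range(pages), key=lambda i: period1[i], reverse=True)[:top]
    return mean(period1[i] for i in best), mean(period2[i] for i in best)


first, second = regression_to_mean_demo()
print(f"top pages, period 1: {first:.2%}; same pages, period 2: {second:.2%}")
```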
- Simpson’s paradox in experiments
Simpson's paradox is when an effect that holds within every subgroup reverses or vanishes once the subgroups are pooled. In experiments it appears when the mix of traffic differs between arms — so the aggregate is driven by composition, not the change. It is a vivid reason to check segments and to ensure arms are comparable. This page explains how it arises and how to avoid being fooled.
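A worked example with hypothetical counts: variant B converts better than A on both desktop and mobile, yet worse once pooled, because two-thirds of B's traffic is lower-converting mobile while A's is mostly desktop. In a randomised test, a mix this lopsided is itself a sign that assignment or logging needs investigating.

```python
# Hypothetical counts: (conversions, visitors) per arm and device.
data = {
    "A": {"desktop": (100, 1000), "mobile": (10, 500)},
    "B": {"desktop": (55, 500), "mobile": (25, 1000)},
}


def rate(conversions: int, visitors: int) -> float:
    return conversions / visitors


for arm, segments in data.items():
    by_segment = {seg: round(rate(*cv), 3) for seg, cv in segments.items()}
    pooled = rate(sum(c for c, _ in segments.values()), sum(v for _, v in segments.values()))
    print(arm, by_segment, "pooled:", round(pooled, 3))
# A {'desktop': 0.1, 'mobile': 0.02} pooled: 0.073
# B {'desktop': 0.11, 'mobile': 0.025} pooled: 0.053
```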
- Confounding variables in conversion
A confounding variable is a third factor that affects both the thing you changed and the outcome you measured, producing a spurious association. Confounders are why 'we shipped X and conversions rose' is weak evidence — a campaign, a season, or a price change could be the real cause. Randomised experiments neutralise confounders by design. This page explains the concept and the defence.
- Interaction effects between changes
An interaction effect occurs when the combined impact of two changes is not simply the sum of their individual impacts — one change alters how the other performs. Interactions matter when several experiments run on the same page at once, and they are the core reason multivariate testing exists. This page explains interactions and how concurrent tests can collide.
- Pitfalls of segmenting test results
Segmenting experiment results — by device, country, source — is useful, but slicing a non-significant test until some segment 'wins' is a recipe for false positives. Each extra segment is another comparison; enough slices guarantee a spurious hit. Legitimate segment analysis is pre-planned or corrected for multiplicity. This page separates honest segmentation from data dredging.
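One standard multiplicity correction is the Holm-Bonferroni step-down procedure, which controls the family-wise error rate and is less conservative than plain Bonferroni. The segment p-values below are hypothetical.

```python
def holm_significant(p_values: dict[str, float], alpha: float = 0.05) -> set[str]:
    """Holm-Bonferroni step-down procedure: which segments stay significant
    while controlling the family-wise error rate at `alpha`."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    significant = set()
    for i, (name, p) in enumerate(ordered):
        if p <= alpha / (m - i):
            significant.add(name)
        else:
            break  # once one test fails, all larger p-values fail too
    return significant


segment_p = {"mobile": 0.004, "desktop": 0.03, "tablet": 0.04, "email": 0.2, "social": 0.6}
print(sorted(name for name, p in segment_p.items() if p <= 0.05))  # naive: ['desktop', 'mobile', 'tablet']
print(holm_significant(segment_p))                                 # {'mobile'}
```

Three segments "win" at an uncorrected 0.05, but only one survives once the five comparisons are accounted for.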
- Revenue per visitor (RPV)
Revenue per visitor (RPV) is total revenue divided by the number of visitors over a period. Because it combines conversion rate and average order value, it captures trade-offs a single metric hides — a change that lifts conversions but cuts order value may leave RPV flat. It is a common overall evaluation criterion in commerce experiments. This page defines RPV and its caveats.
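A sketch with hypothetical numbers showing why RPV can disagree with conversion rate: revenue / visitors is identical to conversion rate multiplied by average order value, so a lift in one factor can be cancelled by a drop in the other.

```python
def summarise(visitors: int, orders: int, revenue: float) -> dict[str, float]:
    conversion_rate = orders / visitors
    aov = revenue / orders
    rpv = revenue / visitors  # identical to conversion_rate * aov
    return {"conversion_rate": conversion_rate, "aov": aov, "rpv": rpv}


control = summarise(visitors=10_000, orders=300, revenue=15_000.0)
variant = summarise(visitors=10_000, orders=360, revenue=14_400.0)
for name, s in [("control", control), ("variant", variant)]:
    print(f"{name}: CR {s['conversion_rate']:.1%}, AOV {s['aov']:.2f}, RPV {s['rpv']:.2f}")
# control: CR 3.0%, AOV 50.00, RPV 1.50
# variant: CR 3.6%, AOV 40.00, RPV 1.44
```

The variant lifts conversion by a fifth yet earns less per visitor.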
- Value per visitor for non-purchases
Value per visitor generalises revenue per visitor to sites without direct sales: you assign an estimated value to each goal (a lead, a signup, a download) and divide total assigned value by visitors. It makes mixed conversion goals comparable, but the result is only as honest as the values you assign. This page explains the method and the disclosure it demands.
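A minimal sketch of the calculation. The goal values are hypothetical estimates, which is exactly the part that needs disclosing.

```python
# Hypothetical goal values; in practice these are estimates you must disclose.
goal_values = {"lead": 40.0, "signup": 5.0, "download": 1.0}


def value_per_visitor(goal_counts: dict[str, int], visitors: int, values: dict[str, float]) -> float:
    """Total assigned goal value divided by visitors."""
    total_value = sum(count * values[goal] for goal, count in goal_counts.items())
    return total_value / visitors


counts = {"lead": 20, "signup": 100, "download": 300}
print(value_per_visitor(counts, 5_000, goal_values))  # 0.32
```

Doubling the assumed value of a lead to 80 moves the result from 0.32 to 0.48 with no change in visitor behaviour.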
- Page speed and conversion
Loading speed influences whether visitors stay and convert, and Google's Core Web Vitals formalise field metrics for it (LCP, INP, CLS). The direction is well established, but the magnitude is specific to each site and audience — borrowed 'every 100ms costs X%' figures are not yours to cite. This page explains the measurable link and how to study it honestly.
- Pricing page optimisation
A pricing page sits at the decision point, so changes there move both conversion rate and order value. Optimising it means testing clarity, plan layout, and the unit you charge on — while judging results by revenue per visitor, since a layout that lifts signups onto cheaper plans can lower revenue. This page frames pricing tests honestly, with no invented benchmarks.
- Checkout flow optimisation
Checkout optimisation targets the final, highest-intent stretch of the funnel, where small friction loses ready buyers. The method is to instrument each step, find where drop-off concentrates, and test specific reductions — fewer fields, guest checkout, clearer errors. Success is read at the step that changed, not only the overall completion rate. This page frames it with step-level diagnosis.
- Mobile conversion gaps
Mobile and desktop frequently show different conversion rates, but a lower mobile number is not automatically a defect. The gap can be real friction (small targets, slow pages), different intent (browsing versus buying), or a measurement artefact (consent, tracking loss). Diagnosing which one applies is the work. This page lays out the causes and how to tell them apart.
- Trust signals and conversion
Trust signals are page elements that reduce a visitor's perceived risk: clear policies, security indicators, transparent contact details, and authentic social proof. They can lift conversion by easing hesitation, but the effect varies and must be tested, not assumed from someone else's numbers. Misused or fake signals backfire. This page covers what counts as a trust signal and how to test one.
- Copy and CTA testing
Copy and call-to-action (CTA) tests change words — a headline, a value proposition, button text — and measure the effect on conversion. The discipline is to isolate the copy change, and to judge it on the downstream macro conversion, not just the click, since punchier wording can raise clicks while lowering completions. This page frames honest copy testing.
- Accessibility and conversion
Accessibility — building pages usable by people with disabilities, per the W3C's WCAG — is also a conversion concern: a form a screen-reader user cannot complete is a lost conversion that analytics may never explain. Accessible design removes barriers and widens the convertible audience. This page connects WCAG practice to conversion, without inventing uplift figures, and notes it is educational, not legal advice.
- The winner’s curse in experiments
The winner's curse is the tendency for the measured effect of a 'winning' experiment to overstate the true effect, because selecting on statistical significance favours upward noise. It explains why shipped wins often underdeliver in production. Larger samples and replication shrink the inflation. This page explains the mechanism and how to set realistic expectations after a win.
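A simulation sketch of the mechanism. Each experiment's estimate is modelled as the true effect plus normal noise with a given standard error (units are arbitrary, say percentage points), and only positive, significant results are "shipped". The parameters are hypothetical.

```python
import random
from statistics import mean


def winners_curse(true_effect: float = 0.5, standard_error: float = 0.5,
                  experiments: int = 20000, z_crit: float = 1.96, seed: int = 13) -> float:
    """Average measured effect among experiments declared positive winners.

    Each experiment's estimate is modelled as true_effect + normal noise with the
    given standard error.
    """
    rng = random.Random(seed)
    estimates = [true_effect + rng.gauss(0.0, standard_error) for _ in range(experiments)]
    winners = [e for e in estimates if e / standard_error > z_crit]
    return mean(winners)


print(f"true effect 0.50; average 'winning' estimate {winners_curse():.2f}")
print(f"with a 5x smaller standard error: {winners_curse(standard_error=0.1):.2f}")
```

With an underpowered design the average winner overstates the true effect by more than double; shrinking the standard error (a larger sample) nearly removes the inflation.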
- SaaS trial conversion
SaaS trial conversion measures how many trial sign-ups turn into paying subscriptions. It is the ratio of paid conversions to trials started over a window. The number depends on the trial model (opt-in vs opt-out, free trial vs reverse trial), the measurement window, and what counts as 'paid' — so the definition must travel with the metric.
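A sketch of how the measurement window changes the number, using a hypothetical set of trials recorded as (start date, first paid date or None).

```python
from datetime import date
from typing import Optional

# Hypothetical trials: user -> (trial start date, first paid date or None).
trials: dict[str, tuple[date, Optional[date]]] = {
    "a": (date(2024, 1, 1), date(2024, 1, 10)),
    "b": (date(2024, 1, 2), None),
    "c": (date(2024, 1, 3), date(2024, 3, 1)),
}


def trial_conversion(trials: dict[str, tuple[date, Optional[date]]], window_days: int) -> float:
    """Share of trials that became paid within `window_days` of starting."""
    converted = sum(
        1 for start, paid in trials.values()
        if paid is not None and (paid - start).days <= window_days
    )
    return converted / len(trials)


print(trial_conversion(trials, 30))  # 1 of 3 converted within 30 days
print(trial_conversion(trials, 90))  # 2 of 3 within 90 days
```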
- Freemium-to-paid conversion
Freemium-to-paid conversion is the fraction of free users who upgrade to a paid plan. Unlike a trial, freemium has no fixed expiry, so the denominator (all free users? active free users? a cohort?) and the upgrade trigger are choices that move the number. It tends to look low because the free base includes users who never intended to pay.
- Lead-gen funnel stages
A lead-generation funnel tracks the path from anonymous visitor to captured lead, to qualified lead, to sales opportunity, to closed deal. Each stage is a definition you set, and the hand-off points (marketing-qualified to sales-qualified) are where counts blur. Defining every stage as a concrete event keeps the funnel honest.
- B2B funnel stages
A B2B funnel differs from a consumer one: the buyer is a committee, the cycle runs weeks to months, and the unit is often an account rather than a person. Stages run from account engaged, to opportunity, to deal. Long lag and multi-person journeys mean point-in-time rates mislead, so account cohorts and multi-touch views are the honest reading.
- Product-qualified leads (PQLs)
A product-qualified lead (PQL) is a user who has shown buying intent through real product usage — hitting an activation milestone, reaching a usage limit, inviting teammates — rather than only filling in a form. PQLs sit between freemium usage and sales. Their value depends entirely on which behavioural signals you choose to define qualification.
- Ecommerce funnel stages
An ecommerce funnel tracks the standard path: product view → add to cart → begin checkout → add payment → purchase. Each step maps to a documented commerce event, which makes the funnel measurable end to end. The value is localising the biggest drop — usually between cart and checkout, or inside checkout — rather than reading one blended conversion rate.
- Conversion by traffic source
Conversion by traffic source breaks the overall conversion rate down by acquisition channel — organic search, paid, direct, referral, social, email. Different sources carry different intent, so a blended rate hides which channels convert. The reading is complicated by attribution: which touch gets credit determines which source a conversion lands against.
- Conversion by device type
Conversion by device type splits the rate across desktop, mobile, and tablet. A persistent mobile-vs-desktop gap is one of the most common findings in CRO, but it can be genuine friction (small forms, slow pages) or an artefact: mobile sessions skew toward research while desktop closes the purchase, and cross-device journeys split one buyer across devices.
- Conversion by new vs returning visitors
Conversion by new vs returning visitors splits the rate by whether someone is on their first visit or has been before. Returning visitors usually convert higher because they arrive further along in intent. The catch is that 'returning' depends on a stable identifier; cookie loss and privacy resets misclassify returners as new and depress the apparent returning rate.
- Internal site search and conversion
Internal site search is the on-site search box visitors use to find things. Searchers often behave differently from browsers — frequently with higher intent — so segmenting conversion by search use is revealing. Tracking search terms and especially zero-result queries surfaces unmet demand and navigation gaps that depress conversion.
- Personalization and conversion
Personalization shows different content to different visitors based on segment, behaviour, or context. It is often assumed to lift conversion, but assumption is not evidence: personalization adds complexity and can backfire, so it must be tested like any other change, against a holdout, on a metric chosen in advance.
- Social proof testing
Social proof presents signals that others trust you — reviews, ratings, usage counts, testimonials, badges — to reduce hesitation. Whether it lifts conversion is testable, not given. Critically, social proof must be truthful: fabricated reviews or invented counts are both an integrity failure and, in many jurisdictions, a consumer-protection violation.
- Urgency and scarcity testing
Urgency (a deadline) and scarcity (limited availability) cues aim to reduce hesitation and prompt action. Their effect is testable, but the cues must be genuine: countdown timers that reset and 'only 2 left' notices that are untrue are dark patterns and, in many jurisdictions, unlawful. Test real urgency; never manufacture fake pressure.
- Friction audit
A friction audit is a structured review of everything that makes converting harder than it needs to be — extra steps, confusing copy, slow pages, forced account creation, surprise costs, broken states. It inventories friction across the funnel so removal can be prioritised by impact, turning vague 'the site is clunky' into a ranked list of fixable obstacles.
- Form field analysis
Form field analysis breaks a form down field by field: which fields get completed, which trigger errors, which cause people to abandon, and how long each takes. It localises form friction to specific fields — often one problem field drives most abandonment — so you can shorten, reorder, or fix rather than redesigning blindly.
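A simple starting point is to count, for abandoned form sessions only, the last field a visitor touched before leaving. The session format here is hypothetical. "Last field touched" is only a proxy: the field that actually stopped someone may be the next one they looked at but never entered.

```python
from collections import Counter

# Hypothetical form sessions: fields interacted with, in order, and whether the form was submitted.
form_sessions = [
    (["email", "name", "phone"], False),
    (["email", "name", "phone"], False),
    (["email", "name", "phone", "address"], True),
    (["email"], False),
    (["email", "name", "phone", "address"], True),
]


def abandonment_by_last_field(sessions: list[tuple[list[str], bool]]) -> Counter:
    """Count, for abandoned sessions only, the last field touched before leaving."""
    return Counter(fields[-1] for fields, submitted in sessions if not submitted and fields)


print(abandonment_by_last_field(form_sessions).most_common())  # [('phone', 2), ('email', 1)]
```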
- Error message optimization
Error messages appear when a visitor's input fails validation. Vague, late, or harsh errors push people to abandon; clear, specific, well-timed ones recover them. Optimizing errors means making them say what is wrong and how to fix it, showing them inline near the field, and measuring error frequency so the worst offenders get attention.
- Checkout step reduction
Checkout step reduction means collapsing or removing stages in the purchase flow so the path from cart to confirmation is shorter. Each step is a chance to abandon, so fewer, cleaner steps often lift completion. But shorter is not automatically better: combining steps can overload a page, and some steps (review, fraud checks) earn their place — so changes must be tested.
- Guest checkout impact
Guest checkout lets a shopper complete a purchase without creating an account. Forcing account creation before purchase is a well-documented abandonment driver, because it inserts effort and a commitment between intent and payment. Offering guest checkout usually reduces that friction, but the trade-off against account benefits (repeat purchase, saved details) is worth measuring.
- One-click vs multistep checkout
One-click checkout completes a purchase using previously stored payment and shipping details, removing nearly all friction for returning buyers. Multistep checkout collects details across stages, giving more control and review. They serve different situations: one-click suits known repeat buyers, multistep suits first-time or high-consideration purchases. Neither is universally 'better'.
- Exit survey analysis
An exit survey asks visitors who are about to leave (or who just abandoned) why they did not convert. It supplies the 'why' that funnel numbers cannot. But responses are self-reported and self-selected — only some people answer, and stated reasons are not always the real cause — so exit-survey data generates hypotheses to test, not conclusions to act on blindly.
- Qualitative vs quantitative CRO
Conversion-rate optimization draws on two kinds of evidence. Quantitative methods (funnels, A/B tests, analytics) measure what is happening and how much. Qualitative methods (surveys, session review, interviews, usability tests) reveal why. Neither alone is enough: numbers locate the problem, qualitative work explains it, and experiments confirm the fix.
- Conversion debt
Conversion debt is the accumulated set of known conversion problems — friction, broken steps, untested assumptions, deferred fixes — that a team has chosen not to address. Like technical debt, it compounds: each unfixed leak keeps losing conversions every day, and shortcuts taken for speed accrue 'interest' until they are repaid. Naming it helps prioritise paying it down.
- Type I and type II errors
Every test can be wrong two ways. A type I error (false positive) declares a difference when none exists; its rate is the significance level α you choose. A type II error (false negative) misses a real difference; its rate is β, and 1−β is statistical power. Lowering one rate, holding sample size fixed, usually raises the other — the trade-off you manage when designing a test.
- Statistical power
Power is the probability that a test correctly rejects the null when a true effect of a stated size exists: power = 1 − β. It rises with sample size, with the size of the effect you want to catch, and with a looser significance threshold; it falls with higher metric variance. Underpowered tests waste traffic by failing to detect real wins, so power is planned before launch.
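Here is a sketch of power planning for a conversion-rate test. It assumes a two-sided two-proportion z-test under the normal approximation with unpooled variances. The function names are illustrative, and dedicated calculators may use slightly different formulas and return slightly different numbers. The first function gives power for a planned sample size. The second inverts the same approximation to find the users per arm needed for a target power.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_proportions(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation with unpooled variances; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    se = math.sqrt((p_control * (1 - p_control) + p_variant * (1 - p_variant)) / n_per_arm)
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(abs(p_variant - p_control) / se - z_alpha)


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.8):
    """Users per arm needed to reach the target power (same approximation)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = _Z.inv_cdf(power)            # power = 1 - beta
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_variant - p_control) ** 2)
```

For instance, `sample_size_per_arm(0.03, 0.033)` sizes a test to detect a move from 3.0% to 3.3%. The rates are planning inputs you choose, not predicted results.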
- Effect size
Effect size is the magnitude of a difference — for conversion, the absolute lift (e.g. 3.0% to 3.3% is +0.3 points) or the relative lift (+10%). It is distinct from significance: a p-value says whether an effect is plausibly non-zero, effect size says whether it is big enough to matter. The smaller the effect you want to catch, the more traffic you need, so effect size anchors test planning.
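The entry's worked example, written as a tiny Python helper (the name `lifts` is illustrative). Rates are proportions, so an absolute lift of 0.003 is 0.3 percentage points.

```python
def lifts(p_control, p_variant):
    """Return (absolute lift, relative lift) between two conversion rates."""
    absolute = p_variant - p_control   # proportion units; multiply by 100 for points
    relative = absolute / p_control    # as a fraction of the control rate
    return absolute, relative
```

`lifts(0.030, 0.033)` gives roughly `(0.003, 0.10)`, which is the +0.3 points / +10% example above. Under the usual normal approximation, the traffic a test needs scales roughly with 1 / (absolute lift)². Halving the lift you want to detect therefore roughly quadruples the sample.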
- CUPED variance reduction
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces the variance of an experiment metric by adjusting it with a covariate measured before the test — typically each user's own pre-period behaviour. Because the covariate is independent of the treatment, the adjustment removes noise without introducing bias, so confidence intervals narrow and tests reach a decision with less traffic.
- Stratification in experiments
Stratification splits the population into subgroups (strata) such as device, country, or new-vs-returning, then randomises within each so every variant gets a balanced share of each stratum. This prevents chance imbalance on a known high-variance dimension and, when the stratifying variable predicts the outcome, lowers the variance of the overall effect estimate — a variance-reduction technique alongside CUPED.
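Below is a batch sketch of stratified (blocked) randomisation in Python. Users are grouped by a stratum key and shuffled within each stratum. They are then dealt to arms in turn, starting from a random arm, so each stratum splits as evenly as possible. The sketch assumes the full user list is known up front, which suits batch assignment. Online systems that assign users as they arrive need a different mechanism. `stratified_assign` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_assign(users, stratum_of, arms=("control", "treatment"), seed=None):
    """Randomise users to arms within each stratum.

    stratum_of: function mapping a user to its stratum key (e.g. device).
    Returns {user: arm}; within every stratum, arm sizes differ by at most 1.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user in users:
        by_stratum[stratum_of(user)].append(user)
    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)                 # random order within the stratum
        offset = rng.randrange(len(arms))    # random starting arm per stratum
        for i, user in enumerate(members):
            assignment[user] = arms[(i + offset) % len(arms)]  # deal arms in turn
    return assignment
```

To capture the variance reduction, the analysis typically also accounts for the strata. One way is to estimate the effect within each stratum and then combine the estimates.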
- Multiple comparisons correction
When you run many tests at once (multiple variants, multiple metrics, many segments), the chance that at least one shows a false positive grows with the number of comparisons. Multiple-comparisons corrections counter this. The Bonferroni method controls the family-wise error rate by dividing α across tests. The Benjamini-Hochberg procedure instead controls the false discovery rate and is less conservative. Both trade some power for fewer false 'wins'.
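Both corrections can be written as small Python functions over a list of p-values (the names are illustrative). Bonferroni compares each p-value with α/m. Benjamini-Hochberg sorts the p-values, finds the largest rank k with p_(k) <= (k/m) * α, and rejects every hypothesis ranked at or below k. Each function returns booleans in the input order.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, even if its own
    # p-value missed its individual threshold.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected
```

On the same p-values, Benjamini-Hochberg typically rejects at least as many hypotheses as Bonferroni. That is the practical meaning of "less conservative".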
- One-tailed vs two-tailed tests
A two-tailed test asks whether the variant differs from control in either direction and splits α across both tails. A one-tailed test puts all of α on a single direction, so it is more sensitive to an effect that way — but blind to a move the other way, including the variant being worse. Because variants can hurt as well as help, two-tailed is the conservative default for conversion experiments.
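A Python sketch using the pooled two-proportion z-test, a common choice for conversion rates (the function name and return keys are illustrative). It computes both p-values for the same data. When B moves in the hoped-for direction, the one-tailed p-value is half the two-tailed one. When B is worse, the one-tailed test simply reports no evidence and never flags the harm.

```python
import math
from statistics import NormalDist


def z_test_p_values(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test of B against A (control)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    cdf = NormalDist().cdf
    return {
        "z": z,
        "two_tailed": 2 * (1 - cdf(abs(z))),  # H1: B differs from A
        "one_tailed_b_better": 1 - cdf(z),    # H1: B > A only
    }
```

Data can clear α = 0.05 one-tailed while missing it two-tailed. That gap is exactly why choosing the tail after seeing the result is a form of p-hacking.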
- Switchback experiments
A switchback experiment randomises treatment at the level of time windows (and sometimes regions) rather than users: the entire system runs control for one interval, treatment for the next, alternating on a schedule. It is used where treating some users affects others — marketplaces, pricing, dispatch — so a user-level split would leak between arms. Time becomes the randomisation unit.
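A minimal Python sketch of a switchback schedule. It assumes fixed-length windows with an independent random arm per window, and the names and parameters are hypothetical. Each request is routed by the window its timestamp falls into, so the whole system is in one arm at a time. Production designs often go further, for example by balancing arms across times of day and discarding data just after each switch to limit carryover. Those refinements are omitted here.

```python
import random


def switchback_schedule(start, end, window_seconds, seed=None):
    """Randomly assign each fixed-length time window to an arm.

    start, end: integer Unix timestamps; returns [(window_start, arm), ...].
    """
    rng = random.Random(seed)
    schedule = []
    t = start
    while t < end:
        schedule.append((t, rng.choice(("control", "treatment"))))
        t += window_seconds
    return schedule


def arm_at(schedule, start, window_seconds, timestamp):
    """Return the arm in force at `timestamp` (must fall inside the schedule)."""
    index = (timestamp - start) // window_seconds
    if not 0 <= index < len(schedule):
        raise ValueError("timestamp outside the scheduled period")
    return schedule[index][1]
```

The unit of analysis is then the time window, not the user. Effective sample size is the number of windows, which is usually far smaller than the number of users.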
- Interleaving experiments
Interleaving compares two ranking algorithms by merging their results into a single list shown to the same user, then crediting whichever ranker contributed the items that were clicked. Because each user sees both rankers' picks side by side, within-user comparison removes between-user noise, making interleaving far more sensitive than splitting users between two whole rankings — widely documented for search and recommendation evaluation.
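A Python sketch of team-draft interleaving, one well-known interleaving method. It is assumed here for illustration; other variants exist. The rankers take turns, and a coin flip decides who picks first whenever their pick counts are equal. Each pick is the picker's highest-ranked item not already shown, tagged with the ranker that contributed it. Clicks are then credited to that ranker. The function names are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, seed=None):
    """Merge two rankings with team-draft interleaving.

    Returns (merged list, {item: "A" or "B"}) recording which ranker
    contributed each shown item.
    """
    rng = random.Random(seed)
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    merged, team = [], {}
    while len(merged) < length:
        # The ranker with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            side = "A"
        else:
            side = "B"
        # Pick this ranker's highest-ranked item not already shown.
        item = next((d for d in rankings[side] if d not in team), None)
        if item is None:  # this ranker is exhausted, so the other one picks
            side = "B" if side == "A" else "A"
            item = next((d for d in rankings[side] if d not in team), None)
            if item is None:
                break  # both rankers exhausted
        merged.append(item)
        team[item] = side
        picks[side] += 1
    return merged, team


def credit_clicks(team, clicked_items):
    """Count clicks per ranker; clicks on items not shown are ignored."""
    wins = {"A": 0, "B": 0}
    for item in clicked_items:
        if item in team:
            wins[team[item]] += 1
    return wins
```

For each query, the ranker with more credited clicks wins. Over many queries, the share of wins (ties excluded) indicates which ranker users prefer.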
- Network effects in experiments
Standard A/B tests assume each user's outcome depends only on their own assigned variant — the no-interference (SUTVA) assumption. Network effects break it: in social products, marketplaces, or anything with sharing, a treated user changes the experience of untreated users, so control is 'contaminated' and the measured effect is biased. Cluster, switchback, or ego-network designs reduce the leakage.
- Randomization unit
The randomization unit is the thing you randomly assign to control or treatment: a user, a session, a device, a cookie, or a cluster. The choice must match how you analyse and how users experience the change. Mismatches cause two classic failures — a user flipping variants between sessions (inconsistent experience) and analysing at a finer grain than you assigned (understated variance, false significance).
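A common way to keep assignment consistent for a chosen unit is to hash the unit's ID together with an experiment name. A minimal Python sketch follows. The function, salt format, and bucket count are assumptions, not any specific platform's scheme. Hashing a stable user ID gives the same variant on every session, whereas hashing a session ID would let a user flip between variants. Analyse at the same unit you hash on.

```python
import hashlib


def assign_variant(unit_id, experiment, arms=("control", "treatment"), buckets=10_000):
    """Deterministically map a unit (user, device, ...) to an arm."""
    # Salting with the experiment name keeps splits independent across experiments.
    key = f"{experiment}:{unit_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % buckets
    return arms[bucket * len(arms) // buckets]  # equal-sized bucket ranges
```

No assignment table is needed, because any server can recompute the variant from the ID alone.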
- Delta method for ratio metrics
Many experiment metrics are ratios where the denominator is itself random: clicks per session, revenue per user, pages per visit. Suppose the randomisation unit (say, the user) is coarser than the denominator unit (say, the session). Then observations from the same user are not independent, and the numerator and denominator are correlated. Naive variance formulas that treat each session as an independent observation are therefore wrong, typically understating the variance. The delta method uses a first-order Taylor expansion to approximate the variance of the ratio correctly, fixing confidence intervals.
- Above-the-fold testing
'Above the fold' is the portion of a page visible without scrolling, which varies by viewport. Above-the-fold testing experiments with what occupies that first screen — headline, value proposition, primary CTA, hero media — because it sets first impressions. Measure it with scroll-depth and visibility events rather than assumptions, since the fold position differs across devices and the goal is the downstream conversion, not the click alone.
- Navigation testing
Navigation testing experiments with the menus, labels, and information architecture that route visitors to what they want — category names, menu grouping, header vs hamburger, breadcrumb presence. Because navigation touches every page, small changes have broad reach. It is evaluated with path analysis, click tracking on nav elements, and task-based usability research, with conversion and findability as the outcomes.
- Search relevance testing
Search relevance testing improves how an internal site search ranks results: query understanding, synonyms, ranking signals, and zero-result handling. It is measured with operational metrics (zero-result rate, click-through on results, search refinements) and outcome metrics (search-to-conversion). Ranking variants are compared with A/B tests on outcomes, or with interleaving for sensitive within-user comparison of rankers.
- Recommendation testing
Recommendation testing compares the algorithms that suggest products or content — related items, 'you may also like', personalised feeds. It is judged on engagement (recommendation click-through), attributed downstream conversion or revenue, and guardrails like diversity and coverage. A central pitfall is the feedback loop: a recommender shapes the very clicks used to train and evaluate it, so offline and online evaluation must be designed carefully.
- Trust badges and conversion
Trust badges are visual signals — security seals, recognised payment-network logos, certification marks — placed near sensitive steps to reduce perceived risk. Their effect is context-dependent and must be tested, not assumed: a badge that reassures one audience can clutter or even raise suspicion for another. Treat badge changes as ordinary A/B tests measured on completed conversion, with no presumed uplift.
- Shipping cost transparency
Unexpected extra costs — chiefly shipping, taxes and fees revealed only at the final step — are repeatedly documented as a leading reason for cart abandonment. Shipping cost transparency means surfacing those costs earlier (product page, cart, or a calculator) so the final total is no surprise. Test how and when you reveal cost, measuring checkout completion and not just cart adds.
- Returns policy and conversion
A returns policy lowers the perceived risk of buying something you cannot inspect in person. Its visibility (is it findable before checkout?) and its terms (window length, who pays return shipping, refund vs exchange) influence conversion. The trade-off is real: more generous terms can lift conversion but raise return costs, so test both sides and judge on net outcome, not conversion alone.
- Reviews and conversion
Customer reviews are a form of social proof: prospective buyers read others' experiences to reduce uncertainty. How reviews are surfaced — quantity, recency, the balance of positive and critical, and verified-purchase labelling — shapes their credibility and their effect on conversion. Display them honestly: fabricated or filtered reviews mislead users and breach consumer-protection rules. Measure effect with A/B tests, not assumed numbers.
- Live chat and conversion
Live chat (human or bot) lets visitors ask questions at the moment of doubt, potentially rescuing a conversion that hesitation would lose. But naive measurement overstates its value: people who choose to chat are often already higher-intent, so chatters convert more whether or not chat helped. Measure incremental effect with an experiment, and watch that proactive prompts do not distract or annoy.
- Popup timing
A popup or interstitial's effect depends heavily on when it fires: immediately on load, after a scroll or time threshold, on exit intent, or after a meaningful action. Early interruptions tend to annoy and can carry SEO penalties on mobile; later, context-aware triggers tend to convert better. Test triggers on net conversion, and respect interstitial guidelines and consent requirements.
- Sticky CTA testing
A sticky (fixed-position) CTA stays pinned to the viewport — a header bar, footer button, or floating button — so the primary action remains reachable however far the user scrolls. On long pages this can prevent users from losing the action; the cost is screen real estate, especially on mobile, and the risk of distraction. Test it on conversion with scroll and click data, balancing reach against clutter.
- Email capture optimization
Email capture is a micro-conversion: trading a clear value for a permissioned address. Optimising it spans the offer (what the user gets), the ask (form length and placement), the timing (when the prompt appears), and consent (lawful, unambiguous opt-in). Optimise for quality signups — engaged, consented subscribers — not raw volume, because addresses gathered by dark patterns churn and damage deliverability and compliance.
- Signup funnel optimization
The signup funnel is the sequence from intent to a created account — landing, form, verification, first authenticated state. Optimising it means instrumenting each step, finding where prospects drop, and removing friction (excess fields, unclear value, painful verification) without lowering the quality of accounts created. The goal is completed, activated signups, so it connects directly to the activation funnel that follows.
- Activation funnel
The activation funnel covers what happens after signup: the sequence of steps a new user takes to reach first meaningful value, the aha moment. Unlike the signup funnel (which ends at account creation), this one ends when the user has done the thing that makes the product useful. Mapping its steps and measuring completion at each reveals where new users stall before getting value. Reaching that value is commonly treated as a strong leading indicator of retention.
- Referral funnel
The referral funnel measures how existing users bring in new ones: being prompted to invite, sharing, the invitee clicking, the invitee signing up, and the invitee activating. Each stage has its own drop-off. Referral carries pitfalls that other funnels do not — two-sided incentives that can attract gaming, attribution of who gets credit, and network interference that complicates experiments measuring it.
- Upgrade funnel
The upgrade funnel is the path an existing free or lower-tier user takes to a paid or higher plan. Unlike acquisition funnels, it acts on people who already use the product, so the levers are different: hitting a usage limit, reaching a value moment that justifies paying, and contextual prompts at the point of need. Instrument the triggers and steps, and measure upgrades that retain, not just immediate conversions.
- Abandoned cart recovery
Abandoned cart recovery re-engages shoppers who added items but left before purchase, via reminder emails, a persistent saved cart, on-site nudges, or retargeting. The measurement trap is attribution: some abandoners would have returned anyway, so the honest metric is incremental recovery from a holdout-controlled comparison, not the total revenue 'attributed' to a recovery email. Consent governs the channels you may use.
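The holdout arithmetic can be sketched in Python as follows, with hypothetical names. The idea is to randomly withhold the recovery message from a share of abandoners and then compare recovery rates. Only the difference between the two rates is credited to the campaign; the rest of the recovered carts would have come back anyway.

```python
def incremental_recovery(treated_recovered, treated_total,
                         holdout_recovered, holdout_total):
    """Compare recovery in the messaged group against a random holdout."""
    treated_rate = treated_recovered / treated_total
    holdout_rate = holdout_recovered / holdout_total  # would have returned anyway
    incremental_rate = treated_rate - holdout_rate
    return {
        "treated_rate": treated_rate,
        "holdout_rate": holdout_rate,
        "incremental_rate": incremental_rate,
        # Carts in the messaged group that the message itself plausibly recovered.
        "incremental_carts": incremental_rate * treated_total,
    }
```

The incremental count is always smaller than the raw count of recovered carts in the messaged group unless the holdout recovers nothing. Pair it with a significance test, such as a two-proportion z-test, before acting on it.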
- Multi-step form optimization
Multi-step forms break a long form into smaller screens, using progressive disclosure, a progress indicator, and logical chunking to reduce the intimidation of one giant form. The trade-off: each step is a fresh drop-off point and a page transition. Whether splitting helps is empirical — instrument completion per step, test against a single-page version, and let your own data decide rather than a blanket rule.
- Variance reduction overview
Variance reduction is a family of techniques that make an experiment more sensitive by lowering the variance of its effect estimate. This narrows confidence intervals, so a true effect is detected with less traffic. Done correctly, it changes precision, not the expected effect, so it introduces no bias. The main methods (CUPED, stratification, and covariate adjustment) all exploit information that predicts the outcome but is unaffected by the treatment, such as pre-experiment behaviour or attributes fixed before assignment.
- Debugging a sample ratio mismatch
A sample ratio mismatch (SRM) — observed variant counts that differ from the intended split by more than chance — invalidates a test, because whatever broke the ratio likely biased the metrics too. Debugging SRM is a systematic hunt: check the assignment mechanism, redirect and timing effects, bot filtering, logging gaps, and analysis filters that drop one arm unevenly. This entry is the troubleshooting procedure, not the definition.
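The usual first step before debugging is to confirm the mismatch with a chi-square goodness-of-fit test on the arm counts. Below is a two-arm Python sketch; the function name is illustrative. With two arms there is one degree of freedom, so the p-value can use `math.erfc` without a statistics library.

```python
import math


def srm_check(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit test for a two-arm sample ratio.

    Returns (chi2 statistic, p-value). With two arms there is one degree of
    freedom, and the chi-square(1) survival function equals erfc(sqrt(x / 2)).
    """
    total = n_control + n_treatment
    expected_c = total * expected_control_share
    expected_t = total - expected_c
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

This check usually runs on every experiment, so teams often alert only at a strict threshold such as p < 0.001 to keep false alarms rare.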
- Experiment instrumentation quality
Instrumentation quality is the often-ignored foundation of trustworthy experiments: if exposure, assignment, and metric events are logged wrongly, every downstream statistic is wrong too. This covers logging the exposure point correctly, deduplicating events, handling consent gaps, and validating tracking with A/A tests before a real experiment runs. Bad instrumentation produces confident, precise, and false conclusions.
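Two of these checks can be sketched in Python over a hypothetical event log of dicts with `event_id`, `user_id`, and `type` keys. The sketch assumes the log is already restricted to users assigned to the experiment. The first check deduplicates events by event ID. The second lists users who have conversions but no logged exposure. Both complement, rather than replace, an A/A test.

```python
def deduplicate(events):
    """Keep the first occurrence of each event_id (hypothetical schema)."""
    seen, unique = set(), []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique


def conversions_without_exposure(events):
    """Return sorted user_ids with a conversion event but no logged exposure.

    For assigned users, such orphans usually point to exposure-logging gaps
    (lost events, consent-blocked tags, exposure logged at the wrong point).
    """
    exposed = {e["user_id"] for e in events if e["type"] == "exposure"}
    return sorted({e["user_id"] for e in events
                   if e["type"] == "conversion" and e["user_id"] not in exposed})
```

Compare orphan rates between arms as well as overall. A gap that hits one arm harder can bias the comparison.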
- Trust signals hierarchy
Trust signals range from substantive (transparent pricing, clear policies, real reviews, secure connection) to decorative (generic badges, vague claims). They are not interchangeable: a believable review or an honest returns policy generally reassures more durably than a logo a user does not recognise. This entry frames how signals layer at points of risk, so teams invest in the ones that actually reduce hesitation.
- Progressive profiling
Progressive profiling minimises the initial ask — sometimes just an email — and collects additional information later, across subsequent visits or in context when it is actually needed. It trades a heavy upfront form for a lighter entry point, reducing signup friction, while still building the fuller profile over time. It pairs with data-minimisation: ask for a field only when there is a reason, and only with consent for its use.
- Win rate and experiment portfolio
Across mature experimentation programs, a large share of tests show no improvement — flat and negative results are the norm, not failure. Win rate (the fraction of tests that win) is a portfolio property to interpret, not a target to maximise: chasing a high win rate encourages timid tests and peeking. What matters is cumulative validated impact, balanced against learning from null and negative results.
See how WebmasterID applies this in product: Bot intelligence, AI referrals, and AI visibility analytics.