Idempotency and dedup keys
Distributed pipelines typically deliver at least once, so the same event can arrive more than once through retries, replays, or backfills. An idempotency key (a stable, unique identifier per event) lets the pipeline recognize a repeat and keep exactly one copy, so re-processing does not inflate counts. This page explains idempotency and de-duplication keys and how to choose one that survives the whole pipeline.
Why repeats happen
Most messaging and ingestion systems guarantee at-least-once delivery: to avoid losing data on failure, they re-send when an acknowledgment is missed, so duplicates are normal, not exceptional. Retries, consumer restarts, and backfills all replay events. Without a way to recognize a repeat, each copy is counted, and totals drift upward over time.
Exactly-once results are achieved at the application layer by de-duplicating on a key, not by assuming the transport never repeats.
- At-least-once delivery means duplicates are expected
- Retries, restarts, and backfills replay events
- Without a key, every copy is counted
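The effect of a single retry on a total can be sketched in a few lines. This is a minimal illustration, not a production loader: the `deliveries` list, the `event_id` and `amount` field names, and both helper functions are hypothetical.

```python
# Hypothetical delivery log: "evt-2" was re-sent after a missed acknowledgment.
deliveries = [
    {"event_id": "evt-1", "amount": 10},
    {"event_id": "evt-2", "amount": 25},
    {"event_id": "evt-2", "amount": 25},  # retry of the same logical event
    {"event_id": "evt-3", "amount": 5},
]


def naive_total(events):
    """Sum every delivered copy, so repeats are counted again."""
    return sum(e["amount"] for e in events)


def deduped_total(events):
    """Sum each logical event once, keyed on its event_id."""
    seen = set()
    total = 0
    for e in events:
        if e["event_id"] in seen:
            continue  # repeat delivery: already counted
        seen.add(e["event_id"])
        total += e["amount"]
    return total


print(naive_total(deliveries))    # 65: the retry is counted twice
print(deduped_total(deliveries))  # 40: one copy per event_id
```

The transport behaves identically in both cases. Only the application-level check on the key changes the result.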
Choosing a key that survives
Pick a key that is unique per logical event and stable end-to-end. Ideally it is assigned at the source (an event id), not derived downstream where a transform might change it. De-duplicate against it at the point of load, within a window wide enough to cover the maximum delivery delay, so that a late repeat still finds the original. In commerce, the transaction id plays this role for orders. Keep the key opaque so it carries no personal data.
De-duplicating on a stable key is what lets backfills and dead-letter replays re-send events without double-counting.
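De-duplicating at the point of load within a time window might look like the following sketch. It assumes an in-memory store and caller-supplied timestamps in seconds. The `WindowedDeduplicator` class and its `accept` method are hypothetical names for illustration, not part of any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WindowedDeduplicator:
    """Remembers event ids for window_seconds after they are first seen."""

    window_seconds: float
    _seen: Dict[str, float] = field(default_factory=dict)

    def accept(self, event_id: str, now: float) -> bool:
        """Return True if the event should be loaded, False if it is a repeat."""
        # Forget ids whose first sighting is older than the window.
        cutoff = now - self.window_seconds
        for key in [k for k, t in self._seen.items() if t < cutoff]:
            del self._seen[key]

        if event_id in self._seen:
            return False  # repeat inside the window: drop it
        self._seen[event_id] = now
        return True


# A one-hour window catches a retry that arrives 30 minutes later.
wide = WindowedDeduplicator(window_seconds=3600.0)
first = wide.accept("evt-1", now=0.0)      # True: first delivery is loaded
retry = wide.accept("evt-1", now=1800.0)   # False: recognized as a repeat

# A one-minute window has already forgotten the id when a retry arrives late.
narrow = WindowedDeduplicator(window_seconds=60.0)
narrow.accept("evt-1", now=0.0)
leaked = narrow.accept("evt-1", now=120.0)  # True: the repeat is loaded again
```

The second case shows why the window has to cover the longest delay a repeat can have. Once the id is evicted, a late duplicate is indistinguishable from a new event.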
How it appears in analytics and logs
Counts that inflate after a retry or replay usually mean events lack a dedup key, or the key is not honored at load, so the pipeline treated repeats as new.
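One way to check for this in a log is to look for event ids that occur more than once. The sketch below assumes the ids have already been extracted into a list. The `repeated_event_ids` helper and the sample ids are hypothetical.

```python
from collections import Counter


def repeated_event_ids(event_ids):
    """Return a mapping of event id to occurrence count for ids seen more than once."""
    counts = Counter(event_ids)
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Hypothetical ids pulled from a load log after a replay.
logged_ids = ["evt-1", "evt-2", "evt-1", "evt-3", "evt-1"]
print(repeated_event_ids(logged_ids))  # {'evt-1': 3}
```

An empty result on a table that is known to have been replayed suggests dedup is working. A non-empty one points at the ids to investigate.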
Diagnostic use case
Keep retries, replays, and backfills from double-counting by de-duplicating on a stable per-event key the whole pipeline honors.
What WebmasterID can help detect
WebmasterID can de-duplicate first-party events on a stable id so retried or replayed deliveries are counted once.
Common mistakes
- Assuming the transport delivers exactly once.
- Deriving a dedup key downstream where it can change.
- De-duplicating within a window narrower than the delivery delay.
Privacy and accuracy notes
A dedup key should be an opaque event id, not a personal identifier. This page is educational, not legal advice.
Related pages
- Server-side deduplication
Server-side tagging and the Measurement Protocol let the server emit events alongside the browser. If a conversion fires from both the client tag and the server without coordination, it is counted twice. Deduplication on a shared event identifier prevents this, mirroring how ad platforms dedupe browser and server signals. This page explains the dual-send problem and the id-based dedup that solves it.
- Transaction ID deduplication
Ecommerce purchases are deduplicated on transaction_id. If a confirmation page reloads or a user refreshes, the same purchase event can fire twice; GA4 collapses repeated transaction_ids so revenue is not double-counted. The flip side: a missing transaction_id, or one reused across different orders, breaks dedup and corrupts revenue. This page explains the mechanism and its failure modes.
- Backfill and reprocessing
When a pipeline misses data or processes it with a bug, backfilling re-runs it over the affected window to correct the record. Done carelessly, a backfill appends rows on top of existing ones and double-counts, or it overwrites good data with a still-buggy transform. This page explains how to reprocess a window safely so corrections fix the gap instead of creating a new one.
- Events documentation
Assign a stable event id for de-duplication.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.