Data quality

Idempotency and dedup keys

Most distributed pipelines deliver at least once, so the same event can arrive more than once through retries, replays, or backfills. An idempotency key (a stable, unique identifier per event) lets the pipeline recognize a repeat and keep exactly one copy, so re-processing does not inflate counts. This page explains idempotency and de-duplication keys and how to choose one that survives the whole pipeline.

Partially verified

Why repeats happen

Most messaging and ingestion systems guarantee at-least-once delivery: to avoid losing data on failure, they re-send when an acknowledgment is missed, so duplicates are normal, not exceptional. Retries, consumer restarts, and backfills all replay events. Without a way to recognize a repeat, each copy is counted, and totals drift upward over time.

Exactly-once results are achieved at the application layer by de-duplicating on a key, not by assuming the transport never repeats.

At-least-once delivery means duplicates are expected
Retries, restarts, and backfills replay events
Without a key, every copy is counted
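
The effect of a missing key can be illustrated with a minimal Python sketch. The `Event` type, the `event_id` field, and the sample deliveries are hypothetical, chosen only to show how a replayed batch changes a total with and without de-duplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical source-assigned id, used as the dedup key
    amount: int


def deduplicate(deliveries):
    """Keep the first copy of each event_id and drop later repeats."""
    seen = set()
    kept = []
    for event in deliveries:
        if event.event_id in seen:
            continue  # a retry or replay of an event already kept
        seen.add(event.event_id)
        kept.append(event)
    return kept


# Simulated at-least-once delivery: evt-1 and evt-2 each arrive twice.
deliveries = [
    Event("evt-1", 10),
    Event("evt-2", 20),
    Event("evt-1", 10),
    Event("evt-3", 5),
    Event("evt-2", 20),
]

naive_total = sum(e.amount for e in deliveries)   # counts every copy
deduped = deduplicate(deliveries)
deduped_total = sum(e.amount for e in deduped)    # counts each event once

print(naive_total, deduped_total)
```

Here the naive total counts the repeated deliveries, while the de-duplicated total counts each logical event once.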

Choosing a key that survives

Pick a key that is unique per logical event and stable end-to-end. Ideally it is assigned at the source (an event id) rather than derived downstream, where a transform might change it. De-duplicate against it at the point of load, within a window wide enough to cover the maximum delivery delay. In commerce, the transaction id plays this role for orders. Keep the key opaque so it carries no personal data.

De-duplicating on a stable key at load is the mechanism backfills and dead-letter replays rely on to stay safe when they run more than once.
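
De-duplicating within a window can be sketched as follows. This is an illustration under stated assumptions, not a production design: the `WindowedDeduplicator` class, its in-memory dictionary, and the timestamps in seconds are all hypothetical, and a real pipeline would typically keep this state in a durable store.

```python
class WindowedDeduplicator:
    """Remember each event id for window_seconds and reject repeats seen within it."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._first_seen = {}  # event_id -> time the id was first accepted

    def accept(self, event_id: str, now: float) -> bool:
        # Forget ids whose window has elapsed.
        expired = [
            key for key, seen_at in self._first_seen.items()
            if now - seen_at > self.window_seconds
        ]
        for key in expired:
            del self._first_seen[key]

        if event_id in self._first_seen:
            return False  # repeat inside the window: drop it
        self._first_seen[event_id] = now
        return True  # first sighting, or the window has already expired


# A replay arrives 30 minutes (1800 s) after the original delivery.
wide = WindowedDeduplicator(window_seconds=3600)
wide_first = wide.accept("evt-1", now=0)
wide_replay = wide.accept("evt-1", now=1800)      # rejected: window covers the delay

narrow = WindowedDeduplicator(window_seconds=600)
narrow_first = narrow.accept("evt-1", now=0)
narrow_replay = narrow.accept("evt-1", now=1800)  # accepted again: window too narrow
```

The narrow window has already forgotten the id when the replay arrives, so the repeat is loaded a second time. This is why the window needs to be at least as wide as the longest delay a repeat can have.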

How it appears in analytics and logs

Counts that inflate after a retry or replay usually mean events lack a dedup key, or the key is not honored at load, so the pipeline treated repeats as new.
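
One way to check a log for this symptom is to count how often each event id appears. The sketch below assumes the log has already been reduced to a list of event ids; the ids and the helper name `find_repeated_ids` are hypothetical.

```python
from collections import Counter


def find_repeated_ids(event_ids):
    """Return a mapping of event id to occurrence count for ids seen more than once."""
    counts = Counter(event_ids)
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Hypothetical ids extracted from a load log after a replay.
logged_ids = ["evt-1", "evt-2", "evt-1", "evt-3", "evt-1"]

repeats = find_repeated_ids(logged_ids)
# Number of rows that de-duplication on the id would have removed.
inflation = len(logged_ids) - len(set(logged_ids))

print(repeats, inflation)
```

A non-empty result suggests that repeats reached the load step without being recognized.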

Diagnostic use case

Keep retries, replays, and backfills from double-counting by de-duplicating on a stable per-event key the whole pipeline honors.

What WebmasterID can help detect

WebmasterID can de-duplicate first-party events on a stable id so retried or replayed deliveries are counted once.

Common mistakes

Assuming the transport delivers exactly once.
Deriving a dedup key downstream where it can change.
De-duplicating within a window narrower than the delivery delay.
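
The second mistake can be illustrated with a short sketch. It assumes a hypothetical pipeline that derives its key by hashing the payload, and a hypothetical transform that upper-cases a currency code. The field names are invented for the example.

```python
import hashlib
import json


def derived_key(payload: dict) -> str:
    """Derive a key by hashing the serialized payload (the fragile approach)."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same logical event before and after a downstream normalization step.
raw = {"event_id": "evt-42", "currency": "usd", "amount": 10}
normalized = {**raw, "currency": raw["currency"].upper()}

# The derived key changes because the payload changed.
derived_keys_match = derived_key(raw) == derived_key(normalized)

# The source-assigned id is untouched by the transform.
source_ids_match = raw["event_id"] == normalized["event_id"]

print(derived_keys_match, source_ids_match)
```

A retry that passes through the transform would therefore look like a new event under the derived key, while the source-assigned id still identifies it as a repeat.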

Privacy and accuracy notes

A dedup key should be an opaque event id, not a personal identifier. This page is educational, not legal advice.


Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.