Data quality

ETL and pipeline failures

Analytics data often flows through an ETL or ELT pipeline that extracts events, transforms them, and loads them into reporting tables (ELT loads before transforming, but the same stages apply). A failure at any stage, whether a timed-out extract, a transform exception, or a half-written load, leaves data partial or stale. If the failure is silent, it looks like a genuine traffic dip. This page explains ETL failure modes and how to tell a pipeline gap from a real drop in traffic.

Partially verified

Where pipelines fail

Extraction can fail when a source API rate-limits, times out, or returns partial pages, so fewer events are pulled than actually exist. Transformation can throw on an unexpected value, drop rows, or mis-map a field after an upstream schema change. Loading can half-complete: a job that dies mid-write leaves a table holding only part of the day's data, with no error shown on the dashboard.
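One way to keep a short extract from passing as complete is to compare what was pulled with what the source says exists. This is a minimal sketch that assumes a hypothetical source API. `fetch_page` and the `events`, `next_cursor`, and `total` fields are illustrative names and do not belong to any real API. Many APIs report no total, and in that case the reconciliation described further down has to take over this role.

```python
from typing import Callable, Optional


class PartialExtractError(Exception):
    """Raised when an extract pulled fewer events than the source reports."""


def extract_all(fetch_page: Callable[[Optional[str]], dict]) -> list:
    # fetch_page is a hypothetical callable: given a cursor (None for the
    # first page) it returns {"events": [...], "next_cursor": str | None,
    # "total": int}. Timeouts and rate-limit errors raised by fetch_page are
    # deliberately not caught, so the job fails instead of returning a prefix.
    events: list = []
    cursor: Optional[str] = None
    reported_total: Optional[int] = None
    while True:
        page = fetch_page(cursor)
        events.extend(page["events"])
        reported_total = page["total"]
        cursor = page["next_cursor"]
        if cursor is None:
            break
    # A short or skipped page shows up here as a count mismatch.
    if len(events) != reported_total:
        raise PartialExtractError(
            f"extracted {len(events)} events but source reports {reported_total}"
        )
    return events
```

Raising here stops the job from reporting success on a partial pull, and the scheduler's ordinary error alert then fires.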

The dangerous case is the silent partial: the pipeline 'succeeds' with incomplete data, so reports look plausible but understate reality.

Extract: rate limits, timeouts, partial pages
Transform: exceptions, dropped rows, mis-mapped fields
Load: half-written tables with no surfaced error
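In the transform stage, rows that fail to map can be counted rather than silently discarded. The sketch below is minimal. The source field names (`ts`, `page`), the `FIELD_MAP` renaming, and the 1% `max_drop_ratio` threshold are hypothetical and serve only as illustration.

```python
class TransformDropError(Exception):
    """Raised when too many rows fail to transform."""


# Hypothetical mapping from source field names to reporting-table columns.
FIELD_MAP = {"ts": "event_time", "page": "page_path"}


def transform(rows: list, max_drop_ratio: float = 0.01) -> list:
    out = []
    dropped = 0
    for row in rows:
        try:
            # A KeyError means an expected field is missing, for example
            # because an upstream schema change renamed it.
            out.append({dst: row[src] for src, dst in FIELD_MAP.items()})
        except KeyError:
            dropped += 1
    # Fail loudly when the drop rate exceeds the threshold, rather than
    # handing a quietly smaller batch to the load step.
    if rows and dropped / len(rows) > max_drop_ratio:
        raise TransformDropError(f"dropped {dropped} of {len(rows)} rows")
    return out
```

A field renamed upstream then produces an error at the transform step. Without the check, the only symptom would be a smaller table.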

Catching it

Make pipelines fail loudly. Alert on job errors, and also on absence: a table that did not update, a row count far below the trailing norm, or a freshness timestamp that stopped advancing. Reconcile loaded totals against the source count so that the mismatch catches a partial load before a user notices it. When you find a failure, reprocess the affected window instead of leaving the hole.

This is the operational counterpart to schema and contract issues, which often trigger the transform exceptions in the first place.
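The absence checks and the reconciliation described in this section can each be written as a small function. This is a minimal sketch. It assumes the caller already has the table's last-update timestamp, recent daily row counts, and a source-side count. The six-hour staleness limit and the 50% row-count floor are hypothetical thresholds, not recommendations.

```python
from datetime import datetime, timedelta
from statistics import median


def check_freshness(last_updated: datetime, now: datetime,
                    max_age: timedelta = timedelta(hours=6)) -> bool:
    # True while the table's freshness timestamp is recent enough.
    return now - last_updated <= max_age


def check_row_count(today: int, trailing: list, floor: float = 0.5) -> bool:
    # Compare today's load with the median of the trailing window, so one
    # unusual day in the window does not skew the baseline.
    if not trailing:
        return True
    return today >= floor * median(trailing)


def reconcile(loaded: int, source: int) -> int:
    # A positive gap means rows exist at the source but never reached the table.
    return source - loaded


def absence_alerts(last_updated, now, today, trailing, loaded, source) -> list:
    alerts = []
    if not check_freshness(last_updated, now):
        alerts.append("stale: freshness timestamp stopped advancing")
    if not check_row_count(today, trailing):
        alerts.append("low: row count far below trailing norm")
    gap = reconcile(loaded, source)
    if gap > 0:
        alerts.append(f"partial load: {gap} rows missing versus source")
    return alerts
```

These checks fire when the job reports no error at all, which is exactly the silent-partial case.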

How it appears in analytics and logs

A sharp dip that ends exactly at a job boundary, or a table that stopped updating, points to an ETL failure rather than lost audience.
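A rough test for the job-boundary pattern is to find the low stretch in an hourly series and check whether its edges land on scheduled job runs. This is only a sketch. The hourly counts, the set of boundary hours, and the 50% `low_ratio` cutoff are hypothetical inputs. It checks both edges, because a failed run often starts the dip and the next successful run ends it.

```python
from statistics import median
from typing import Optional


def low_run(counts: list, low_ratio: float = 0.5) -> Optional[tuple]:
    # Return (start, end) indices of the first contiguous run of hours whose
    # count falls below low_ratio * the series median; end is exclusive.
    baseline = median(counts)
    start = None
    for i, c in enumerate(counts):
        if c < low_ratio * baseline:
            if start is None:
                start = i
        elif start is not None:
            return (start, i)
    return (start, len(counts)) if start is not None else None


def dip_matches_job_boundary(counts: list, boundaries: set) -> bool:
    # True if the dip starts or ends exactly on a job boundary hour, which
    # points to an ETL failure rather than lost audience.
    run = low_run(counts)
    if run is None:
        return False
    start, end = run
    return start in boundaries or end in boundaries
```

Real audience changes seldom start and stop exactly on the hours a job runs, so an alignment like this is worth checking before anything else.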

Diagnostic use case

Distinguish a real traffic change from a pipeline failure by checking extract, transform, and load health before interpreting a dip.
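This order of checks can be written as a small triage function. A dip counts as a candidate real change only after every stage reports healthy for the window. The `StageHealth` record and its fields are hypothetical. In practice the values would come from the job logs and reconciliation counts the pipeline already keeps.

```python
from dataclasses import dataclass


@dataclass
class StageHealth:
    # Hypothetical per-window health record for one pipeline stage.
    stage: str          # "extract", "transform", or "load"
    errored: bool       # the job raised or was marked failed
    rows_expected: int  # rows the stage should produce, after any intentional filtering
    rows_out: int       # rows the stage actually produced


def triage_dip(stages: list) -> str:
    # Check stages in the order given (pipeline order) and report the first
    # one that looks unhealthy.
    for s in stages:
        if s.errored:
            return f"pipeline: {s.stage} job failed"
        if s.rows_out < s.rows_expected:
            return f"pipeline: {s.stage} short by {s.rows_expected - s.rows_out} rows"
    return "candidate real change: all stages healthy"
```

For an ELT pipeline, pass the stages in extract, load, transform order.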

What WebmasterID can help detect

WebmasterID's independent first-party collection gives a second source to confirm whether a dip is real or a pipeline artifact.
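Comparing the pipeline's daily totals with an independent first-party count can be done with a per-day ratio test. The function name, the input dictionaries, and the 20% `tolerance` are hypothetical. This is not a WebmasterID API. It only illustrates how a second source can serve as a check. Two collectors rarely count identically, so each day is compared with the typical ratio rather than with 1.0.

```python
from statistics import median


def classify_days(pipeline: dict, independent: dict, tolerance: float = 0.2) -> dict:
    # Only days present in both sources with a nonzero independent count.
    days = sorted(d for d in pipeline.keys() & independent.keys() if independent[d] > 0)
    if not days:
        return {}
    ratios = {d: pipeline[d] / independent[d] for d in days}
    # The typical pipeline-to-independent ratio across the window.
    baseline = median(ratios.values())
    labels = {}
    for d in days:
        if ratios[d] < (1 - tolerance) * baseline:
            # The pipeline dropped but the independent source did not.
            labels[d] = "likely pipeline artifact"
        else:
            # Any dip here appears in both sources.
            labels[d] = "consistent"
    return labels
```

A day on which both sources fall together stays "consistent". That agreement is the signal that the dip is probably real.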

Common mistakes

Treating a job that 'succeeded' as proof that the data is complete.
Alerting on errors but not on missing or stale data.
Reading a pipeline-caused dip as audience loss.

Privacy and accuracy notes

Pipeline health is operational and touches no visitor identity directly. This page is educational, not legal advice.


Sources and verification notes

Google Cloud — Reliable data pipelines (best practices)

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.