Data quality

Language spam and keyword spam

Language spam and keyword spam place messages (promotions, slogans, even instructions) into fields like browser language or a site-search term. The values are forged: bots or hand-crafted hits send them so that whoever opens the report will read them. They are not real visitor attributes. This page explains how the injection works and how to recognise and filter it.

Verified against primary sources

How forged values reach your reports

Dimensions like browser language or on-site search term are populated from values the client supplies. A spammer can set those values to anything, then send hits — sometimes directly to a measurement endpoint, sometimes via a crawler — so the message appears in your reports. The goal is purely to be seen by whoever reads the analytics; there is no real visitor behind it.

Because the values are attacker-controlled, they can be anything from a promo URL to a fake instruction. The hits that carry them typically show zero engagement, and the same value tends to repeat across many such hits.

Client-supplied fields can be set to any value
Hits exist only to display text to the report reader
No real visit or engagement behind the value
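
The zero-engagement pattern described above can be checked with a short script. This is a minimal sketch, not a definitive detector: the `Hit` record, its field names, and the `min_hits` threshold are assumptions made for illustration, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical hit record; the field names are illustrative only.
    language: str
    engagement_seconds: float


def zero_engagement_values(hits, min_hits=3):
    """Return language values seen at least min_hits times with no engagement at all."""
    totals = defaultdict(lambda: [0, 0.0])  # value -> [hit count, total engagement]
    for hit in hits:
        entry = totals[hit.language]
        entry[0] += 1
        entry[1] += hit.engagement_seconds
    return sorted(
        value
        for value, (count, engaged) in totals.items()
        if count >= min_hits and engaged == 0
    )


# Invented sample data: two real-looking languages and one injected message.
sample_hits = [
    Hit("en-US", 42.0),
    Hit("en-US", 0.0),
    Hit("de-DE", 13.5),
    Hit("Visit promo.example for cheap traffic", 0.0),
    Hit("Visit promo.example for cheap traffic", 0.0),
    Hit("Visit promo.example for cheap traffic", 0.0),
]
print(zero_engagement_values(sample_hits))
```

A value that repeats with no engagement is only a candidate for review. A genuine language with a handful of bounced visits could also match, so the threshold should be tuned against your own traffic.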

Filtering and recognising it

Treat any field value that reads like a message rather than a real attribute as spam. Filter by valid hostname to drop hits that never loaded your site, exclude the offending values, and apply bot/spam filtering at the source. Server-side classification removes most of this category before it reaches a human report, because a server-side endpoint is not a public target that spammers can send forged hits to.
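
As a rough illustration of the hostname and excluded-value filters, the sketch below keeps only hits whose hostname is on an allowlist and whose fields do not carry a known injected value. The dictionary keys (`hostname`, `language`, `search_term`), the hostnames, and the excluded string are all assumptions for the example, not the schema of any particular analytics product.

```python
# Assumption: replace with the hostnames your site is actually served from.
VALID_HOSTNAMES = {"example.com", "www.example.com"}

# Assumption: values you have already identified as injected text.
EXCLUDED_VALUES = {"Visit promo.example for cheap traffic"}


def keep_hit(hit):
    """Return True if the hit passes the hostname and excluded-value checks."""
    hostname = (hit.get("hostname") or "").strip().lower()
    if hostname not in VALID_HOSTNAMES:
        return False  # the hit does not report one of your own hostnames
    for field in ("language", "search_term"):
        if hit.get(field) in EXCLUDED_VALUES:
            return False  # known injected value in a client-supplied field
    return True


# Invented sample hits: one genuine, one with a bad hostname, one with a forged language.
hits = [
    {"hostname": "www.example.com", "language": "en-US", "search_term": "pricing"},
    {"hostname": "(not set)", "language": "Visit promo.example for cheap traffic"},
    {"hostname": "www.example.com", "language": "Visit promo.example for cheap traffic"},
]
clean = [hit for hit in hits if keep_hit(hit)]
print(len(clean))
```

The hostname check runs first because it does not depend on knowing the spam text in advance. The excluded-value list only catches phrases you have already seen.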

How it appears in analytics and logs

A 'language' or search term that reads like a sentence or advertisement is injected spam, not a genuine visitor attribute.
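
One way to spot a language value that reads like a sentence is to test whether it has the general shape of a language tag such as `en-US`. The pattern below is a deliberately simplified approximation of that shape, assumed for illustration; it is not a full BCP 47 validator, and the function name is hypothetical.

```python
import re

# Simplified shape check: a 2-3 letter primary subtag followed by optional
# hyphen-separated subtags. This is an approximation, not a full validator.
LANGUAGE_TAG_SHAPE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def looks_like_language_tag(value):
    """Return True if the value has the rough shape of a language tag."""
    return bool(LANGUAGE_TAG_SHAPE.fullmatch(value.strip()))


# Invented sample values for illustration.
for candidate in ["en-US", "pt-BR", "zh-Hant-TW", "Visit promo.example for cheap traffic"]:
    print(candidate, looks_like_language_tag(candidate))
```

Values that fail the check are candidates for exclusion, not confirmed spam. Report placeholders such as "(not set)" also fail it and may need separate handling, and a spammer could still forge a value that passes.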

Diagnostic use case

Recognise forged values in language, search-term, and similar fields as spam aimed at the report reader, and exclude them from analysis.

What WebmasterID can help detect

WebmasterID classifies traffic server-side at ingest, so hits crafted purely to inject text into report fields are separated from human analytics.

Common mistakes

Acting on a 'search term' that is actually injected spam.
Filtering individual phrases instead of validating the hostname.
Mistaking forged language values for a real audience segment.

Privacy and accuracy notes

Filtering this spam matches field values and request patterns, not visitor identity. No personal data is needed to exclude it.


Sources and verification notes

Google — [GA4] Spam and unwanted traffic referrals

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.