Language spam and keyword spam
Language spam and keyword spam place messages (promotions, slogans, even instructions) into report fields such as browser language or site-search term. The values are forged: bots or crafted hits send them so that whoever opens the report will read them. They are not real visitor attributes. This page explains how the injection works and how to recognise and filter it.
How forged values reach your reports
Dimensions like browser language or on-site search term are populated from values the client supplies. A spammer can set those values to anything, then send hits — sometimes directly to a measurement endpoint, sometimes via a crawler — so the message appears in your reports. The goal is purely to be seen by whoever reads the analytics; there is no real visitor behind it.
Because the values are attacker-controlled, they can be anything from a promo URL to a fake instruction, and the hits that carry them typically show zero engagement.
- Client-supplied fields can be set to any value
- Hits exist only to display text to the report reader
- No real visit or engagement behind the value
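Since the field is client-supplied, the receiving side can only check whether a value is shaped like a real attribute. The sketch below illustrates that idea for the language field. It assumes a simplified pattern for tags such as "en" or "en-US"; the pattern is a loose approximation, not the full language-tag grammar, and the function name is hypothetical.

```python
import re

# Loose shape of a browser language tag: a 2-3 letter primary subtag
# followed by up to three short subtags ("en", "en-US", "zh-Hant-TW").
# Assumption: an approximation for illustration, not the full grammar.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8}){0,3}")


def looks_like_language_tag(value: str) -> bool:
    """Return True when the whole value is shaped like a language tag."""
    return LANGUAGE_TAG.fullmatch(value.strip()) is not None
```

A value containing spaces, dots, or punctuation fails the check, which is usually enough to separate a forged message from a genuine language setting.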
Filtering and recognising it
Treat any field value that reads like a message rather than a real attribute as spam. Filter by valid hostname to drop hits that never loaded your site, exclude the offending values, and apply bot/spam filtering at the source. Server-side classification removes most of this category before it reaches a human report, because the ingest endpoint is not a public target that spammers can send spoofed hits to.
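The hostname and value filters described above can be sketched as follows. This is a minimal illustration, assuming hits are available as dictionaries with hypothetical `hostname` and `language` keys; the hostnames and the excluded value are placeholders, not values from any real report.

```python
# Assumption: the hostnames your site is actually served from.
VALID_HOSTNAMES = {"example.com", "www.example.com"}

# Assumption: forged values already identified in your reports.
EXCLUDED_VALUES = {"visit promo.example.net for free traffic"}


def keep_hit(hit: dict) -> bool:
    """Keep a hit only if it names a valid hostname and carries no excluded value."""
    hostname = (hit.get("hostname") or "").strip().lower()
    if hostname not in VALID_HOSTNAMES:
        return False  # the hit never loaded your site
    language = (hit.get("language") or "").strip().lower()
    return language not in EXCLUDED_VALUES


def filter_hits(hits: list[dict]) -> list[dict]:
    """Return the hits that pass both checks, preserving order."""
    return [hit for hit in hits if keep_hit(hit)]
```

The hostname check comes first because it does not depend on knowing the spam text in advance; the excluded-value list only catches phrases that have already been seen.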
How it appears in analytics and logs
A 'language' or search term that reads like a sentence or advertisement is injected spam, not a genuine visitor attribute.
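One way to spot a value that reads like a sentence or advertisement is a simple heuristic on word count and URL-like tokens. The sketch below is an assumption-laden illustration: the word threshold, the short list of domain endings, and the function name are all hypothetical, and a flagged value is a candidate for review, not proof of spam.

```python
import re

# Assumption: a small, illustrative set of URL-like markers.
URL_LIKE = re.compile(
    r"https?://|www\.|\b[a-z0-9-]+\.(?:com|net|org|info|xyz)\b",
    re.IGNORECASE,
)


def looks_like_message(value: str, max_words: int = 6) -> bool:
    """Flag values that contain a URL-like token or run longer than max_words."""
    if URL_LIKE.search(value):
        return True
    return len(value.split()) > max_words
```

Genuine search terms are occasionally long, so the threshold is best tuned against your own data before any value is excluded automatically.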
Diagnostic use case
Recognise forged values in language, search-term, and similar fields as spam aimed at the report reader, and exclude them from analysis.
What WebmasterID can help detect
WebmasterID classifies traffic server-side at ingest, so hits crafted purely to inject text into report fields are separated from human analytics.
Common mistakes
- Acting on a 'search term' that is actually injected spam.
- Filtering individual phrases instead of validating the hostname.
- Mistaking forged language values for a real audience segment.
Privacy and accuracy notes
Filtering this spam matches field values and request patterns, not visitor identity. No personal data is needed to exclude it.
Related pages
- Referral spam and ghost traffic
Referral spam and ghost traffic are fake hits crafted to appear in your reports. Crawler spam loads pages to leave a referrer in your logs; ghost spam sends hits straight to a measurement endpoint without ever visiting your site. Both add phantom sessions with no engagement. This page explains the mechanics and the filtering that removes them.
- Bot traffic in analytics: filtering it out
Bots — crawlers, scrapers, monitors, scanners — generate requests that, unfiltered, inflate pageviews and distort every metric. Client-side analytics often misses bots (many do not run JavaScript) or miscounts the ones that do. Server-side classification at ingest is the reliable way to keep bot traffic out of human reports.
- An analytics data-validation checklist
Before you act on a report, validate the data that produced it. This checklist walks the recurring failure points — duplicate tags, unfiltered bots, internal traffic, wrong time zone, broken events, sampling — and gives a concrete check for each. Run it after any tracking change and periodically, so a metric you trust is a metric you have verified.
- Bot intelligence
Crafted, injected hits separated from human data.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.