Bot vs browser user agents: how to tell them apart
A user-agent string is a self-reported label, not an identity. This page explains how declared bots name themselves, why almost every UA still starts with the legacy Mozilla token, and how to read the difference between an automated client and a real browser without over-trusting the string.
What a user-agent string is
The User-Agent is an HTTP request header that the client sends to describe itself. The HTTP specification (RFC 9110) defines the header, but its contents are entirely under the client's control: a browser, a crawler, a script, or a monitoring tool can send any string it likes.
That is the key fact: the user agent is a claim, not a credential.
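To make the point concrete, here is a minimal Python sketch that builds (but never sends) a request carrying an arbitrary User-Agent. The helper name build_request and the example.com URL are illustrative assumptions; the only library call used is the standard urllib.request.Request.

```python
from urllib.request import Request


def build_request(url: str, claimed_user_agent: str) -> Request:
    """Build a request object with whatever User-Agent the caller chooses.

    Nothing is sent over the network; the point is only that the
    header value is set by the client, with no checks.
    """
    return Request(url, headers={"User-Agent": claimed_user_agent})


# A plain script claiming to be a search crawler (illustrative string).
req = build_request(
    "https://example.com/",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
)

# urllib stores header names capitalised as "User-agent".
print(req.get_header("User-agent"))
```

Nothing in the client library checks the value, which is why the header alone cannot confirm who sent the request.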
How declared bots identify themselves
Well-behaved crawlers name themselves clearly, usually with a product token and a URL pointing at their documentation (for example a token plus a +https://… link). This is how you recognise the major search and AI crawlers in logs. The token is the stable part; version numbers change.
- Look for a clear product token (Googlebot, bingbot, GPTBot, …)
- A self-identifying URL in the string signals a declared crawler
- Match on the token, not the full version string
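The checklist above can be sketched in Python as follows. The token tuple is a small illustrative sample, not a complete signature list, and the function names declared_token and has_doc_url are hypothetical. The match is anchored on the product token and ignores the version that follows it.

```python
import re
from typing import Optional

# Illustrative sample only; a real list would be maintained separately.
DECLARED_TOKENS = ("Googlebot", "bingbot", "GPTBot")
_CANONICAL = {token.lower(): token for token in DECLARED_TOKENS}

# Match the token as a whole word, whatever version follows it.
_TOKEN_RE = re.compile(
    r"(?<![A-Za-z0-9])("
    + "|".join(re.escape(token) for token in DECLARED_TOKENS)
    + r")(?![A-Za-z0-9])",
    re.IGNORECASE,
)

# A "+http(s)://..." link is the usual self-identifying URL form.
_DOC_URL_RE = re.compile(r"\+https?://\S+")


def declared_token(user_agent: str) -> Optional[str]:
    """Return the canonical crawler token the string claims, or None."""
    match = _TOKEN_RE.search(user_agent)
    return _CANONICAL[match.group(1).lower()] if match else None


def has_doc_url(user_agent: str) -> bool:
    """True if the string carries a self-identifying documentation URL."""
    return _DOC_URL_RE.search(user_agent) is not None


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(declared_token(ua), has_doc_url(ua))  # Googlebot True
```

A match here only means the client claims that token. It says nothing about whether the claim is true.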
Why browsers all start with 'Mozilla/5.0'
Almost every browser user agent begins with the legacy Mozilla/5.0 token for historical compatibility reasons. The token tells you nothing useful on its own, and it does not mean the client is Firefox or any other Mozilla browser. Read the rest of the string (engine, platform, and browser tokens), and remember that a scraper can reproduce all of it.
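As a rough illustration of reading past the leading token, the Python sketch below splits a browser-style string into its parenthesised platform comment and its product tokens, then guesses the browser from token order. The function name read_browser_ua and the precedence tuple are assumptions made for this example, and the result is still only what the client claims.

```python
import re

_COMMENT_RE = re.compile(r"\(([^)]*)\)")
_PRODUCT_RE = re.compile(r"([A-Za-z][\w.-]*)/([\w.]+)")

# Checked in order: Chromium-based browsers also send Chrome/ and
# Safari/ tokens, so the more specific tokens must come first.
_BROWSER_ORDER = ("Edg", "OPR", "Chrome", "Firefox", "Safari")


def read_browser_ua(user_agent: str) -> dict:
    """Split a browser-style UA into platform comment and product tokens."""
    comments = _COMMENT_RE.findall(user_agent)
    # Drop comments so tokens inside parentheses are not counted as products.
    bare = _COMMENT_RE.sub(" ", user_agent)
    products = dict(_PRODUCT_RE.findall(bare))
    browser = next((name for name in _BROWSER_ORDER if name in products), None)
    return {
        "platform": comments[0] if comments else None,
        "products": products,
        "claimed_browser": browser,  # a claim, not a verified fact
    }


info = read_browser_ua(
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
)
print(info["claimed_browser"], "|", info["platform"])
# Firefox | X11; Linux x86_64; rv:120.0
```

The Mozilla/5.0 token shows up in the products map like any other token but plays no part in the guess.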
How it appears in analytics and logs
A user agent that names a crawler and links to its documentation is a declared bot. A string that mimics a full browser may be a real browser or a scraper copying one — the string alone cannot tell you which.
Diagnostic use case
Decide whether a request comes from an automated client or a real browser when triaging traffic, and know when the user agent is enough and when you need verification.
What WebmasterID can help detect
WebmasterID parses the user agent server-side into a category (search bot, AI crawler, automation, browser) against a maintained signature list, and leaves unknown clients in an honest 'other' bucket rather than guessing.
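The general shape of this kind of classification can be sketched in Python. This is not WebmasterID's implementation: the signature table, category labels, and function names below are hypothetical and deliberately tiny. The sketch only shows matching against a signature list and falling back to 'other' instead of guessing.

```python
import re

# Hypothetical, deliberately small signature table (category, tokens).
SIGNATURES = (
    ("search_bot", ("Googlebot", "bingbot")),
    ("ai_crawler", ("GPTBot",)),
    ("automation", ("curl", "python-requests", "HeadlessChrome")),
)

# Browser-shaped: legacy Mozilla/5.0 prefix plus a known browser token.
_BROWSER_RE = re.compile(r"^Mozilla/5\.0 .*\b(Chrome|Firefox|Safari)/")


def _has_token(user_agent: str, token: str) -> bool:
    """Whole-token, case-insensitive match."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


def classify(user_agent: str) -> str:
    """Map a claimed user agent to a category; unknowns stay 'other'."""
    if not user_agent:
        return "other"
    # Signatures are checked first, so HeadlessChrome is not read as a browser.
    for category, tokens in SIGNATURES:
        if any(_has_token(user_agent, token) for token in tokens):
            return category
    if _BROWSER_RE.search(user_agent):
        return "browser"
    return "other"


print(classify("curl/8.4.0"))             # automation
print(classify("SomeUnknownClient/1.0"))  # other
```

The 'browser' result means only that the string is browser-shaped. A scraper that copies a browser string lands in the same category.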
Common mistakes
- Treating the user-agent string as proof of identity.
- Blocking on a loose substring match (for example, any string containing "bot") and accidentally catching legitimate clients.
- Storing raw user-agent strings of real visitors when a category would do.
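The substring mistake in the list above is easy to reproduce. In this Python sketch, the device string is made up for illustration (a phone model whose name happens to contain "bot"), and the token pattern is a small hypothetical sample rather than a real blocklist.

```python
import re


def naive_is_bot(user_agent: str) -> bool:
    """The mistake: any 'bot' substring counts as a bot."""
    return "bot" in user_agent.lower()


# Whole-token match against a small, illustrative set of crawler tokens.
_BOT_TOKEN_RE = re.compile(
    r"(?<![A-Za-z0-9])(Googlebot|bingbot|GPTBot)(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def claims_declared_bot(user_agent: str) -> bool:
    """True only when a known crawler token appears as a whole token."""
    return _BOT_TOKEN_RE.search(user_agent) is not None


# Made-up browser string: the device model contains "bot".
phone = (
    "Mozilla/5.0 (Linux; Android 13; CUBOT P80) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"
)

print(naive_is_bot(phone))         # True  (false positive)
print(claims_declared_bot(phone))  # False
```

Token-level matching avoids this kind of false positive, but it still only tells you what the client claims to be.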
Privacy and accuracy notes
User-agent parsing for classification is privacy-safe when you store a category rather than the raw string of real visitors. WebmasterID classifies at ingest and does not expose raw user-agent strings from human traffic.
Related pages
- Spoofed and fake user agents: what to watch for
Spoofing a user agent is trivial — any client can claim to be Googlebot or a normal browser. This page explains why spoofing happens, the common fake-crawler patterns, and the verification methods that turn a claimed identity into a confirmed one.
- GPTBot — OpenAI's web crawler
GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.
- Googlebot Smartphone — Google's mobile-first crawler
Googlebot Smartphone is the mobile user-agent variant of Googlebot and, under mobile-first indexing, Google's primary crawler for most sites. It uses the Googlebot robots.txt token and can be verified through reverse DNS and Google's published crawler IP ranges.
- Bot vs human traffic
The bigger picture on separating automation from people.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.