Verifying AI crawlers
Any client can copy a user-agent string, so a token alone is a claim, not proof. Some vendors, such as OpenAI for GPTBot, publish IP ranges or verification guidance; many do not. Verify before trusting, and never invent IP ranges to fill the gap.
Why a token is only a claim
A user-agent string is set by the client, so anyone can send a request that says GPTBot or ClaudeBot. The robots.txt token identifies what a compliant crawler calls itself; it does not, on its own, prove that a given request truly came from that vendor.
That is why verification matters for any decision that depends on the requester being genuine. Treat the token as a starting hypothesis to be confirmed, not as proof.
How verification actually works
The reliable signal is the source of the request matched against vendor-published information. OpenAI, for example, publishes IP ranges for its crawlers so operators can confirm a request claiming to be GPTBot really originates from OpenAI. Some vendors publish reverse-DNS or other guidance instead.
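Where a vendor documents a reverse-DNS check, one common pattern is forward-confirmed reverse DNS: resolve the source address to a hostname, check the hostname against the vendor-documented domain, then resolve that hostname forward and confirm it maps back to the same address. The sketch below is a minimal illustration under assumptions, not any vendor's official procedure. The function name `forward_confirmed` and the `allowed_suffixes` parameter are hypothetical, and the suffixes must come from the vendor's own documentation, never from guesswork.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def _reverse_lookup(ip: str) -> str:
    # PTR lookup: returns the primary hostname for the address.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    # Forward lookup: returns every address the hostname resolves to.
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def forward_confirmed(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Return True only if ip -> hostname -> ip round-trips under an allowed domain.

    allowed_suffixes is an assumption of this sketch: it must be filled from
    vendor documentation. An empty list means nothing can be confirmed.
    """
    suffixes = [s.strip(".").lower() for s in allowed_suffixes if s.strip(".")]
    if not suffixes:
        return False
    try:
        source = ipaddress.ip_address(ip)
        hostname = reverse(ip).rstrip(".").lower()
    except (OSError, ValueError):
        return False  # no PTR record or malformed address: not confirmed
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        resolved = forward(hostname)
    except OSError:
        return False
    for candidate in resolved:
        try:
            if ipaddress.ip_address(candidate) == source:
                return True
        except ValueError:
            continue  # skip anything that is not a plain IP literal
    return False
```

The resolvers are passed in as parameters so the check can be exercised without network access. A failure at any step is treated as "not confirmed" rather than as an error to work around.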
The gap is that many AI crawlers publish no verification material at all. For those, you can identify by token but cannot fully verify, so treat trust-sensitive decisions conservatively. Critically, never invent IP ranges or fabricate a verification method. An unverifiable crawler stays unverifiable, and the honest classification is partial: identified by token, not confirmed by source.
- Some vendors publish IP ranges (e.g. OpenAI for GPTBot)
- Many publish nothing verifiable — identify by token only
- Never invent IP ranges to manufacture certainty
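The three points above can be sketched as a three-way classification. This is a minimal illustration, not WebmasterID's implementation: the function name `classify_claim`, the result labels, and the shape of `published_ranges` are all assumptions. The ranges themselves must be loaded from the vendor's published list. A token with no published ranges is reported as unverifiable rather than guessed at.

```python
import ipaddress
from typing import Mapping, Sequence


def classify_claim(
    token: str,
    source_ip: str,
    published_ranges: Mapping[str, Sequence[str]],
) -> str:
    """Classify a request that claims to be `token`.

    published_ranges maps a crawler token to CIDR strings taken from the
    vendor's own published list (hypothetical structure for this sketch).
    Returns "verified", "mismatch", or "unverifiable".
    """
    cidrs = published_ranges.get(token)
    if not cidrs:
        # The vendor publishes nothing we can check against: stay honest.
        return "unverifiable"
    address = ipaddress.ip_address(source_ip)
    for cidr in cidrs:
        # Membership is False when address and network versions differ.
        if address in ipaddress.ip_network(cidr):
            return "verified"
    # The token was claimed, but the source is outside every published range.
    return "mismatch"
```

Keeping "unverifiable" separate from "mismatch" matters. A mismatch is evidence that the claim is false, while an unverifiable claim is only an absence of evidence, and the two usually deserve different handling.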
How it appears in analytics and logs
In logs and analytics, the token shows up in the user-agent field of a request, where it is only a claim. Genuine verification depends on matching the request's source against vendor-published ranges or guidance. Where a vendor publishes none, the token cannot be fully verified and should be treated cautiously.
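As a rough illustration of reading the claim out of a logged user-agent value, the sketch below does a case-insensitive substring match against a list of known tokens. The function name `claimed_token` and the default token list are assumptions made for this example. The result is only the claimed identity and says nothing about whether the request is genuine.

```python
from typing import Optional, Sequence


def claimed_token(
    user_agent: str,
    known_tokens: Sequence[str] = ("GPTBot", "ClaudeBot"),
) -> Optional[str]:
    """Return the first known crawler token named in the user agent, if any.

    This extracts a claim only. Verification against vendor-published
    ranges or guidance is a separate step.
    """
    lowered = user_agent.lower()
    for token in known_tokens:
        if token.lower() in lowered:
            return token
    return None
```

A `None` result does not mean the request is human or harmless, because undeclared scrapers send generic or browser-like user agents.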
Diagnostic use case
Verify that a request claiming to be a given AI crawler is genuine before acting on it, using vendor-published IP ranges or guidance where available.
What WebmasterID can help detect
WebmasterID classifies crawlers server-side and can flag requests whose claimed identity does not hold up, helping you separate genuine AI crawlers from clients merely wearing their user-agent token.
Common mistakes
- Trusting a user-agent token without any source verification.
- Inventing IP ranges or reverse-DNS rules for crawlers that publish none.
- Treating an unverifiable crawler as fully confirmed.
Privacy and accuracy notes
Verification uses request metadata and vendor-published ranges, not visitor identity. This entry avoids printing raw addresses. WebmasterID records crawls as bot events and never as visitor profiles.
Related pages
- Undeclared AI scrapers and how they appear
Some AI scrapers do not declare a recognisable token. They appear with generic user agents, browser-like strings, or forged identities. They cannot be identified by a clean token, so the honest approach is to describe the pattern, verify what you can, and categorise conservatively.
- GPTBot — OpenAI's web crawler
GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.
- Bot intelligence
Deterministic categorisation of crawlers, search bots, and automation.
Sources and verification notes
- OpenAI — bots documentation (IP ranges)
OpenAI publishes IP ranges for verification; most other vendors do not, so this topic is partially verifiable.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.