Verifying AI crawlers

Any client can copy a user-agent string, so a token alone is a claim, not proof. Some vendors, such as OpenAI for GPTBot, publish IP ranges or verification guidance; many do not. Verify before trusting, and never invent IP ranges to fill the gap.

Partially verified

Why a token is only a claim

A user-agent string is set by the client, so anyone can send a request that says GPTBot or ClaudeBot. The robots.txt token identifies what a compliant crawler calls itself; it does not, on its own, prove that a given request truly came from that vendor.

That is why verification matters for any decision that depends on the requester being genuine. Treat the token as a starting hypothesis to be confirmed, not as proof.
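As a rough illustration of treating the token as a hypothesis, the sketch below only extracts which crawler a user agent claims to be. The function name `claimed_crawler`, the `KNOWN_TOKENS` tuple, and the sample user-agent strings are assumptions made for this example; the only tokens used are the two named in this entry.

```python
import re
from typing import Optional

# Tokens named in this entry; extend with the tokens you actually track.
KNOWN_TOKENS = ("GPTBot", "ClaudeBot")


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler token a user agent claims, or None.

    This reports a claim only. It says nothing about whether the
    request really came from that vendor.
    """
    for token in KNOWN_TOKENS:
        # Match the token as a standalone product name, case-insensitively,
        # so "GPTBot/1.0" matches but "NotGPTBotty" does not.
        pattern = rf"(?<![A-Za-z0-9]){re.escape(token)}(?![A-Za-z0-9])"
        if re.search(pattern, user_agent, re.IGNORECASE):
            return token
    return None


# Made-up user-agent strings, for illustration only.
claim = claimed_crawler("ExampleAgent/1.0 (compatible; GPTBot/1.0)")
no_claim = claimed_crawler("curl/8.0")
```

A non-`None` result is the starting hypothesis; any trust-sensitive decision still needs the source check.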

How verification actually works

The reliable signal is the request's source address checked against information the vendor publishes. OpenAI, for example, publishes IP ranges for its crawlers so operators can confirm that a request claiming to be GPTBot really originates from OpenAI. Some vendors publish reverse-DNS or other guidance instead.
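A minimal sketch of the source check, assuming you have already fetched a vendor's published ranges as a list of CIDR strings. The function name `ip_in_published_ranges` is hypothetical, and the ranges shown are placeholders from the blocks reserved for documentation (RFC 5737 and RFC 3849). They are not any vendor's real ranges; always load the real ones from the vendor's own documentation.

```python
import ipaddress
from typing import Iterable


def ip_in_published_ranges(source_ip: str, published_cidrs: Iterable[str]) -> bool:
    """Return True if source_ip falls inside any vendor-published CIDR range."""
    address = ipaddress.ip_address(source_ip)
    for cidr in published_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


# PLACEHOLDER ranges reserved for documentation, NOT real vendor ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]

print(ip_in_published_ranges("192.0.2.17", PLACEHOLDER_RANGES))    # True
print(ip_in_published_ranges("198.51.100.5", PLACEHOLDER_RANGES))  # False
```

Use the address your server actually observed for the connection. A forwarded-for header is client-supplied unless it was set by a proxy you control.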

The gap is that many AI crawlers publish no verification material at all. For those, you can identify a request by its token but cannot fully verify it, so make trust-sensitive decisions conservatively. Critically, never invent IP ranges or fabricate a verification method: an unverifiable crawler stays unverifiable, and the honest classification is partially verified.

Some vendors publish IP ranges (e.g. OpenAI for GPTBot)
Many publish nothing verifiable — identify by token only
Never invent IP ranges to manufacture certainty
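The three outcomes above can be sketched as an explicit three-state result, so that a crawler with no published material is never silently promoted to "verified". The names `Verdict` and `classify`, the `published` mapping, and the token `ExampleBot` are assumptions made for this illustration, and the range shown is a documentation placeholder (RFC 5737), not a real vendor range.

```python
import ipaddress
from enum import Enum
from typing import Mapping, Sequence


class Verdict(Enum):
    VERIFIED = "verified"          # source is inside vendor-published ranges
    MISMATCH = "mismatch"          # ranges are published, source is outside them
    UNVERIFIABLE = "unverifiable"  # nothing published; the token is a claim only


def classify(token: str, source_ip: str,
             published: Mapping[str, Sequence[str]]) -> Verdict:
    """Classify a claimed crawler token against vendor-published ranges."""
    cidrs = published.get(token)
    if not cidrs:
        # No published material: do not invent ranges, report the gap.
        return Verdict.UNVERIFIABLE
    address = ipaddress.ip_address(source_ip)
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        if address.version == network.version and address in network:
            return Verdict.VERIFIED
    return Verdict.MISMATCH


# PLACEHOLDER data: a documentation-reserved range, not a real vendor range.
published = {"GPTBot": ["192.0.2.0/24"]}

print(classify("GPTBot", "192.0.2.10", published).value)      # verified
print(classify("GPTBot", "203.0.113.9", published).value)     # mismatch
print(classify("ExampleBot", "192.0.2.10", published).value)  # unverifiable
```

Keeping `UNVERIFIABLE` separate from `MISMATCH` lets downstream rules treat "cannot check" and "checked and failed" differently.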

How it appears in analytics and logs

In analytics and logs, an AI crawler appears as a token in the user-agent field, and that token is a claim. Genuine verification depends on matching the request's source against vendor-published ranges or guidance. Where a vendor publishes none, the token cannot be fully verified and should be treated cautiously.

Diagnostic use case

Verify that a request claiming to be a given AI crawler is genuine before acting on it, using vendor-published IP ranges or guidance where available.

What WebmasterID can help detect

WebmasterID classifies crawlers server-side and can flag requests whose claimed identity does not hold up, helping you separate genuine AI crawlers from clients merely wearing their user-agent token.

Common mistakes

Trusting a user-agent token without any source verification.
Inventing IP ranges or reverse-DNS rules for crawlers that publish none.
Treating an unverifiable crawler as fully confirmed.

Privacy and accuracy notes

Verification uses request metadata and vendor-published ranges, not visitor identity. This entry avoids printing raw addresses. WebmasterID records crawls as bot events and never as visitor profiles.

Sources and verification notes

OpenAI: bots documentation (IP ranges). OpenAI publishes IP ranges for verification; most other vendors do not, so this topic is partially verifiable.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.