AI crawler user-agent spoofing
Any client can put GPTBot or ClaudeBot in its User-Agent header, because that header is supplied by the client and never validated by HTTP. Spoofers do this to borrow a trusted crawler's reputation or to get around rules. The defence is verifying the request's network source against the operator's published ranges, not trusting the string.
Why the user agent is not proof
The User-Agent header is set by the client and HTTP does nothing to validate it. So any script, scraper, or browser can send User-Agent: GPTBot. The string is a claim of identity, not evidence of it.
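To make the point concrete, the sketch below builds a request object with Python's standard library and sets the header to a crawler token. Nothing is sent; `example.com` is a placeholder URL and the bare token `GPTBot` stands in for whatever full string a client might choose to send.

```python
from urllib.request import Request

# Build (but do not send) a request that merely claims to be GPTBot.
# example.com is a placeholder, not a real target.
req = Request("https://example.com/", headers={"User-Agent": "GPTBot"})

# urllib stores header names in capitalized form, hence "User-agent".
claimed = req.get_header("User-agent")
print(claimed)
```

Nothing in the protocol checked that claim, which is why a server has to verify the network source separately.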
Spoofers do this for a few reasons: to inherit allow rules written for trusted crawlers, to dodge bot defences that whitelist known tokens, or simply to obscure what they are. In none of these cases does the request come from the real operator.
How to verify the real crawler
Operators of the major crawlers publish a way to confirm that a request is theirs. OpenAI publishes GPTBot IP ranges; others publish ranges or support reverse-DNS verification, in which the source IP's PTR record resolves to a hostname in the operator's domain and a forward lookup of that hostname returns the same IP. Verify the source against these published facts before trusting a token.
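The reverse-then-forward DNS check could be sketched as follows. This is a minimal illustration, not any operator's documented procedure: `verify_by_dns`, the injectable `reverse` and `forward` resolvers, and the `.crawler.example` suffix are hypothetical names chosen for the example. Use the domain suffixes the operator actually documents.

```python
import ipaddress
import socket

def default_reverse(ip):
    """Return the PTR hostname for ip (raises OSError if there is none)."""
    return socket.gethostbyaddr(ip)[0]

def default_forward(hostname):
    """Return the set of addresses the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}

def verify_by_dns(ip, allowed_suffixes,
                  reverse=default_reverse, forward=default_forward):
    """True only if the PTR name is in an allowed domain AND resolves back to ip.

    allowed_suffixes must start with a dot (e.g. ".crawler.example") so that
    "evilcrawler.example" cannot match. The suffix here is hypothetical.
    """
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not any(hostname.endswith(s) for s in allowed_suffixes):
            return False
        # Forward-confirm: the hostname must map back to the same address.
        target = ipaddress.ip_address(ip)
        resolved = {ipaddress.ip_address(a) for a in forward(hostname)}
        return target in resolved
    except (OSError, ValueError):
        # Lookup failure or unparsable address: treat as unverified.
        return False
```

The forward step matters because whoever controls an IP block also controls its PTR records; only the forward lookup is answered by the operator's own domain.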
Never fabricate IP ranges to verify a crawler. Use the operator's current published list, and where no verification method exists, treat the token as unverified rather than assuming it is genuine.
- User-Agent is client-supplied and never validated by HTTP
- Verify source IP against the operator's published ranges
- Where supported, confirm via reverse-then-forward DNS
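A source-IP check against published ranges might look like the sketch below. The CIDR blocks shown are reserved documentation ranges (RFC 5737 and RFC 3849) used purely as placeholders; they are not any operator's real ranges and must be replaced with the operator's current published list. `PUBLISHED_RANGES` and `ip_in_published_ranges` are hypothetical names.

```python
import ipaddress

# PLACEHOLDER data only: documentation ranges, NOT real crawler ranges.
# Load the operator's current published list here instead.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24", "2001:db8::/32"],
}

def ip_in_published_ranges(token, source_ip, ranges=PUBLISHED_RANGES):
    """Return True or False, or None when no ranges are known for the token.

    None means "unverified", which is deliberately distinct from "genuine".
    """
    cidrs = ranges.get(token)
    if not cidrs:
        return None
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)
```

Returning `None` for a token with no known ranges keeps the "no verification method" case separate from a confirmed match, so it is never silently treated as genuine.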
What to do with impostors
Once a request is shown to be spoofing a known token, exclude it from crawler coverage and AI-visibility stats so your numbers reflect only genuine crawlers. Decide separately whether to challenge, rate-limit, or block it — a spoofed GPTBot is, by definition, not GPTBot and is not bound by OpenAI's policies.
Keep allow rules keyed on verified identity, not on the string. An allow rule that trusts User-Agent alone is an open door for anyone willing to type the token.
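As a rough illustration of an allow rule keyed on verified identity, the sketch below only grants the allow path when a separate source check has passed. `Action`, `TRUSTED_TOKENS`, `claimed_token`, and `decide` are hypothetical names, and challenging an impostor is just one of the possible policies; rate-limiting or blocking would slot in the same way.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"          # verified crawler: apply the allow rule
    CHALLENGE = "challenge"  # claims a trusted token but failed verification
    DEFAULT = "default"      # no trusted token claimed: normal handling

# Tokens that have allow rules in this hypothetical setup.
TRUSTED_TOKENS = ("GPTBot", "ClaudeBot")

def claimed_token(user_agent):
    """Return the trusted token the User-Agent claims, if any."""
    ua = user_agent.lower()
    for token in TRUSTED_TOKENS:
        if token.lower() in ua:
            return token
    return None

def decide(user_agent, source_verified):
    """Key the decision on verification, never on the string alone."""
    if claimed_token(user_agent) is None:
        return Action.DEFAULT
    return Action.ALLOW if source_verified else Action.CHALLENGE
```

Here `source_verified` is assumed to come from an IP-range or DNS check performed elsewhere; the string alone never reaches `Action.ALLOW`.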
How it appears in analytics and logs
A request whose UA names a known AI crawler but whose source falls outside that operator's published ranges is a spoof. Treating it as the real crawler would overstate coverage and could let abusive traffic inherit an allow rule meant for the genuine bot.
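The classification described here can be sketched over a handful of log records. Everything in the example is made up for illustration: the user-agent strings, the source addresses, and the single placeholder range (a reserved documentation block, not a real crawler range). `classify`, `RANGES`, and `KNOWN_TOKENS` are hypothetical names.

```python
import ipaddress
from collections import Counter

# PLACEHOLDER range from a reserved documentation block, not a real one.
RANGES = {"GPTBot": ["192.0.2.0/24"]}
KNOWN_TOKENS = ("GPTBot", "ClaudeBot")

def classify(user_agent, ip):
    """Label one log record as verified, spoofed, unverified, or not_ai_crawler."""
    ua = user_agent.lower()
    token = next((t for t in KNOWN_TOKENS if t.lower() in ua), None)
    if token is None:
        return "not_ai_crawler"
    cidrs = RANGES.get(token)
    if not cidrs:
        # No ranges loaded for this token in this table: cannot confirm.
        return "unverified"
    addr = ipaddress.ip_address(ip)
    if any(addr in ipaddress.ip_network(c) for c in cidrs):
        return "verified"
    return "spoofed"

# Made-up sample records: (user agent, source IP).
log = [
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", "192.0.2.8"),
    ("GPTBot", "203.0.113.9"),
    ("ClaudeBot", "198.51.100.4"),
    ("Mozilla/5.0 (X11; Linux x86_64)", "198.51.100.5"),
]

counts = Counter(classify(ua, ip) for ua, ip in log)
coverage = counts["verified"]  # only verified requests count as crawler coverage
print(dict(counts), coverage)
```

The `ClaudeBot` record lands in "unverified" only because this placeholder table has no entry for it; that says nothing about what its operator publishes.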
Diagnostic use case
Tell genuine AI crawlers from impostors so allow-rules, coverage stats, and analytics are not polluted by traffic merely claiming a trusted token.
What WebmasterID can help detect
WebmasterID verifies AI crawlers server-side rather than trusting the user agent, so spoofed requests are not counted as genuine crawler activity on the bot-intelligence surface.
Common mistakes
- Whitelisting a crawler by user-agent string, which any client can copy.
- Counting spoofed requests as genuine crawler coverage.
- Inventing IP ranges instead of using the operator's published list.
- Assuming a token with no published verification method is trustworthy.
Privacy and accuracy notes
Spoof detection compares the crawler token against the operator's published network source. By design, it uses no visitor identity and stores no raw client identifiers; a crawler is not a person.
Frequently asked questions
- Can someone fake the GPTBot user agent?
- Yes. The User-Agent header is client-supplied, so any client can claim to be GPTBot. The only reliable check is verifying the request's source IP against OpenAI's published GPTBot ranges; a request from outside those ranges is a spoof.
Related pages
- Verifying AI crawlers
Any client can copy a user-agent string, so a token alone is a claim, not proof. Some vendors, such as OpenAI for GPTBot, publish IP ranges or verification guidance; many do not. Verify before trusting, and never invent IP ranges to fill the gap.
- Undeclared AI scrapers and how they appear
Some AI scrapers do not declare a recognisable token. They appear with generic user agents, browser-like strings, or forged identities. They cannot be identified by a clean token, so the honest approach is to describe the pattern, verify what you can, and categorise conservatively.
- Bot intelligence
Server-side crawler verification that does not trust the user-agent string.
Sources and verification notes
- MDN, User-Agent header: User-Agent is set by the client and not validated by the protocol.
- OpenAI, GPTBot documentation: Publishes GPTBot IP ranges for verifying genuine requests.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.