Do AI crawlers obey robots.txt?
Major declared AI crawlers and control tokens such as GPTBot, ClaudeBot, and Google-Extended are documented by their operators as honouring robots.txt, but compliance is voluntary and varies across operators. robots.txt is a crawl request defined by a shared standard, the Robots Exclusion Protocol (RFC 9309), not an access-control mechanism, so a non-compliant or undeclared scraper can ignore it. Enforcement requires server-side controls.
robots.txt is a request, not a lock
The Robots Exclusion Protocol lets a site ask crawlers not to fetch certain paths. It is honoured by compliant crawlers as a matter of policy, but nothing in the protocol enforces it. A client that chooses to ignore robots.txt faces no technical barrier from the file itself.
That is why robots.txt is described as a request. It is effective against well-behaved crawlers and useless against ones that disregard it, so it should never be your only line of defence for content you must protect.
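To make the "request, not a lock" point concrete, here is a minimal Python sketch of what honouring robots.txt means on the client side. It uses the standard library's `urllib.robotparser`. The sample rules and both function names are illustrative assumptions, not any operator's actual crawler code. The check only runs if the client chooses to call it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
"""


def compliant_client_may_fetch(user_agent: str, path: str) -> bool:
    """A compliant crawler parses robots.txt and obeys the answer."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def non_compliant_client_may_fetch(user_agent: str, path: str) -> bool:
    """A non-compliant client never consults the file, so nothing stops it."""
    return True


if __name__ == "__main__":
    print(compliant_client_may_fetch("GPTBot", "/private/report"))      # False
    print(non_compliant_client_may_fetch("GPTBot", "/private/report"))  # True
```

Both clients receive the same file; only the first one asks it anything. That difference lives entirely in the client's code, which is why the server cannot rely on it.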
Which AI crawlers honour it
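The major, declared AI crawlers publish that they honour robots.txt: OpenAI documents it for GPTBot, ChatGPT-User, and OAI-SearchBot; Anthropic for ClaudeBot and Claude-User; Google for Google-Extended; Common Crawl for CCBot. Google-Extended is a robots.txt control token rather than a separate crawler: Google fetches with its existing user agents and uses the Google-Extended rules to decide how that content may be used. For these operators, a correctly targeted Disallow is respected once the crawler re-reads your robots.txt.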
Compliance is per operator, though. Some crawlers have limited or unclear documentation, and undeclared scrapers that copy a user-agent string answer to no published policy. Treat the documented majors as compliant and everything else as unverified until proven otherwise.
- Operators of GPTBot, ClaudeBot, Google-Extended, and CCBot document robots.txt compliance
- Compliance is voluntary and varies by operator
- Undeclared scrapers may ignore robots.txt entirely
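Because compliance is evaluated per token, it helps to check what each declared token would conclude from your file. The following is a sketch using Python's `urllib.robotparser`. The sample rules are assumptions, and the token list mirrors the names above. Tokens without their own group fall back to `User-agent: *`. The stdlib parser matches tokens by case-insensitive substring, which is looser than RFC 9309, so treat the output as an approximation of how each operator parses the file.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (an assumption): GPTBot and ClaudeBot are disallowed from
# /research/, Google-Extended from everything; other tokens use the * group.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /research/

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
"""

DECLARED_TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]


def token_report(robots_txt: str, path: str) -> dict[str, bool]:
    """Return {token: may_fetch} for each declared token and a single path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, path) for token in DECLARED_TOKENS}


if __name__ == "__main__":
    print(token_report(ROBOTS_TXT, "/research/paper"))
    # {'GPTBot': False, 'ClaudeBot': False, 'Google-Extended': False, 'CCBot': True}
```

CCBot is allowed on `/research/` here because it has no group of its own and the `*` group only disallows `/admin/`.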
When you need enforcement
For content where access truly must be restricted, robots.txt is insufficient. Use server-side controls: authentication, IP/source verification, rate limits, or WAF rules that return 401/403 to clients you do not allow. These apply regardless of whether the client reads robots.txt.
Keep robots.txt for steering compliant crawlers and signalling intent, and layer enforcement underneath. The two work together: the file expresses policy, the server enforces it.
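Here is a minimal sketch of server-side enforcement that applies whether or not the client read robots.txt. It implements one possible policy, and every name in it is hypothetical. The allowlisted range is an RFC 5737 documentation placeholder, not any operator's real addresses, so substitute the ranges or verification method each operator documents. `/members/` stands in for content that must be protected. The User-Agent check only detects a claimed identity. The source-IP check is what verifies it.

```python
import ipaddress
from typing import Optional

# Hypothetical allowlist: source ranges you have verified for each crawler token
# you accept. 192.0.2.0/24 is a documentation range used as a placeholder.
VERIFIED_RANGES = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
}

# Hypothetical path prefix that must be enforced, not merely requested.
PROTECTED_PREFIX = "/members/"


def claimed_token(user_agent: str) -> Optional[str]:
    """Return the allowlisted token this User-Agent claims to be, if any."""
    ua = user_agent.lower()
    for token in VERIFIED_RANGES:
        if token.lower() in ua:
            return token
    return None


def access_status(path: str, user_agent: str, client_ip: str, authenticated: bool) -> int:
    """Decide an HTTP status on the server, independent of robots.txt."""
    token = claimed_token(user_agent)
    if token is not None:
        ip = ipaddress.ip_address(client_ip)
        if not any(ip in net for net in VERIFIED_RANGES[token]):
            return 403  # claims a crawler identity from an unverified source
    if path.startswith(PROTECTED_PREFIX) and not authenticated:
        return 401  # protected content requires authentication
    return 200
```

In practice this logic often lives in a WAF or reverse proxy rather than in application code. The decision is the same either way: the server answers, not the file.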
How it appears in analytics and logs
If a declared AI token keeps fetching a path you disallowed, the usual explanations are: the rule does not match the token, the crawler has not yet re-read the updated robots.txt, the crawler is not honouring the directive, or the requests only claim that token and come from someone else. Continued access by an undeclared scraper means robots.txt was never going to stop it.
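To separate re-read lag from genuine non-compliance, compare fetch times against when the Disallow was published. The sketch below assumes a 24-hour grace window because RFC 9309 says crawlers SHOULD NOT use a cached robots.txt for more than 24 hours unless the file is unreachable. The `Fetch` record shape and the function name are hypothetical and do not represent any product's log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# RFC 9309: crawlers SHOULD NOT use a cached robots.txt for more than 24 hours
# (unless it is unreachable), so fetches inside this window may just be lag.
GRACE = timedelta(hours=24)


@dataclass
class Fetch:
    token: str    # crawler token as recorded in your logs (hypothetical shape)
    path: str
    at: datetime


def fetches_after_disallow(
    fetches: list[Fetch], token: str, prefix: str, rule_published_at: datetime
) -> list[Fetch]:
    """Return fetches of `prefix` by `token` that occurred after the grace window."""
    cutoff = rule_published_at + GRACE
    return [
        f for f in fetches
        if f.token == token and f.path.startswith(prefix) and f.at > cutoff
    ]
```

A late fetch is a signal, not proof. Before concluding the crawler ignored the rule, confirm that the Disallow actually matches the token and that the requests came from the operator's verified sources, since a spoofed user agent produces the same log line.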
Diagnostic use case
Set realistic expectations for robots.txt: rely on it to steer compliant AI crawlers, and add server-side controls where you need actual enforcement.
What WebmasterID can help detect
WebmasterID's bot-intelligence surface records which AI tokens fetched which paths and when, so you can check whether a crawler actually respected a Disallow after it took effect.
Common mistakes
- Treating robots.txt as a security boundary instead of a crawl request.
- Assuming every AI crawler honours it equally; compliance varies by operator.
- Expecting an instant effect; crawlers act after they re-read robots.txt.
- Mis-targeting the token so the Disallow never matches the crawler.
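Mis-targeted tokens are easy to miss by eye. A small lint can flag them by checking that every token you intend to target has a `User-agent` group naming it (case-insensitively) and by suggesting near misses such as `GPT-Bot`. This is a hedged sketch. The intended-token list and function names are assumptions, and the near-miss hint uses `difflib` string similarity rather than any official matching rule.

```python
import difflib

# Tokens you intend to target (an assumption for this sketch).
INTENDED_TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]


def user_agent_values(robots_txt: str) -> list[str]:
    """Collect the values of all User-agent lines, with comments stripped."""
    agents = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.append(value.strip())
    return agents


def targeting_problems(robots_txt: str) -> dict[str, str]:
    """Map each intended token lacking a case-insensitive group to a hint."""
    lowered = {a.lower(): a for a in user_agent_values(robots_txt)}
    problems = {}
    for token in INTENDED_TOKENS:
        if token.lower() in lowered:
            continue
        close = difflib.get_close_matches(token.lower(), list(lowered), n=1, cutoff=0.6)
        problems[token] = f"near miss: {lowered[close[0]]!r}" if close else "no group"
    return problems
```

An empty result means every intended token has a group of its own. It does not confirm that the rules inside those groups say what you meant.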
Privacy and accuracy notes
robots.txt governs crawl behaviour, not visitor identity. Evaluating compliance uses crawler tokens and request paths only; no human data is involved.
Frequently asked questions
- Will robots.txt stop an undeclared scraper?
- No. robots.txt is honoured voluntarily by compliant crawlers. An undeclared scraper, or one that copies a known user-agent string, can ignore it. To actually stop such traffic you need server-side enforcement.
Related pages
- AI crawlers and paywalled content
AI crawlers can only ingest what your server returns to them. For paywalled or metered content, that depends on whether the page is gated by hard access control or by a soft, client-side wall. robots.txt asks compliant crawlers to stay out; only real authentication or server-side gating actually prevents an AI crawler from reading the full text.
- Undeclared AI scrapers and how they appear
Some AI scrapers do not declare a recognisable token. They appear with generic user agents, browser-like strings, or forged identities. They cannot be identified by a clean token, so the honest approach is to describe the pattern, verify what you can, and categorise conservatively.
- Web crawlers reference
Reference for crawler tokens and how robots.txt directives apply to them.
Sources and verification notes
- Google: robots.txt specification. Defines robots.txt as a crawl request, not an enforcement mechanism.
- OpenAI: bots documentation. Documents that OpenAI's crawlers honour robots.txt.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.