Do AI crawlers obey robots.txt?

Major declared AI crawlers such as GPTBot, ClaudeBot, and Google-Extended document that they honour robots.txt, but compliance is voluntary and varies across operators. robots.txt is a crawl request defined by a shared standard, not an access-control mechanism, so a non-compliant or undeclared scraper can ignore it. Enforcement requires server-side controls.

Verified against primary sources

robots.txt is a request, not a lock

The Robots Exclusion Protocol (standardised as RFC 9309) lets a site ask crawlers not to fetch certain paths. Compliant crawlers honour it as a matter of policy, but nothing in the protocol enforces it. A client that chooses to ignore robots.txt faces no technical barrier from the file itself.

That is why robots.txt is described as a request. It is effective against well-behaved crawlers and useless against ones that disregard it, so it should never be your only line of defence for content you must protect.
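
To make the "request, not a lock" point concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the ExampleBot token, the example.com URLs, and the may_fetch helper are all assumptions made for illustration. The point it shows is that the check runs entirely inside the client: a crawler that skips it is never stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_fetch(robots_text, user_agent, url, honours_robots=True):
    """Model a client's own decision. The server takes no part in it."""
    if not honours_robots:
        # A client that skips the check meets no barrier from the file.
        return True
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules.can_fetch(user_agent, url)

url = "https://example.com/private/page.html"
compliant = may_fetch(ROBOTS_TXT, "ExampleBot", url)
non_compliant = may_fetch(ROBOTS_TXT, "ExampleBot", url, honours_robots=False)
```

Here compliant is False and non_compliant is True for the same URL and the same file. Only the client's behaviour differs.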

Which AI crawlers honour it

The major declared AI crawlers state in their documentation that they honour robots.txt: OpenAI documents it for GPTBot, ChatGPT-User, and OAI-SearchBot; Anthropic for ClaudeBot and Claude-User; Google for Google-Extended; Common Crawl for CCBot. For these, a correctly targeted Disallow is respected once the crawler has re-read your robots.txt.

Compliance is per operator, though. Some crawlers have limited or unclear documentation, and undeclared scrapers that copy a user-agent string answer to no published policy. Treat the documented majors as compliant and everything else as unverified until proven otherwise.
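
Whether a Disallow is "correctly targeted" can be checked before deploying it. The sketch below is one way to do that, assuming Python's urllib.robotparser as a stand-in for a crawler's own matcher (each operator's parser may differ in detail). The robots.txt content, the paths, the misspelt GPT-Bot group, and the blocked_tokens helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The second group misspells the token on purpose.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /members/

User-agent: GPT-Bot
Disallow: /drafts/
"""

# Tokens named in the text above.
TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]

def blocked_tokens(robots_text, tokens, path):
    """Map each token to True if the rules disallow the path for it."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return {token: not rules.can_fetch(token, path) for token in tokens}

members = blocked_tokens(ROBOTS_TXT, TOKENS, "/members/page.html")
drafts = blocked_tokens(ROBOTS_TXT, ["GPTBot"], "/drafts/post.html")
```

In this sketch /members/ is disallowed only for GPTBot and ClaudeBot, because no group names Google-Extended or CCBot. The /drafts/ rule binds nobody, since the misspelt group does not match the GPTBot token.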

When you need enforcement

For content where access truly must be restricted, robots.txt is insufficient. Use server-side controls: authentication, IP/source verification, rate limits, or WAF rules that return 401/403 to clients you do not allow. These apply regardless of whether the client reads robots.txt.
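
As a rough illustration of such a control, the following Python sketch wraps a WSGI application and returns 403 based on the User-Agent header and source address. Everything in the policy is an assumption: ExampleScraper is a made-up token, and the CIDR range is a documentation-only placeholder rather than a real crawler range. Real ranges would come from each operator's published list, and a production setup would more likely live in a WAF or reverse proxy.

```python
import ipaddress

# Hypothetical policy, for illustration only.
BLOCKED_TOKENS = ("examplescraper",)
# Placeholder range (TEST-NET-1), not a real crawler range.
VERIFIED_RANGES = {"gptbot": [ipaddress.ip_network("192.0.2.0/24")]}

def decide(user_agent, remote_addr):
    """Return the HTTP status line this policy assigns to a request."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_TOKENS):
        return "403 Forbidden"
    for token, networks in VERIFIED_RANGES.items():
        if token in ua:
            addr = ipaddress.ip_address(remote_addr)
            if not any(addr in net for net in networks):
                # Claims a declared token but comes from an unlisted source.
                return "403 Forbidden"
    return "200 OK"

def enforce(app):
    """WSGI middleware that applies decide() before the wrapped app runs."""
    def middleware(environ, start_response):
        status = decide(environ.get("HTTP_USER_AGENT", ""),
                        environ.get("REMOTE_ADDR", "0.0.0.0"))
        if status != "200 OK":
            start_response(status, [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

The decision is made on the server for every request, so it holds whether or not the client ever fetched robots.txt. Behind a proxy, REMOTE_ADDR may be the proxy's address, so the source address would need to come from a trusted forwarding header instead.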

Keep robots.txt for steering compliant crawlers and signalling intent, and layer enforcement underneath. The two work together: the file expresses policy, the server enforces it.

How it appears in analytics and logs

If a declared AI token keeps fetching a path you disallowed, there are three likely causes: the rule does not match the token, the crawler has not yet re-read your updated robots.txt, or the crawler is not honouring the directive. Continued access by an undeclared scraper means robots.txt was never going to stop it.
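
A first-pass version of this check can be run against raw access logs. The Python sketch below assumes combined-format log lines and reuses urllib.robotparser to decide which paths were disallowed for which token. The regular expression, the sample log lines, the token list, and the disallowed_hits helper are illustrative assumptions, not WebmasterID's implementation.

```python
import re
from urllib.robotparser import RobotFileParser

# Declared tokens that appear in request user-agent strings (illustrative).
AI_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot",
             "ClaudeBot", "Claude-User", "CCBot")

# Assumes combined log format: "METHOD path proto" status bytes "referer" "ua"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def disallowed_hits(robots_text, log_lines):
    """Return (token, path) pairs where a declared token fetched a disallowed path."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        ua = match.group("ua").lower()
        path = match.group("path")
        for token in AI_TOKENS:
            if token.lower() in ua and not rules.can_fetch(token, path):
                hits.append((token, path))
    return hits

# Hypothetical inputs.
ROBOTS_TXT = "User-agent: GPTBot\nDisallow: /private/\n"
LOGS = [
    '192.0.2.10 - - [24/Jun/2026:10:00:00 +0000] "GET /private/a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.10 - - [24/Jun/2026:10:00:05 +0000] "GET /blog/ HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.20 - - [24/Jun/2026:10:01:00 +0000] "GET /private/b.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
hits = disallowed_hits(ROBOTS_TXT, LOGS)
```

With these inputs only the GPTBot request to /private/a.html is reported. The ClaudeBot request is not, because no rule in the sample file names ClaudeBot, which is the "rule does not match the token" case. A fuller check would also compare each hit's timestamp with the time the rule was published, to allow for a crawler that had not yet re-read the file.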

Diagnostic use case

Set realistic expectations for robots.txt: rely on it to steer compliant AI crawlers, and add server-side controls where you need actual enforcement.

What WebmasterID can help detect

On the bot-intelligence surface, WebmasterID records which AI tokens fetched which paths and when, so you can check whether a crawler actually respected a Disallow after it took effect.

Privacy and accuracy notes

robots.txt governs crawl behaviour, not visitor identity. Evaluating compliance uses crawler tokens and request paths only; no human data is involved.

Frequently asked questions

Will robots.txt stop an undeclared scraper?
No. robots.txt is honoured voluntarily by compliant crawlers. An undeclared scraper, or one that copies a known user-agent string, can ignore it. To actually stop such traffic you need server-side enforcement.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.