Writing an AI crawler policy for robots.txt
An AI crawler policy is a deliberate decision about which AI-related tokens you allow and which you disallow in robots.txt. This page offers a structured way to make and document those choices, while staying realistic: robots.txt is a request to compliant crawlers, not a legal or technical guarantee.
Deciding per token
There is no single right answer; the trade-off is visibility versus control. Allowing an AI crawler can help your content be represented in that company's products; disallowing it asks them not to use your site for the purposes that token governs. Decide token by token rather than with one blanket switch, because tokens mean different things — a training crawler, a real-time user fetch, and a search crawler are distinct decisions.
Group your choices. Search crawlers are the ones you almost always keep open; training crawlers and AI-use tokens such as Google-Extended, GPTBot, ClaudeBot, and CCBot are where most policy decisions live.
- Separate search, training, and real-time-fetch tokens
- Decide per token, not with one blanket rule
- Revisit as new tokens are published
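The per-token approach can be sketched as a small check, assuming a hypothetical policy that keeps search open and opts out of three training and AI-use tokens. The policy text, the example.com URL, and the helper name allowed_tokens are illustrative assumptions, not a recommendation. Python's standard urllib.robotparser is used only to simulate how a compliant client would read the file; its matching rules are simpler than those in RFC 9309, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: search stays open, three AI tokens are opted out.
# ClaudeBot is deliberately left undecided, so it falls through to "*".
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_tokens(robots_txt, tokens, url="https://example.com/article"):
    """Return {token: True/False} for whether each token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


result = allowed_tokens(
    ROBOTS_TXT,
    ["Googlebot", "GPTBot", "Google-Extended", "CCBot", "ClaudeBot"],
)
for token, allowed in result.items():
    print(f"{token}: {'allowed' if allowed else 'disallowed'}")
```

Listing each token in its own group keeps every decision visible and separately reversible, which is the point of deciding per token rather than with one blanket rule.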
Document the rationale, do not overclaim
Write down why you allow or disallow each token, so future maintainers understand the intent and you can revisit it as the landscape changes. Keep the language honest: robots.txt expresses a preference that compliant crawlers honour. It is not a contract, not a copyright enforcement mechanism, and not a technical block against non-compliant clients.
For content you must protect, combine policy with authentication and, where appropriate, terms of service — but do not claim robots.txt alone provides legal protection.
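One way to keep the rationale next to the rule is to store decisions as data and render robots.txt from them, with each reason written as a comment line (RFC 9309 permits "#" comments). The following is a minimal sketch; the TokenDecision class, the render_robots helper, and the example rationales are all hypothetical and stand in for whatever your team actually decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDecision:
    token: str       # robots.txt user-agent token
    allow: bool      # True renders "Allow: /", False renders "Disallow: /"
    rationale: str   # short reason, written for future maintainers


def render_robots(decisions):
    """Render one commented robots.txt group per decision."""
    groups = []
    for d in decisions:
        rule = "Allow: /" if d.allow else "Disallow: /"
        groups.append(f"# {d.token}: {d.rationale}\nUser-agent: {d.token}\n{rule}")
    return "\n\n".join(groups) + "\n"


# Hypothetical decisions; the reasons are placeholders, not advice.
DECISIONS = [
    TokenDecision("GPTBot", False, "opted out of training use, revisit next review"),
    TokenDecision("Google-Extended", False, "opted out of AI use, Search unaffected"),
    TokenDecision("*", True, "search and other crawlers stay open"),
]

robots_txt = render_robots(DECISIONS)
print(robots_txt)
```

Because the comments only record intent, they make no legal claim; they simply tell the next maintainer why each rule exists.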
How it appears in analytics and logs
Your policy is only as effective as the crawlers' compliance. Observing which AI tokens still appear in your logs after you disallow them tells you which crawlers honour your rules and which ones you need to escalate.
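A rough version of that check can be run against an access log. The sketch below assumes the common "combined" log format, in which the user-agent is the last quoted field; the regular expression, the DISALLOWED_TOKENS tuple, and the sample lines (with simplified user-agent strings) are illustrative assumptions. Requests for /robots.txt itself are skipped, since a compliant crawler is expected to fetch that file. Two caveats apply: user-agent strings can be spoofed, and a control-only token such as Google-Extended does not appear as a user-agent string, so it cannot be verified this way.

```python
import re
from collections import Counter

# Assumption: combined log format, e.g.
# IP - - [time] "METHOD /path HTTP/x" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical: the tokens your robots.txt currently disallows.
DISALLOWED_TOKENS = ("GPTBot", "CCBot")


def disallowed_hits(log_lines, tokens=DISALLOWED_TOKENS):
    """Count requests (other than /robots.txt) whose user-agent names a disallowed token."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None or match.group("path") == "/robots.txt":
            continue
        user_agent = match.group("ua").lower()
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits


# Made-up sample lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2026:10:00:05 +0000] "GET /article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2026:10:01:00 +0000] "GET /article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

hits = disallowed_hits(SAMPLE_LOG)
print(dict(hits))
```

A non-zero count is a prompt to investigate rather than proof of non-compliance: the request may predate your rule change, or the user-agent may be forged.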
Diagnostic use case
Build a defensible, documented robots.txt policy for AI crawlers — deciding per token whether visibility or opting out matters more for your site.
What WebmasterID can help detect
WebmasterID classifies AI crawlers server-side and shows their activity per page, so you can verify that your allow/deny choices match what compliant crawlers actually do.
Common mistakes
- Blanket-blocking every AI token without separating training, fetch, and search.
- Claiming robots.txt legally prevents AI use — it expresses a request.
- Forgetting to document why each token was allowed or disallowed.
Privacy and accuracy notes
An AI crawler policy is a content-usage stance in a public file. It involves no visitor data and should not overclaim legal enforcement.
Related pages
- How to opt out of Google AI with Google-Extended
Google-Extended is a robots.txt user-agent token Google provides so site owners can opt out of having their content used for certain Google AI products. Crucially, it is a standalone control: disallowing Google-Extended does not affect Googlebot crawling or your appearance in Google Search.
- How to block GPTBot in robots.txt
If you do not want OpenAI's training crawler fetching your site, you can disallow GPTBot in robots.txt. This page gives the exact rule, clarifies that it does not affect ChatGPT-User or OAI-SearchBot, and is honest about the limits of robots-based blocking.
- AI visibility analytics
Verify which AI crawlers honour your policy, page by page.
Sources and verification notes
- Google — Overview of Google crawlers and AI controls
- RFC 9309 (Robots Exclusion Protocol): robots.txt is a request to compliant crawlers, not enforcement.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.