Robots & crawl control

Writing an AI crawler policy for robots.txt

An AI crawler policy is a deliberate decision about which AI-related tokens you allow and which you disallow in robots.txt. This page offers a structured way to make and document those choices, while staying realistic: robots.txt is a request to compliant crawlers, not a legal or technical guarantee.


Deciding per token

There is no single right answer; the trade-off is visibility versus control. Allowing an AI crawler can help your content be represented in that company's products; disallowing it asks them not to use your site for the purposes that token governs. Decide token by token rather than with one blanket switch, because tokens mean different things — a training crawler, a real-time user fetch, and a search crawler are distinct decisions.

Group your choices. Search crawlers are the ones you almost always keep open; training crawlers and AI-use tokens such as Google-Extended, GPTBot, ClaudeBot, and CCBot are where most policy decisions live.

Separate search, training, and real-time-fetch tokens
Decide per token, not with one blanket rule
Revisit as new tokens are published
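
The per-token approach can be sketched in Python with the standard library's urllib.robotparser, which evaluates a robots.txt text for a given user-agent token. The policy below is hypothetical: the allow/disallow choices, the example.com URL, and the helper name decisions_for are illustrative assumptions, not recommendations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy text: each token gets its own group, so every
# decision is made separately rather than through one blanket rule.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def decisions_for(robots_txt, tokens, url="https://example.com/article"):
    """Return {token: True/False} for whether each token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


decisions = decisions_for(
    SAMPLE_ROBOTS_TXT,
    ["Googlebot", "GPTBot", "ClaudeBot", "CCBot", "Google-Extended"],
)
for token, allowed in decisions.items():
    print(f"{token}: {'allowed' if allowed else 'disallowed'}")
```

In this sketch ClaudeBot has no group of its own, so it falls through to the wildcard group. A token you have not decided on inherits whatever the `*` group says, which is one reason to revisit the file as new tokens are published.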

Document the rationale, do not overclaim

Write down why you allow or disallow each token, so future maintainers understand the intent and you can revisit it as the landscape changes. Keep the language honest: robots.txt expresses a preference that compliant crawlers honour. It is not a contract, not a copyright enforcement mechanism, and not a technical block against non-compliant clients.

For content you must protect, combine policy with authentication and, where appropriate, terms of service — but do not claim robots.txt alone provides legal protection.
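
One way to keep the rationale next to the rule is to generate robots.txt from a small decision list and emit each reason as a `#` comment above its group. The following is a minimal sketch under assumptions: the TokenDecision class, the render_robots_txt helper, and the example rationales are all hypothetical, and each decision is rendered as a site-wide allow or disallow for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDecision:
    token: str       # user-agent token as it appears in robots.txt
    allow: bool      # True -> "Allow: /", False -> "Disallow: /"
    rationale: str   # why the choice was made, kept for future maintainers


def render_robots_txt(decisions):
    """Render one commented group per decision, separated by blank lines."""
    groups = []
    for decision in decisions:
        # Collapse whitespace so the rationale stays on a single comment line.
        reason = " ".join(decision.rationale.split())
        rule = "Allow: /" if decision.allow else "Disallow: /"
        groups.append(f"# {reason}\nUser-agent: {decision.token}\n{rule}\n")
    return "\n".join(groups)


# Hypothetical decisions, for illustration only.
POLICY = [
    TokenDecision("Googlebot", True, "Search visibility matters for this site."),
    TokenDecision("GPTBot", False, "Opted out of training use; revisit yearly."),
]

robots_txt = render_robots_txt(POLICY)
print(robots_txt)
```

The comments record intent only. They do not change how crawlers interpret the file, and they do not add any legal or technical force to it.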

How it appears in analytics and logs

Your policy is only as effective as the crawlers' compliance. Checking which AI tokens still appear in your logs after you disallow them shows which crawlers honour your rules and which need escalating, for example to a server-side block.
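
A rough way to run that check is to compare access-log entries against the policy. The sketch below assumes logs in the common "combined" format and matches tokens by substring in the user-agent field; the function name disallowed_hits, the sample log lines, and the sample policy are hypothetical. User-agent strings can be spoofed, and some control tokens may never appear as a user-agent at all, so treat the counts as a signal rather than proof.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Assumed combined log format:
# ... "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_txt, tokens):
    """Count requests per token to paths that robots.txt disallows for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        path = match["path"]
        # Fetching robots.txt itself is expected of a compliant crawler.
        if path == "/robots.txt":
            continue
        agent = match["agent"].lower()
        for token in tokens:
            if token.lower() in agent and not parser.can_fetch(token, path):
                hits[token] += 1
    return hits


# Hypothetical policy and made-up log lines, for illustration only.
SAMPLE_POLICY = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
SAMPLE_LOG = [
    '203.0.113.5 - - [24/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [24/Jun/2026:10:00:05 +0000] "GET /article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [24/Jun/2026:10:01:00 +0000] "GET /article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [24/Jun/2026:10:02:00 +0000] "GET /article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = disallowed_hits(SAMPLE_LOG, SAMPLE_POLICY, ["GPTBot", "ClaudeBot"])
print(dict(hits))
```

Only requests that arrive after a crawler has had time to re-read the updated robots.txt are meaningful here, so it helps to restrict the log window to a period after the rule changed.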

Diagnostic use case

Build a defensible, documented robots.txt policy for AI crawlers — deciding per token whether visibility or opting out matters more for your site.

What WebmasterID can help detect

WebmasterID classifies AI crawlers server-side and shows their activity per page, so you can verify that your allow/deny choices match what compliant crawlers actually do.

Common mistakes

Blanket-blocking every AI token without separating training, fetch, and search.
Claiming robots.txt legally prevents AI use — it expresses a request.
Forgetting to document why each token was allowed or disallowed.

Privacy and accuracy notes

An AI crawler policy is a content-usage stance in a public file. It involves no visitor data and should not overclaim legal enforcement.


Sources and verification notes

Google — Overview of Google crawlers and AI controls
RFC 9309 — Robots Exclusion Protocol. Note: robots.txt is a request to compliant crawlers, not enforcement.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.