AI crawler consent and opt-out signals
Beyond a blanket robots.txt block, several signals express AI-use preferences more selectively: per-token robots.txt rules, the W3C TDM Reservation Protocol, and proposed meta directives such as noai and noimageai. They differ in scope and in how widely they are honoured. This entry maps the consent-signal landscape factually, without overstating which crawlers obey which.
The signals available
The most widely supported signal is robots.txt with per-token rules. Disallowing GPTBot, Google-Extended, and similar tokens expresses an AI-specific preference without blocking ordinary search crawling. Beyond that, the TDM Reservation Protocol (TDMRep), published by a W3C Community Group, lets a site declare in machine-readable form that it reserves its rights over text and data mining. Some publishers also use proposed meta directives such as noai and noimageai to signal that a page's content or images should not be used for AI.
These operate at different levels: robots.txt and TDMRep are site/resource-level declarations, while meta directives are page-level. They can be combined.
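A minimal Python sketch of how one site might publish all three signals. The token list, the function names, and the `/.well-known/tdmrep.json` rule are illustrative assumptions. Check the exact `location` pattern syntax and optional fields against the TDMRep report before deploying. Writing `noai, noimageai` into a robots meta tag follows what some publishers do, but it is not a standard.

```python
from __future__ import annotations

import json

# Hypothetical opt-out list. GPTBot and CCBot are crawler user agents;
# Google-Extended and Applebot-Extended are robots.txt control tokens.
AI_TOKENS = ["GPTBot", "CCBot", "Google-Extended", "Applebot-Extended"]


def robots_txt(tokens: list[str]) -> str:
    """Site-level robots.txt fragment: one 'Disallow: /' group per AI token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"


def tdmrep_json(policy_url: str | None = None) -> str:
    """Body for /.well-known/tdmrep.json reserving TDM rights site-wide.

    The "/*" location pattern is an assumption; verify it against the TDMRep report.
    """
    rule: dict[str, object] = {"location": "/*", "tdm-reservation": 1}
    if policy_url is not None:
        rule["tdm-policy"] = policy_url  # optional URL of a licensing policy
    return json.dumps([rule], indent=2)


def tdmrep_headers() -> dict[str, str]:
    """Per-response alternative: the same reservation sent as an HTTP header."""
    return {"tdm-reservation": "1"}


def noai_meta_tag() -> str:
    """Page-level, non-standard directives; only some crawlers act on them."""
    return '<meta name="robots" content="noai, noimageai">'


if __name__ == "__main__":
    print(robots_txt(AI_TOKENS))
    print(tdmrep_json("https://example.com/tdm-policy.json"))
    print(tdmrep_headers())
    print(noai_meta_tag())
```

The three outputs go in different places: the robots.txt fragment at the site root, the JSON at the well-known path or the header on each response, and the meta tag in each page's `<head>`. You can publish them together.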
Scope and what they do not guarantee
Support varies widely. Major vendors document honouring specific robots.txt tokens; the TDM Reservation Protocol and noai-style meta tags are less universally implemented, so publishing them expresses a preference that not every crawler will act on. None of these signals is an access-control mechanism — like robots.txt generally, they are requests to compliant systems.
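Every one of these signals is read and acted on by the crawler, not enforced by your server, so it helps to look at the decision from the crawler's side. The sketch below is a hypothetical illustration of how a compliant crawler might look for the TDMRep `tdm-reservation` response header and for `noai`/`noimageai` robots meta directives. It is not any vendor's actual logic, and it omits robots.txt handling for brevity. A crawler that skips this check meets no technical barrier.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Non-standard meta values; only some crawlers honour them.
AI_META_DIRECTIVES = {"noai", "noimageai"}


class RobotsMetaParser(HTMLParser):
    """Collects comma-separated directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for part in (attributes.get("content") or "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def ai_signals_found(headers: dict[str, str], html: str) -> set[str]:
    """Return the AI opt-out signals a cooperating crawler would see on one page."""
    found: set[str] = set()
    # TDMRep response header: "1" reserves text-and-data-mining rights.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    if normalized.get("tdm-reservation") == "1":
        found.add("tdm-reservation")
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found |= parser.directives & AI_META_DIRECTIVES
    return found
```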
Do not overstate adoption. The honest position is that these signals declare intent and are honoured by some crawlers; verifying compliance requires watching whether crawlers actually stay away after you publish them. Pair any consent signal with measurement, and never present it as legally binding enforcement.
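One way to start measuring is to replay logged requests against the robots.txt you published and flag the fetches it disallowed. This minimal sketch uses Python's standard `urllib.robotparser`. The robots.txt content and the log sample are made-up examples. The sketch also assumes each hit has already been reduced to the crawler's product token (such as `GPTBot`), because `can_fetch` only matches against the text before the first `/` of the user-agent string it receives.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Example robots.txt as published (illustrative content).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /articles/
"""

# Hypothetical log sample: (crawler product token, requested path).
HITS = [
    ("GPTBot", "/articles/one"),
    ("CCBot", "/about"),
    ("CCBot", "/articles/two"),
    ("Googlebot", "/articles/one"),
]


def disallowed_hits(robots_txt: str, hits: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return logged fetches that the published robots.txt asked crawlers not to make."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in hits if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    for agent, path in disallowed_hits(ROBOTS_TXT, HITS):
        print(f"{agent} fetched {path} despite a Disallow rule")
```

Treat a flagged hit as a lead to investigate, not proof. Crawlers may keep using a cached robots.txt for a while (RFC 9309 says generally no longer than 24 hours), and a user-agent string can be spoofed.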
- robots.txt per-token rules: widest support for AI opt-out
- W3C TDM Reservation Protocol: machine-readable rights reservation
- noai / noimageai meta directives: page-level, variable support
How it appears in analytics and logs
Which consent signals you publish shapes what compliant crawlers should do, but only observation confirms whether a given crawler honours them. A signal is a stated preference, not an enforced boundary.
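A simple first observation is to count hits per AI crawler before and after the date a signal went live. This minimal sketch assumes log lines are already parsed into `(timestamp, user_agent, path)` tuples. `CRAWLER_TOKENS` is a hypothetical, deliberately short list matched by case-insensitive substring, and a spoofed user agent can defeat that matching. Google-Extended and Applebot-Extended are left out on purpose. Google and Apple document them as control tokens with no crawler of their own, so they never appear in access logs and their effect cannot be observed this way.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

# Hypothetical short list of tokens that appear in crawler user-agent strings.
# Control tokens such as Google-Extended are omitted: they never appear in logs.
CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")


def crawler_token(user_agent: str) -> str | None:
    """Return the first listed AI crawler token found in a user-agent string."""
    lowered = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def hits_before_after(
    records: list[tuple[datetime, str, str]], published_at: datetime
) -> tuple[Counter[str], Counter[str]]:
    """Count AI crawler hits per token before and after a signal went live."""
    before: Counter[str] = Counter()
    after: Counter[str] = Counter()
    for timestamp, user_agent, _path in records:
        token = crawler_token(user_agent)
        if token is None:
            continue  # not one of the listed AI crawlers
        (after if timestamp >= published_at else before)[token] += 1
    return before, after
```

Suppose a crawler's `after` count stays level well past publication, allowing some lag for crawlers that cache robots.txt. Its operator may not be honouring the signal. A drop is consistent with compliance but does not prove it.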
Diagnostic use case
Choose the right opt-out signal for AI use (a robots.txt token, the TDM Reservation Protocol, or meta directives) and understand each one's reach and limits.
What WebmasterID can help detect
WebmasterID shows which AI crawlers still reach pages after you publish opt-out signals, so you can see whether a stated preference is being respected rather than assuming it.
Common mistakes
- Treating a noai meta tag or TDMRep declaration as universally honoured.
- Assuming any consent signal is enforceable rather than a request.
- Publishing a signal without measuring whether crawlers respect it.
Privacy and accuracy notes
Consent signals concern content use, not visitor identity. They are declarations the site publishes, at site or page level; WebmasterID records the crawls that follow as bot events only.
Frequently asked questions
- Is robots.txt enough to opt out of AI training?
- Per-token robots.txt rules are the most widely honoured opt-out for declared AI crawlers. Additional signals like TDMRep or noai meta tags express the same intent more broadly but have less uniform support. None is an enforced boundary.
Related pages
- How to opt out of AI training
Opting your content out of AI training is done through robots.txt: per-crawler tokens such as GPTBot and CCBot, plus dedicated control tokens like Google-Extended and Applebot-Extended. There is no single switch — you assemble the policy token by token, and it is a request to compliant systems.
- llms.txt and AI crawlers
llms.txt is a proposed convention: a Markdown file at your site root that points AI systems to your most important, LLM-friendly content. It is not robots.txt and not an access control — it is a curation hint. Adoption by AI crawlers is voluntary and uneven, so treat it as a complement to, not a replacement for, robots.txt and server-side controls.
- Do AI crawlers obey robots.txt?
Major AI vendors document that their declared crawlers and control tokens, such as GPTBot, ClaudeBot, and Google-Extended, follow robots.txt, but compliance is voluntary and varies across operators. robots.txt is a crawl request defined by a shared standard, not an access-control mechanism, so a non-compliant or undeclared scraper can ignore it. Enforcement requires server-side controls.
- Bot intelligence
Check whether AI crawlers still reach pages after you publish opt-out signals.
Sources and verification notes
- W3C: TDM Reservation Protocol. Defines a machine-readable rights-reservation signal for text and data mining.
- Google: Google-Extended control. Documents a per-token robots.txt control for AI use.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.