Is robots.txt enough to opt out of AI training?

Per-token robots.txt rules are the most widely honoured opt-out for declared AI crawlers. Additional signals like TDMRep or noai meta tags express the same intent more broadly but have less uniform support. None is an enforced boundary.

AI crawler consent and opt-out signals

Several signals beyond a plain robots.txt block exist to express AI-use preferences: per-token robots.txt rules, the W3C TDM Reservation Protocol, and proposed meta directives such as noai/noimageai. They differ in scope and in how widely they are honoured. This entry maps the consent-signal landscape factually, without overstating which crawlers obey which.

Partially verified

The signals available

The most widely supported signal is robots.txt with per-token rules — disallowing GPTBot, Google-Extended, and similar tokens to express AI-specific preferences. Beyond that, the W3C TDM Reservation Protocol lets a site declare a reservation of rights for text-and-data-mining, intended to be machine-readable. Some publishers also use proposed meta tags such as noai and noimageai to signal that content should not be used for AI.
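As a minimal sketch of what per-token rules look like, the snippet below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and reports which tokens would be allowed to fetch a page. The file contents, the `allowed_by_token` helper, and the `example.com` URL are illustrative assumptions. The result only shows what a compliant crawler should conclude from the file, not what any real crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks two AI-related tokens, allows everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_by_token(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return, per token, whether the rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


result = allowed_by_token(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "Googlebot"],
    "https://example.com/article",  # placeholder URL
)
for token, allowed in result.items():
    print(f"{token}: {'allowed' if allowed else 'disallowed'}")
```

Running a check like this before deploying helps catch a typo in a token name, which would otherwise fall through silently to the `*` group.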

These operate at different levels: robots.txt and TDMRep are site/resource-level declarations, while meta directives are page-level. They can be combined.
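To make the levels concrete, here is a hedged sketch that generates a site-level TDMRep file and page-level meta tags. The property names (`location`, `tdm-reservation`, `tdm-policy`), the `/.well-known/tdmrep.json` path, and the `tdm-reservation` meta name follow our reading of the W3C TDM Reservation Protocol report. The `noai, noimageai` robots meta value is a proposed convention, not a standard. Both helper functions and the `/articles/*` pattern are hypothetical, so check the current specification before publishing.

```python
import json
from typing import Optional


def build_tdmrep_json(location: str, reserved: bool,
                      policy_url: Optional[str] = None) -> str:
    """Build the body of a site-level file, assumed to be served at
    /.well-known/tdmrep.json (path per our reading of the TDMRep report)."""
    rule = {"location": location, "tdm-reservation": 1 if reserved else 0}
    if policy_url:
        rule["tdm-policy"] = policy_url
    return json.dumps([rule], indent=2)


def build_meta_tags(reserved: bool, noai: bool) -> list[str]:
    """Build page-level tags to place in the HTML <head>."""
    tags = []
    if reserved:
        # TDMRep expressed inside the page itself.
        tags.append('<meta name="tdm-reservation" content="1">')
    if noai:
        # Proposed directive; support varies between crawlers.
        tags.append('<meta name="robots" content="noai, noimageai">')
    return tags


print(build_tdmrep_json("/articles/*", reserved=True))
print("\n".join(build_meta_tags(reserved=True, noai=True)))
```

Publishing the same intent at several levels does no harm, since each signal remains a request to compliant systems.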

Scope and what they do not guarantee

Support varies widely. Major vendors document honouring specific robots.txt tokens; the TDM Reservation Protocol and noai-style meta tags are less universally implemented, so publishing them expresses a preference that not every crawler will act on. None of these signals is an access-control mechanism — like robots.txt generally, they are requests to compliant systems.

Do not overstate adoption. The honest position is that these signals declare intent and are honoured by some crawlers; verifying compliance requires watching whether crawlers actually stay away after you publish them. Pair any consent signal with measurement, and never present it as legally binding enforcement.

robots.txt per-token rules: widest support for AI opt-out
W3C TDM Reservation Protocol: machine-readable rights reservation
noai / noimageai meta directives: page-level, variable support

How it appears in analytics and logs

Which consent signals you publish shapes what compliant crawlers should do, but only observation confirms whether a given crawler honours them. A signal is a stated preference, not an enforced boundary.
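One way to observe this is to count requests from declared AI user agents after the date a signal was published. The sketch below assumes the common "combined" access-log format. The token list, the sample log lines, and the `hits_after` helper are illustrative assumptions. Note that Google documents Google-Extended as a robots.txt control token rather than a separate user-agent string, so it would not show up in a count like this.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Illustrative user-agent tokens to look for; extend to suit your own logs.
AI_UA_TOKENS = ("GPTBot", "CCBot")

# Matches the timestamp, request path and user agent of a combined-format line.
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def hits_after(lines, published_at):
    """Count page requests per AI token made at or after published_at."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if ts < published_at:
            continue
        if match["path"] == "/robots.txt":
            continue  # fetching robots.txt itself is expected behaviour
        for token in AI_UA_TOKENS:
            if token.lower() in match["ua"].lower():
                counts[token] += 1
    return counts


# Hypothetical log lines for demonstration only.
SAMPLE = [
    '203.0.113.5 - - [31/May/2026:09:00:00 +0000] "GET /article HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [02/Jun/2026:09:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [02/Jun/2026:09:00:05 +0000] "GET /article HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [02/Jun/2026:09:01:00 +0000] "GET /article HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]

counts = hits_after(SAMPLE, datetime(2026, 6, 1, tzinfo=timezone.utc))
print(dict(counts))
```

User-agent strings can be spoofed, so a count like this is evidence about declared crawlers only. Allow some time after publishing before drawing conclusions, since crawlers may cache robots.txt.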

Diagnostic use case

Choose the right opt-out signal for AI use — robots.txt token, TDM Reservation Protocol, or meta directives — and understand each one's reach and limits.

What WebmasterID can help detect

WebmasterID shows which AI crawlers still reach pages after you publish opt-out signals, so you can see whether a stated preference is being respected rather than assuming it.

Common mistakes

Treating a noai meta tag or TDMRep declaration as universally honoured.
Assuming any consent signal is enforceable rather than a request.
Publishing a signal without measuring whether crawlers respect it.

Privacy and accuracy notes

Consent signals concern content use, not visitor identity. They are declarations made at site, resource, or page level; WebmasterID records the crawls that follow as bot events only.

Sources and verification notes

W3C, TDM Reservation Protocol: Defines a machine-readable rights-reservation signal for text and data mining.
Google, Google-Extended control: Documents a per-token robots.txt control for AI use.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.