How to opt out of AI training

Opting your content out of AI training is done through robots.txt: per-crawler tokens such as GPTBot and CCBot, plus dedicated control tokens like Google-Extended and Applebot-Extended. There is no single switch. You assemble the policy token by token, and the result is a request that compliant systems honour, not an enforcement mechanism.

Verified against primary sources

There is no single switch

Opting out of AI training is assembled from multiple robots.txt rules, because different vendors expose different controls. Some training crawlers have their own tokens you disallow directly, such as GPTBot and CCBot. Others provide a dedicated training-control token layered on an existing crawler — Google-Extended for Google's generative AI, and Applebot-Extended for Apple's.

The key consequence: there is no universal opt-out token. You build the policy vendor by vendor. Importantly, control tokens like Google-Extended do not affect search indexing — Googlebot keeps crawling for Search regardless.
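As a rough illustration of the vendor-by-vendor policy described above, the sketch below holds a hand-written robots.txt in a string and checks it with Python's standard `urllib.robotparser`. The token names come from this article; the blanket `Disallow: /` scope, the `example.com` URL, and the helper name `is_allowed` are assumptions made for the example. Python's parser is also only an approximation of how each vendor interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy. Token names are taken from the article; blocking the
# whole site ("Disallow: /") is an assumption chosen for this example.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, token: str,
               url: str = "https://example.com/article") -> bool:
    """Return True if `token` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for token in ("GPTBot", "CCBot", "Google-Extended",
                  "Applebot-Extended", "Googlebot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, token) else "disallowed"
        print(f"{token}: {status}")
```

In this sketch the four training-related tokens are disallowed, while Googlebot falls through to the `*` group and stays allowed, which mirrors the point that search crawling is controlled separately.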

A practical checklist

Decide your intent first, then express it per token. A common goal is opting out of training while keeping search visibility. Disallow the direct training-crawler tokens you want to exclude, and set the dedicated training-control tokens where vendors provide them.

Remember that robots.txt is a request honoured by compliant systems, not an access-control boundary, and it does not retroactively remove content already incorporated. Pair the rules with measurement so you can see whether the crawlers you opted out actually stop appearing.

Direct training crawlers: e.g. GPTBot, CCBot
Dedicated control tokens: Google-Extended, Applebot-Extended
Search indexing is controlled separately and is unaffected
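The checklist above can be sketched as a small generator that turns a stated intent into robots.txt text. This is a minimal sketch, not a recommended tool: the class name `TrainingOptOut`, its fields, and the default site-wide `/` path are all hypothetical, and the default token lists only repeat the examples named in this article.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingOptOut:
    """Hypothetical container for an AI-training opt-out intent."""

    # Direct training crawlers to disallow (examples from the article).
    direct_crawlers: list = field(default_factory=lambda: ["GPTBot", "CCBot"])
    # Dedicated training-control tokens (examples from the article).
    control_tokens: list = field(
        default_factory=lambda: ["Google-Extended", "Applebot-Extended"]
    )
    # Path to exclude; "/" (the whole site) is an assumption for this sketch.
    path: str = "/"

    def to_robots_txt(self) -> str:
        """Emit one User-agent group per token, each disallowing `path`."""
        groups = [
            f"User-agent: {token}\nDisallow: {self.path}"
            for token in self.direct_crawlers + self.control_tokens
        ]
        # No group is emitted for search crawlers such as Googlebot, so the
        # generated rules leave search crawling untouched.
        return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(TrainingOptOut().to_robots_txt())
```

Keeping the two token lists separate mirrors the distinction in the checklist and makes it easier to add a vendor's token later without touching search-related rules.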

How it appears in analytics and logs

An AI-training opt-out lives in your robots.txt as a set of token rules. It is a policy signal, not a crawl event; whether it is honoured depends on each vendor's compliance, which you can confirm by observing whether those crawlers still appear.
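One way to observe whether opted-out crawlers still appear is to count their hits in your access logs. The sketch below assumes the common "combined" log format, where the user agent is the last quoted field; the function name, the regular expression, and the sample log lines are made up for illustration. It looks only for direct crawler tokens, on the assumption that control tokens such as Google-Extended are robots.txt signals rather than crawlers and so do not show up as their own user agent.

```python
import re
from collections import Counter

# Direct crawler tokens named in the article. Control tokens are left out on
# the assumption that they never appear as a user agent in logs.
OPTED_OUT_TOKENS = ("GPTBot", "CCBot")

# Assumes the "combined" access-log format: user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_opted_out_hits(log_lines, tokens=OPTED_OUT_TOKENS):
    """Count requests per opted-out token, ignoring fetches of /robots.txt."""
    hits = Counter()
    for line in log_lines:
        # A compliant crawler has to fetch robots.txt to read the policy,
        # so those requests are not counted against it.
        if "/robots.txt" in line:
            continue
        match = USER_AGENT_RE.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines; real user-agent strings will differ.
    sample = [
        '203.0.113.5 - - [01/Jul/2026:10:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "ExampleAgent (compatible; GPTBot)"',
        '203.0.113.5 - - [01/Jul/2026:10:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleAgent (compatible; GPTBot)"',
        '198.51.100.7 - - [01/Jul/2026:10:00:02 +0000] "GET /post HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
    ]
    print(count_opted_out_hits(sample))
```

User-agent strings can be spoofed, so a count like this is an indication rather than proof. Comparing counts before and after the policy change gives a more useful picture than a single snapshot.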

Diagnostic use case

Assemble a robots.txt policy that opts your content out of AI training across the major vendors that offer such controls.

What WebmasterID can help detect

WebmasterID shows which AI training crawlers still reach your site after you set opt-out tokens, so you can confirm whether your policy is being respected rather than assuming it.

Common mistakes

Looking for a single universal AI-training opt-out token — there is none.
Assuming a training opt-out also removes you from search indexes.
Expecting opt-out tokens to retroactively undo past training use.

Privacy and accuracy notes

These are robots.txt directives, not visitor requests, so they involve no visitor data. They govern how vendors may use already-public content for training, which is a policy matter, not an identity one.

Sources and verification notes

Google: Google-Extended documentation. Documents Google-Extended as a training control separate from Googlebot.
Apple: Applebot-Extended documentation. Documents the Applebot-Extended training opt-out token.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.