Robots & crawl control

ai.txt, TDM reservation, and llms.txt

Beyond robots.txt, several conventions aim to express AI and machine-use preferences: ai.txt proposals, text-and-data-mining (TDM) reservation signals tied to EU copyright law, and llms.txt. Adoption and legal weight vary and are still settling, so this page describes the intent without overclaiming enforcement.

Partially verified

The conventions in brief

Several signals have been proposed to express AI-related preferences. ai.txt has been floated as a robots-style file specifically for AI usage. Text-and-data-mining (TDM) reservation is a rights-reservation concept connected to EU copyright law, under which rightsholders can reserve their works against certain data-mining uses in a machine-readable way. llms.txt is a community proposal for a content map aimed at language models.

These differ in origin and intent: some are informal proposals, while TDM reservation is tied to a legal framework. None is a single, universally adopted standard.

ai.txt — proposed robots-style file for AI usage
TDM reservation — rights-reservation signal tied to EU copyright law
llms.txt — community content-map proposal for LLMs
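To see which of these signals a site actually publishes, a small helper can list the places worth checking. The following is a minimal sketch under stated assumptions, not an authoritative list: the names CANDIDATE_PATHS, candidate_urls and tdm_reserved are hypothetical, the /ai.txt location is assumed from the robots-style proposals, /llms.txt follows the llmstxt.org proposal, and the /.well-known/tdmrep.json path and tdm-reservation header are taken from the W3C community group's TDM Reservation Protocol, which is only one possible way to express a TDM reservation.

```python
from typing import Dict, Optional
from urllib.parse import urljoin

# Assumed locations for illustration; none of these is a universal standard.
CANDIDATE_PATHS = {
    "robots.txt": "/robots.txt",
    "ai.txt": "/ai.txt",                       # assumed root location
    "llms.txt": "/llms.txt",                   # per the llmstxt.org proposal
    "tdmrep.json": "/.well-known/tdmrep.json", # TDM Reservation Protocol (assumption)
}


def candidate_urls(site_root: str) -> Dict[str, str]:
    """Build the absolute URLs a site owner could check for each convention."""
    return {name: urljoin(site_root, path) for name, path in CANDIDATE_PATHS.items()}


def tdm_reserved(headers: Dict[str, str]) -> Optional[bool]:
    """Interpret a TDMRep-style `tdm-reservation` response header.

    Returns True for "1", False for any other value, and None when the
    header is absent (no signal either way).
    """
    for key, value in headers.items():
        if key.lower() == "tdm-reservation":
            return value.strip() == "1"
    return None
```

Finding one of these files or headers only shows that a preference was published. It says nothing about whether any crawler reads or respects it.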

Uncertain status — do not overclaim

Adoption and enforceability vary widely and are still evolving. robots.txt remains the most broadly honoured technical signal for crawl control. The newer conventions may carry weight as a stated preference or, in the TDM case, as a legal rights reservation in some jurisdictions, but there is no guarantee that any given crawler or party respects them.

Treat these as supplementary expressions of intent. Where you need a real technical limit, combine robots.txt with authentication; where rights matter, consult qualified legal advice rather than relying on a file alone.

Adoption and legal weight vary by convention and jurisdiction
robots.txt is still the most broadly honoured crawl signal
For rights questions, seek legal advice — a file is not a guarantee
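Because robots.txt is the signal most crawlers actually read, it helps to confirm what your rules say for a given user agent. The sketch below uses Python's standard urllib.robotparser; the ROBOTS_TXT content and the is_allowed helper are illustrative assumptions, and "GPTBot" appears only as an example token. It reports what the file states, not whether a crawler obeys it.

```python
from urllib.robotparser import RobotFileParser

# Example rules for illustration only; substitute your own robots.txt content.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/page") evaluates the GPTBot group, while an unlisted agent falls back to the wildcard group.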

How it appears in analytics and logs

These conventions signal preferences for AI and data-mining use. Publishing them does not change crawl permissions and does not guarantee that any tool or party honours them. In server logs, requests for the files themselves appear as ordinary fetches.
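One way to compare published preferences with real behaviour is to count requests from self-identified AI crawlers in your own access logs. This is a rough sketch: the AI_AGENT_TOKENS tuple is an illustrative, incomplete assumption (check each operator's documentation for current tokens), count_ai_hits is a hypothetical helper, and user-agent strings can be spoofed, so the counts are indicative only.

```python
from collections import Counter
from typing import Iterable

# Illustrative user-agent substrings; not an exhaustive or authoritative list.
AI_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")


def count_ai_hits(log_lines: Iterable[str]) -> Counter:
    """Count log lines whose text contains a known AI crawler token.

    Matching is case-insensitive, and each line is counted at most once,
    against the first token that matches.
    """
    counts: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_AGENT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break
    return counts
```

Substring matching keeps the sketch independent of any particular log format. A production check would parse the user-agent field and, where operators publish them, verify source IP ranges.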

Diagnostic use case

Understand the emerging AI-control conventions that sit alongside robots.txt, and decide whether to publish them while treating their effect as uncertain.

What WebmasterID can help detect

WebmasterID shows which AI crawlers and assistants reach your pages, so you can observe real activity regardless of whether any party honours these emerging conventions.

Common mistakes

Treating ai.txt or llms.txt as enforced standards — adoption is voluntary.
Assuming every crawler honours a TDM reservation signal.
Relying on any of these for a technical limit instead of authentication.
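To illustrate the last point, a real technical limit means the server refuses requests that lack valid credentials, whatever any preference file says. The sketch below assumes a simple bearer-token scheme and a plain dict of request headers keyed exactly as "Authorization"; is_authorized is a hypothetical helper, not part of any framework.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the expected bearer token."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

A handler would return a 401 response whenever is_authorized is False, so unauthenticated crawlers never receive the protected content.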

Privacy and accuracy notes

Like robots.txt, these files are public. They express usage preferences, not access control, and involve no visitor data.


Sources and verification notes

llmstxt.org: the llms.txt proposal. Community proposal; adoption is voluntary.
EU: Directive 2019/790 (DSM), TDM exceptions and reservation. Legal basis for text-and-data-mining rights reservation; ai.txt remains an informal proposal.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.