
robots.txt for international and multilingual sites

International sites split content by country or language using ccTLDs, subdomains, or subfolders. This page explains how robots.txt scope applies to each model and why blocking localized URLs can break hreflang and regional indexing.

Verified against primary sources

Scope depends on your architecture

robots.txt is per origin (scheme + host + port). The three common international models behave differently:

ccTLDs (example.de, example.fr): each is a separate host and needs its own file, such as example.de/robots.txt.

Subdomains (de.example.com): each subdomain is a separate host and needs its own file at de.example.com/robots.txt.

Subfolders (example.com/de/): all locales share the single file at example.com/robots.txt, and you control each locale with path rules.
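Because scope follows the origin, you can derive the governing robots.txt location from any page URL. The following is a minimal Python sketch using only the standard library. The helper name robots_txt_url and the example hosts are illustrative assumptions, not part of any tool:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL whose rules apply to page_url.

    Hypothetical helper. Scope is the origin (scheme + host + port),
    so path, query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    # Drop any "user:pass@" prefix but keep an explicit port such as ":8080".
    # Note: an explicit default port (e.g. ":443") is not normalized here.
    host_port = parts.netloc.rsplit("@", 1)[-1].lower()
    return urlunsplit((parts.scheme.lower(), host_port, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in [
        "https://example.de/produkte/",       # ccTLD: its own file
        "https://de.example.com/produkte/",   # subdomain: its own file
        "https://example.com/de/produkte/",   # subfolder: shared root file
        "https://example.com/fr/produits/",   # subfolder: shared root file
    ]:
        print(f"{url} -> {robots_txt_url(url)}")
```

Grouping your localized URLs by this value shows how many separate robots.txt files your architecture actually depends on.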

Identify your model before writing rules: the root file at example.com/robots.txt covers every subfolder locale on that host, but it has no effect on ccTLDs or subdomains.
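To see why the root file does not reach other hosts, the sketch below keeps one parsed rule set per origin and looks up the right one before checking a URL. It uses Python's urllib.robotparser, whose can_fetch() compares only the URL path and ignores the host, so the origin lookup has to happen in your own code. The helper names and robots.txt contents are hypothetical. Note also that urllib.robotparser applies the first matching rule in file order and does not implement the * and $ path wildcards, so it approximates RFC 9309 and search-engine behavior rather than reproducing it:

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def origin(url: str) -> str:
    """Return scheme://host[:port] for url (any user:pass@ prefix dropped)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.rsplit('@', 1)[-1].lower()}"


def build_parsers(files: dict[str, str]) -> dict[str, RobotFileParser]:
    """Parse the robots.txt text of each origin into its own RobotFileParser."""
    parsers = {}
    for org, text in files.items():
        rp = RobotFileParser()
        rp.parse(text.splitlines())
        parsers[org] = rp
    return parsers


def is_allowed(parsers: dict[str, RobotFileParser], user_agent: str, url: str) -> bool:
    """Check url against the robots.txt of its own origin only."""
    rp = parsers.get(origin(url))
    if rp is None:
        # Assumption: no file known for this origin means none exists (e.g. 404),
        # which RFC 9309 treats as "no restrictions".
        return True
    # can_fetch() looks only at the path, so choosing the parser by origin
    # above is what keeps example.com rules off de.example.com.
    return rp.can_fetch(user_agent, url)


if __name__ == "__main__":
    parsers = build_parsers({
        # Root file: governs every subfolder on example.com (/de/, /fr/, ...)
        "https://example.com": "User-agent: *\nDisallow: /intern/\n",
        # Subdomain: a separate host with its own, permissive file
        "https://de.example.com": "User-agent: *\nDisallow:\n",
    })
    for url in [
        "https://example.com/intern/preise",     # blocked by the root file
        "https://example.com/de/produkte/",      # subfolder locale, allowed
        "https://de.example.com/intern/preise",  # root file does not apply
    ]:
        print(url, is_allowed(parsers, "Googlebot", url))
```

The same path is blocked on example.com but allowed on de.example.com, because each host answers only to its own file.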

Do not block hreflang URLs

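hreflang annotations tell search engines which pages are equivalent versions for other languages or regions, and each version is expected to link back to the others. If you Disallow a localized URL that is referenced in hreflang, crawlers cannot fetch it to read its return links, so the relationship cannot be confirmed and international targeting is weakened.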
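One way to audit this is to read the hreflang link elements on a page and test each alternate URL against the robots.txt of its own origin. The sketch below uses Python's html.parser and urllib.robotparser. The function names, sample page, and rules are hypothetical; it only covers hreflang in HTML link elements (not HTTP headers or XML sitemaps), and urllib.robotparser only approximates how search engines match rules:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class HreflangCollector(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate" hreflang=...> tags."""

    def __init__(self):
        super().__init__()
        self.alternates: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if "alternate" in rel and a.get("hreflang") and a.get("href"):
            self.alternates.append((a["hreflang"], a["href"]))


def blocked_alternates(html: str, robots_by_origin: dict[str, str],
                       user_agent: str = "Googlebot") -> list[tuple[str, str]]:
    """Return hreflang alternates that their own origin's robots.txt disallows."""
    collector = HreflangCollector()
    collector.feed(html)
    collector.close()
    blocked = []
    for lang, href in collector.alternates:
        parts = urlsplit(href)
        text = robots_by_origin.get(f"{parts.scheme}://{parts.netloc}")
        if text is None:
            continue  # assumption: no robots.txt known for this origin means allow-all
        rp = RobotFileParser()
        rp.parse(text.splitlines())
        if not rp.can_fetch(user_agent, href):
            blocked.append((lang, href))
    return blocked


if __name__ == "__main__":
    page = """
    <head>
      <link rel="alternate" hreflang="en" href="https://example.com/en/shoes/">
      <link rel="alternate" hreflang="de" href="https://example.com/de/schuhe/">
      <link rel="alternate" hreflang="fr" href="https://example.fr/chaussures/">
    </head>
    """
    robots = {
        "https://example.com": "User-agent: *\nDisallow: /de/\n",
        "https://example.fr": "User-agent: *\nDisallow:\n",
    }
    print(blocked_alternates(page, robots))
```

With these sample rules only the German alternate is reported, because the shared root file on example.com disallows /de/. Parsing robots.txt once per alternate keeps the sketch short; a real audit would cache one parser per origin.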

Keep every localized URL that participates in hreflang crawlable. If a locale should not be indexed, prefer noindex on a crawlable URL over a robots.txt Disallow: a crawler that is blocked from a URL never sees its noindex, hreflang, or canonical signals.
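This conflict can be checked mechanically. A URL that carries noindex (in a meta robots tag or an X-Robots-Tag response header) but is also disallowed will never have that noindex read by a compliant crawler. A minimal sketch, assuming you already have the robots.txt text for one origin and a list of that origin's noindex URLs; the function name is hypothetical and urllib.robotparser only approximates search-engine rule matching:

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex(robots_txt: str, noindex_urls: list[str],
                   user_agent: str = "Googlebot") -> list[str]:
    """Return noindex URLs that robots.txt blocks, so their noindex cannot be seen.

    Hypothetical helper. Assumes all URLs belong to the origin whose
    robots.txt text is passed in.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in noindex_urls if not rp.can_fetch(user_agent, u)]


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /it/\n"
    # Both locales are meant to stay out of the index via noindex.
    print(hidden_noindex(rules, ["https://example.com/it/", "https://example.com/pt/"]))
```

Any URL this returns needs its Disallow removed so the noindex can take effect.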

How it appears in analytics and logs

If one locale is crawled but another is not, the cause is often robots.txt scope: the ccTLD or subdomain serves its own file with different rules from the main host, or a path rule in the shared root file blocks a localized subfolder.
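To spot a locale that is not being crawled, you can group crawler hits by locale and compare them against the locales you expect. The sketch below assumes the hits are already filtered to one crawler and given as full URLs. The grouping rule (the host for ccTLDs and subdomains, the first path segment for subfolder hosts) and all names are hypothetical:

```python
from collections import Counter
from urllib.parse import urlsplit


def locale_key(url: str, subfolder_hosts: set[str]) -> str:
    """Hypothetical grouping rule: on subfolder hosts the locale is the first
    path segment (example.com/de/); elsewhere the host itself is the locale
    (example.de, de.example.com)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in subfolder_hosts:
        segment = parts.path.lstrip("/").split("/", 1)[0]
        return f"{host}/{segment}/"
    return host


def uncrawled_locales(hits: list[str], expected: set[str],
                      subfolder_hosts: set[str]) -> tuple[Counter, set[str]]:
    """Count hits per locale and return the expected locales with zero hits."""
    counts = Counter(locale_key(u, subfolder_hosts) for u in hits)
    return counts, {loc for loc in expected if counts[loc] == 0}


if __name__ == "__main__":
    hits = [
        "https://example.com/en/a",
        "https://example.com/en/b",
        "https://example.com/fr/x",
        "https://example.de/",
    ]
    expected = {"example.com/en/", "example.com/fr/", "example.com/de/",
                "example.de", "fr.example.com"}
    counts, missing = uncrawled_locales(hits, expected, {"example.com"})
    print(dict(counts))
    print(sorted(missing))
```

For each locale reported as missing, fetch the robots.txt of that locale's origin and check whether it, or a path rule in the shared root file, blocks the crawler.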

Diagnostic use case

Set correct crawl policy across an international architecture without accidentally blocking a language version or breaking hreflang relationships.

What WebmasterID can help detect

WebmasterID records crawler hits per hostname and path, so you can confirm each locale is being crawled as intended across ccTLDs, subdomains, or subfolders.

Common mistakes

Privacy and accuracy notes

International robots.txt rules concern URL structure, not visitor location. Do not rely on robots.txt for geo personalization: it is a request to crawlers, not a routing or access-control mechanism.


Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.