robots.txt for international and multilingual sites

International sites split content by country or language using ccTLDs, subdomains, or subfolders. This page explains how robots.txt scope applies to each model and why blocking localized URLs can break hreflang and regional indexing.

Verified against primary sources

Scope depends on your architecture

robots.txt is per origin (scheme + host + port). The three common international models behave differently:

ccTLDs (example.de, example.fr): each is a separate host and needs its own /robots.txt.
Subdomains (de.example.com): each subdomain needs its own file.
Subfolders (example.com/de/): all share one robots.txt at the root, and you control locales with path rules.

Identify which model you use before writing rules: a single root file covers every locale only in the subfolder model.

ccTLDs: one robots.txt per country domain
Subdomains: one robots.txt per subdomain
Subfolders: a single root robots.txt with path rules
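
The per-origin rule above can be sketched in a few lines of Python. This is a minimal illustration, not a crawler implementation: the function name robots_url is hypothetical, and the example.* hosts are placeholders. It shows which robots.txt file governs a given localized URL under each model.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Hypothetical helper: keeps the scheme and host (plus port, if any)
    and replaces the path, query, and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# ccTLD: the country domain has its own file.
print(robots_url("https://example.de/produkte/"))
# Subdomain: the subdomain has its own file.
print(robots_url("https://de.example.com/produkte/"))
# Subfolder: every locale falls back to the single root file.
print(robots_url("https://example.com/de/produkte/"))
```

Under these assumptions the three calls print https://example.de/robots.txt, https://de.example.com/robots.txt, and https://example.com/robots.txt, which is why only the subfolder model can be managed from one file.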

Do not block hreflang URLs

hreflang annotations tell search engines which pages are equivalents in other languages or regions. If you Disallow a localized URL that is referenced in hreflang, crawlers cannot fetch that page to read its return annotations, so they cannot confirm the relationship and international targeting weakens.

Keep all localized URLs that participate in hreflang crawlable. If a locale should not be indexed, prefer noindex on a crawlable URL over a robots.txt Disallow, so the hreflang and canonical signals can still be read.
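
One way to check this before deploying a robots.txt change is to test every hreflang URL against the rules. The sketch below uses Python's standard urllib.robotparser. The function name blocked_hreflang_urls and the sample URLs are hypothetical. Two caveats apply: the parser ignores the host part of each URL, so all URLs passed in are assumed to share the origin of the robots.txt being tested, and its matching is simple prefix matching in file order, which may differ from how a given search engine resolves wildcards or conflicting Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_hreflang_urls(robots_txt, hreflang_urls, user_agent="*"):
    """Return the hreflang URLs that robots_txt disallows for user_agent.

    Assumption: every URL belongs to the origin that serves robots_txt,
    because can_fetch() only compares the path portion of each URL.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in hreflang_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical subfolder-model rules that block the French locale.
rules = "User-agent: *\nDisallow: /fr/\n"
alternates = [
    "https://example.com/de/preise",
    "https://example.com/fr/tarifs",
]
print(blocked_hreflang_urls(rules, alternates))
```

With these sample rules the function reports https://example.com/fr/tarifs as blocked. For ccTLD or subdomain setups, run the check once per host with that host's own robots.txt.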

How it appears in analytics and logs

If one locale is crawled but another is not, the cause is often robots.txt scope: a ccTLD or subdomain has its own file, or a subfolder rule blocks a localized path.
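
A rough way to spot this pattern in your own logs is to compare the URLs crawlers actually requested against the locale prefixes you expect to be crawled. The sketch below assumes you have already extracted full crawler-hit URLs from your logs. The function name uncrawled_locales, the locale labels, and the prefixes are all hypothetical.

```python
def uncrawled_locales(crawled_urls, expected_prefixes):
    """Return labels of locales with no crawler hit under their URL prefix.

    expected_prefixes maps a locale label to the URL prefix that
    identifies it (a ccTLD, a subdomain, or a subfolder).
    """
    return [
        label
        for label, prefix in expected_prefixes.items()
        if not any(url.startswith(prefix) for url in crawled_urls)
    ]


# Hypothetical crawler hits taken from access logs.
hits = [
    "https://example.com/de/preise",
    "https://example.de/",
]
expected = {
    "de subfolder": "https://example.com/de/",
    "fr subfolder": "https://example.com/fr/",
    "de ccTLD": "https://example.de/",
}
print(uncrawled_locales(hits, expected))
```

Here only "fr subfolder" is reported. A locale that shows up in this list is a candidate for a robots.txt scope check: look at the file on that locale's own host, or at path rules in the shared root file.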

Diagnostic use case

Set correct crawl policy across an international architecture without accidentally blocking a language version or breaking hreflang relationships.

What WebmasterID can help detect

WebmasterID records crawler hits per hostname and path, so you can confirm each locale is being crawled as intended across ccTLDs, subdomains, or subfolders.

Common mistakes

Assuming the apex robots.txt governs every ccTLD or subdomain locale.
Disallowing a localized URL that hreflang points to.
Using robots.txt to geo-route users instead of server-side logic.

Privacy and accuracy notes

International robots.txt rules concern URL structure, not visitor location. Geo personalization should not rely on robots.txt: the file is a set of crawl directives that compliant crawlers choose to honor, not a routing or access-control mechanism.

Sources and verification notes

Google: How Google interprets robots.txt (per-host scope). robots.txt scope is per scheme, host, and port.
Google: Tell Google about localized versions (hreflang). Localized URLs in hreflang must be crawlable.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.