robots.txt for international and multilingual sites
International sites split content by country or language using ccTLDs, subdomains, or subfolders. This page explains how robots.txt scope applies to each model and why blocking localized URLs can break hreflang and regional indexing.
Scope depends on your architecture
robots.txt is per origin (scheme + host + port). The three common international models behave differently:
With ccTLDs (example.de, example.fr), each domain is a separate host and needs its own /robots.txt. With subdomains (de.example.com), each subdomain needs its own file. With subfolders (example.com/de/), all locales share one robots.txt at the root, and you control each locale with path rules.
Know which model you use before writing rules, because a single root file governs only the subfolder model.
- ccTLDs: one robots.txt per country domain
- Subdomains: one robots.txt per subdomain
- Subfolders: a single root robots.txt with path rules
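As a minimal sketch of the scoping rule, the following Python snippet derives the robots.txt location that governs a given page URL. The function name robots_txt_url and the example URLs are illustrative assumptions, not part of any tool described on this page.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin (scheme, host, port) of page_url."""
    parts = urlsplit(page_url)
    # netloc keeps the host plus any explicit port, so every origin maps to its own file.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical locale URLs, one per architecture model.
for url in [
    "https://example.de/produkte/",      # ccTLD
    "https://de.example.com/produkte/",  # subdomain
    "https://example.com/de/produkte/",  # subfolder
    "https://example.com/fr/produits/",  # subfolder, same root file as /de/
]:
    print(url, "->", robots_txt_url(url))
```

Only the two subfolder URLs resolve to the same file; the ccTLD and the subdomain each point at a robots.txt of their own.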
Do not block hreflang URLs
hreflang links tell search engines about equivalent pages in other languages or regions. If you Disallow a localized URL that is referenced in hreflang, crawlers cannot fetch that page to confirm the relationship, which weakens international targeting.
Keep all localized URLs that participate in hreflang crawlable. If a locale should not be indexed, prefer noindex on a crawlable URL over a robots.txt Disallow, so the hreflang and canonical signals can still be read.
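One way to catch this before deploying is to test every hreflang target against the robots.txt rules. The sketch below uses Python's standard urllib.robotparser; the rules, the URLs, and the helper name blocked_hreflang_urls are hypothetical. Note that this parser applies simple prefix matching and does not reproduce Google's wildcard handling, so treat the result as a first check rather than a definitive verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical root robots.txt for a subfolder-based site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /fr/
"""

# Hypothetical hreflang cluster for one page, keyed by language code.
HREFLANG_URLS = {
    "en": "https://example.com/en/pricing",
    "de": "https://example.com/de/preise",
    "fr": "https://example.com/fr/tarifs",
}


def blocked_hreflang_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the hreflang entries that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        lang: url
        for lang, url in urls.items()
        if not parser.can_fetch(user_agent, url)
    }


blocked = blocked_hreflang_urls(ROBOTS_TXT, HREFLANG_URLS)
print(blocked)
```

Here the French URL is reported as blocked. The check assumes all URLs share one origin; for ccTLDs or subdomains, each host's robots.txt would need to be tested separately.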
How it appears in analytics and logs
If one locale is crawled but another is not, the cause is often robots.txt scope: a ccTLD or subdomain has its own file, or a subfolder rule blocks a localized path.
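A rough way to spot this pattern in logs is to count crawler hits per hostname and first path segment, then list the locales with no hits. The sketch below assumes log records have already been parsed into (hostname, path, user agent) tuples; the records, the expected locale list, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical, already-parsed crawler hits: (hostname, path, user_agent).
HITS = [
    ("example.com", "/en/pricing", "Googlebot"),
    ("example.com", "/de/preise", "Googlebot"),
    ("example.com", "/en/about", "Googlebot"),
]

# Locales expected to be crawled, as (hostname, first path segment).
EXPECTED = [("example.com", "en"), ("example.com", "de"), ("example.com", "fr")]


def crawl_counts(hits):
    """Count hits per (hostname, first path segment)."""
    counts = Counter()
    for host, path, _agent in hits:
        first_segment = path.strip("/").split("/")[0]
        counts[(host, first_segment)] += 1
    return counts


def missing_locales(hits, expected):
    """Return expected locales that received no crawler hits."""
    seen = crawl_counts(hits)
    # Counter returns 0 for keys it has never seen.
    return [key for key in expected if seen[key] == 0]


print(missing_locales(HITS, EXPECTED))
```

A locale that appears in the result is a candidate for a robots.txt scope check, although a missing locale can also have other causes, such as a lack of internal links.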
Diagnostic use case
Set correct crawl policy across an international architecture without accidentally blocking a language version or breaking hreflang relationships.
What WebmasterID can help detect
WebmasterID records crawler hits per hostname and path, so you can confirm each locale is being crawled as intended across ccTLDs, subdomains, or subfolders.
Common mistakes
- Assuming the apex robots.txt governs every ccTLD or subdomain locale.
- Disallowing a localized URL that hreflang points to.
- Using robots.txt to geo-route users instead of server-side logic.
Privacy and accuracy notes
International robots.txt rules concern URL structure, not visitor location. Geo personalization should not rely on robots.txt: the file is a request to crawlers, not a routing or access-control mechanism.
Related pages
- How robots.txt works across subdomains
robots.txt applies per host, so each subdomain needs its own file. This page explains how the robots.txt scope is defined by scheme, host, and port, why a root-domain file does not govern subdomains, and how to manage policy across many hostnames.
- robots.txt path matching and case sensitivity
robots.txt path rules are compared against the URL path, and that comparison is case-sensitive: /Page and /page are different. This page covers how Google matches paths, why case and encoding matter, and how trailing characters and wildcards change the rule that applies.
- Canonical vs noindex: which to use
rel=canonical and noindex are often confused. Canonical tells search engines which of several similar URLs to treat as the primary, consolidating signals onto it. noindex removes a page from the index entirely. This page explains when each is right and why combining them on one URL sends conflicting signals.
- Web crawler encyclopedia
How crawlers treat international site structures.
Sources and verification notes
- Google: How Google interprets robots.txt (per-host scope). robots.txt scope is per scheme, host, and port.
- Google: Tell Google about localized versions (hreflang). Localized URLs in hreflang must be crawlable.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.