How robots.txt works across subdomains
robots.txt applies per host, so each subdomain needs its own file. This page explains how the robots.txt scope is defined by scheme, host, and port, why a root-domain file does not govern subdomains, and how to manage policy across many hostnames.
robots.txt is per host
A robots.txt file governs only the origin that serves it, where the origin is the combination of scheme, host, and port. https://example.com/robots.txt controls https://example.com, but it does not control https://blog.example.com; that subdomain needs its own /robots.txt. Likewise, http and https on the same host are separate origins for robots.txt purposes.
This is why a Disallow placed only at the root domain leaves subdomains fully crawlable. Each hostname you want to govern must serve its own file at /robots.txt.
- Scope = scheme + host + port
- blog.example.com needs its own /robots.txt
- A root-domain file does not cover subdomains
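As a rough illustration of the scope rule above, the sketch below derives the robots.txt URL that applies to a given page URL. The function name robots_url and the DEFAULT_PORTS table are made up for this example, and the port handling assumes that an explicit default port (80 for http, 443 for https) is equivalent to leaving the port out.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: default ports are equivalent to no port at all.
DEFAULT_PORTS = {"http": 80, "https": 443}


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL whose rules apply to page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port

    # Keep an explicit port only when it differs from the scheme's default.
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{port}"
    else:
        netloc = host

    # The path is discarded: robots.txt always lives at the root of the origin.
    return f"{scheme}://{netloc}/robots.txt"


if __name__ == "__main__":
    for url in (
        "https://example.com/pricing",
        "https://blog.example.com/post/1",
        "http://example.com/pricing",
    ):
        print(url, "->", robots_url(url))
```

Each of the three sample URLs maps to a different robots.txt location, which is why a rule in one of those files has no effect on the other two.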
Managing many subdomains
When several subdomains share a codebase, generate robots.txt per host so that each hostname gets the rules meant for it, for example a stricter policy on a staging subdomain than on production. If subdomains are served by a CDN or platform, confirm that the file is reachable at the root of every hostname, not just the apex.
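One way a shared codebase might produce a different file per hostname is sketched below. Everything specific in it is an assumption made for illustration: the build_robots_txt name, the convention that staging hosts start with "staging.", the /admin/ path, and the /sitemap.xml location. Real rules depend on the site.

```python
def build_robots_txt(hostname: str) -> str:
    """Return robots.txt content for one hostname (hypothetical policy)."""
    host = hostname.lower().rstrip(".")

    # Assumed convention: staging hosts are named "staging.<domain>".
    if host.startswith("staging."):
        # Stricter policy: ask all crawlers to stay out entirely.
        return "User-agent: *\nDisallow: /\n"

    # Assumed production policy: block one example path and
    # advertise the sitemap that lives on the same host.
    lines = [
        "User-agent: *",
        "Disallow: /admin/",
        f"Sitemap: https://{host}/sitemap.xml",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name in ("example.com", "blog.example.com", "staging.example.com"):
        print(f"--- {name} ---")
        print(build_robots_txt(name))
```

In a web application, a handler for /robots.txt could call a function like this with the hostname of the incoming request, so every host answers with its own policy.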
Sitemap directives in robots.txt should list sitemaps for that host. Cross-host sitemap references are allowed in some setups but are easy to get wrong, so keep each host's robots.txt and sitemap aligned and verify with a tester per hostname.
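A small check can flag Sitemap lines that point at a different host than the robots.txt that declares them. The sketch below uses Python's standard urllib.robotparser (site_maps() requires Python 3.8 or later). The function name cross_host_sitemaps is hypothetical, and the sample file content is invented for the example. A flagged entry is not necessarily an error, only something worth reviewing.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def cross_host_sitemaps(robots_host: str, robots_body: str) -> list[str]:
    """List Sitemap URLs in robots_body whose host differs from robots_host."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())

    # site_maps() returns None when the file has no Sitemap lines.
    sitemaps = parser.site_maps() or []

    expected = robots_host.lower()
    return [
        url
        for url in sitemaps
        if (urlsplit(url).hostname or "").lower() != expected
    ]


if __name__ == "__main__":
    # Invented sample content for blog.example.com/robots.txt.
    sample = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: https://blog.example.com/sitemap.xml\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(cross_host_sitemaps("blog.example.com", sample))
```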
How it appears in analytics and logs
If a subdomain is being crawled despite a Disallow on the root domain, the usual cause is that the subdomain serves its own robots.txt, or none at all. The root file does not apply to it.
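The behavior described here can be reproduced offline with Python's standard urllib.robotparser. The sketch below is a simplification: it looks files up by hostname only (ignoring scheme and port), treats a host with no file as unrestricted, and uses invented file contents. The is_allowed helper and the robots_by_host mapping are hypothetical names.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_allowed(
    url: str,
    robots_by_host: Dict[str, Optional[str]],
    user_agent: str = "*",
) -> bool:
    """Check url against the robots.txt of its own host only."""
    host = (urlsplit(url).hostname or "").lower()
    body = robots_by_host.get(host)

    # Assumption: a host without a robots.txt places no restrictions on crawling.
    if body is None:
        return True

    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Only the apex serves a file; it disallows everything.
    files = {"example.com": "User-agent: *\nDisallow: /\n"}
    print(is_allowed("https://example.com/page", files))       # apex rule applies
    print(is_allowed("https://blog.example.com/page", files))  # no file for this host
```

The apex URL is blocked while the blog URL is not, which matches what shows up in logs when only the root domain carries the Disallow.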
Diagnostic use case
Set correct crawl policy when a site spans several subdomains — for example blog., shop., and a staging subdomain — without assuming one file covers them all.
What WebmasterID can help detect
WebmasterID records crawler hits per hostname, so you can see whether each subdomain's robots.txt is actually shaping crawler behavior the way you intended.
Common mistakes
- Assuming the apex robots.txt automatically governs every subdomain.
- Forgetting that http and https are separate origins for robots.txt.
- Leaving a staging subdomain without its own restrictive robots.txt.
Privacy and accuracy notes
robots.txt scope concerns your own hostnames. It involves no visitor data and is not an access-control mechanism.
Related pages
- robots.txt for staging sites
Teams often try to keep a staging or pre-production site private with a robots.txt Disallow. That is the wrong tool: robots.txt is public and advisory, and a blocked staging URL linked anywhere can still surface in search. The right answer is authentication, with noindex as a secondary signal.
- robots.txt and sitemap coordination
robots.txt and your XML sitemap work together: the Sitemap directive advertises your sitemap to crawlers, and Search Console submission gives Google a direct feed. The key is consistency — do not list URLs in a sitemap that your robots.txt disallows, or you send crawlers contradictory instructions.
- How to test your robots.txt
A robots.txt rule is only useful if it does what you think. This page covers how to test it — checking the live file, using Google Search Console's robots.txt report and URL Inspection, and confirming in your own logs that the intended crawlers are or are not fetching the affected URLs.
- Website observability
See crawler activity per hostname across your subdomains.
Sources and verification notes
- Google: How Google interprets robots.txt (file location and scope). Documents that robots.txt applies per scheme, host, and port.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.