Robots & crawl control

How robots.txt works across subdomains

robots.txt applies per host, so each subdomain needs its own file. This page explains how the robots.txt scope is defined by scheme, host, and port, why a root-domain file does not govern subdomains, and how to manage policy across many hostnames.

Verified against primary sources

robots.txt is per host

A robots.txt file governs only the origin that serves it, defined by scheme, host, and port. https://example.com/robots.txt controls https://example.com, but it does not control https://blog.example.com; that subdomain needs its own /robots.txt. Likewise, http and https on the same host are separate origins for robots.txt purposes, as are different ports on the same host.

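This is why a Disallow placed only on the root domain does not restrict any subdomain. A subdomain that serves no robots.txt at all (for example, one that returns 404 for /robots.txt) is treated by Google as having no crawl restrictions. Each hostname you want to govern must serve its own file at /robots.txt.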

Scope = scheme + host + port
blog.example.com needs its own /robots.txt
A root-domain file does not cover subdomains
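The scope rule can be made concrete with a small helper that maps any page URL to the robots.txt URL that governs it. This is a minimal sketch using only the Python standard library; the function names robots_scope and robots_txt_url are hypothetical. It treats an explicit default port (80 for http, 443 for https) as equivalent to omitting the port, which matches Google's published description of robots.txt scope.

```python
from urllib.parse import urlsplit

# Default ports that are equivalent to omitting the port entirely.
DEFAULT_PORTS = {"http": 80, "https": 443}


def robots_scope(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that decides which robots.txt applies."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname  # lowercased by urlsplit, userinfo already stripped
    if not scheme or host is None:
        raise ValueError(f"not an absolute URL: {url!r}")
    port = parts.port or DEFAULT_PORTS.get(scheme)
    if port is None:
        raise ValueError(f"no default port known for scheme {scheme!r}")
    return scheme, host, port


def robots_txt_url(url: str) -> str:
    """Build the robots.txt URL whose rules govern `url`."""
    scheme, host, port = robots_scope(url)
    if ":" in host:  # IPv6 literals need brackets in a URL
        host = f"[{host}]"
    netloc = host if port == DEFAULT_PORTS.get(scheme) else f"{host}:{port}"
    return f"{scheme}://{netloc}/robots.txt"


print(robots_txt_url("https://blog.example.com/post/1"))  # https://blog.example.com/robots.txt
print(robots_scope("https://example.com/") == robots_scope("https://example.com:443/"))  # True
print(robots_scope("http://example.com/") == robots_scope("https://example.com/"))  # False
```

A subdomain, a different scheme, or a non-default port each produce a different scope, and therefore a different robots.txt to check.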

Managing many subdomains

When several subdomains share a codebase, generate each one's robots.txt so the rules are correct per host — for example a stricter policy on a staging subdomain than on production. If subdomains are served by a CDN or platform, confirm the file is reachable at each hostname's root, not just the apex.
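One way to generate each host's file from a shared codebase is to keep a per-host policy table and render robots.txt from the request's hostname. The sketch below is illustrative only: the hostnames, paths, HostPolicy type, and render_robots function are hypothetical, the host value is assumed to arrive without a port, and the block-everything fallback for unlisted hosts is an assumed design choice rather than a requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostPolicy:
    disallow: tuple[str, ...]            # path prefixes blocked for all crawlers
    sitemap_path: Optional[str] = None   # this host's own sitemap, if any


# Hypothetical hostnames and paths, for illustration only.
POLICIES = {
    "example.com": HostPolicy(disallow=("/admin/",), sitemap_path="/sitemap.xml"),
    "blog.example.com": HostPolicy(disallow=("/drafts/",), sitemap_path="/sitemap.xml"),
    "staging.example.com": HostPolicy(disallow=("/",)),  # keep staging out entirely
}

# Assumed fallback: a host missing from the table gets a block-everything file.
FALLBACK = HostPolicy(disallow=("/",))


def render_robots(host: str, scheme: str = "https") -> str:
    """Render the robots.txt body to serve at scheme://host/robots.txt."""
    host = host.lower().rstrip(".")  # normalize case and a trailing FQDN dot
    policy = POLICIES.get(host, FALLBACK)
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in policy.disallow]
    if policy.sitemap_path:
        # Sitemap URLs must be absolute; point at this host's own sitemap.
        lines.append(f"Sitemap: {scheme}://{host}{policy.sitemap_path}")
    return "\n".join(lines) + "\n"
```

Failing closed for unknown hosts means a newly added subdomain stays uncrawled until someone gives it an explicit policy; the opposite choice would expose it by default.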

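Sitemap directives in a host's robots.txt should normally list that host's own sitemaps. A Sitemap line takes a fully qualified URL, and cross-host references are permitted under the sitemaps protocol's cross-submission rules, but they are easy to get wrong. Keep each host's robots.txt and sitemap aligned, and verify each hostname separately with a robots.txt tester.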
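A quick alignment check is to flag any Sitemap line whose URL points at a different host than the one serving the file. This is a minimal sketch; the function name cross_host_sitemaps is hypothetical, and a flagged entry is not necessarily an error (it may be an intentional cross-submission), only something to review.

```python
from urllib.parse import urlsplit


def cross_host_sitemaps(robots_body: str, host: str) -> list[str]:
    """Return Sitemap URLs in robots_body whose host differs from `host`."""
    flagged = []
    for raw in robots_body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")  # split only at the first colon
        if not sep or field.strip().lower() != "sitemap":
            continue  # not a Sitemap directive
        url = value.strip()
        if urlsplit(url).hostname != host.lower():
            flagged.append(url)
    return flagged


body = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: https://blog.example.com/sitemap.xml\n"
    "Sitemap: https://cdn.example.net/blog-sitemap.xml\n"
)
print(cross_host_sitemaps(body, "blog.example.com"))  # ['https://cdn.example.net/blog-sitemap.xml']
```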

How it appears in analytics and logs

If you see a subdomain being crawled despite a Disallow on the root domain, that is expected behavior: the subdomain is governed only by its own robots.txt, and if it serves a permissive file or none at all, crawlers treat it as open. The root file does not apply to it.
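To test this against your own access logs, evaluate each crawler hit against the robots.txt of the host it actually hit, not the root domain's. The sketch below uses Python's standard urllib.robotparser; the hit format (host, path, user agent) and the helper names are assumptions. Results are approximate, because urllib.robotparser applies rules in file order rather than Google's longest-match precedence.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def build_parsers(robots_by_host: dict[str, str]) -> dict[str, RobotFileParser]:
    """Parse each host's robots.txt body into its own parser, keyed by hostname."""
    parsers = {}
    for host, body in robots_by_host.items():
        rp = RobotFileParser()
        rp.parse(body.splitlines())
        parsers[host.lower()] = rp
    return parsers


def disallowed_hits(
    hits: Iterable[tuple[str, str, str]], parsers: dict[str, RobotFileParser]
) -> list[tuple[str, str, str]]:
    """Return (host, path, user_agent) hits that the hit host's own robots.txt disallows."""
    flagged = []
    for host, path, agent in hits:
        rp = parsers.get(host.lower())
        if rp is None:
            continue  # no robots.txt known for this host: treat as unrestricted
        if not rp.can_fetch(agent, path):
            flagged.append((host, path, agent))
    return flagged


parsers = build_parsers({
    "example.com": "User-agent: *\nDisallow: /\n",
    "blog.example.com": "User-agent: *\nDisallow: /drafts/\n",
})
hits = [
    ("example.com", "/page", "Googlebot"),
    ("blog.example.com", "/post", "Googlebot"),
    ("blog.example.com", "/drafts/x", "Googlebot"),
    ("shop.example.com", "/cart", "Googlebot"),
]
print(disallowed_hits(hits, parsers))
```

The blog post and the shop cart are not flagged even though the root domain disallows everything, which mirrors the per-host scope described above.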

Diagnostic use case

Set correct crawl policy when a site spans several subdomains — for example blog., shop., and a staging subdomain — without assuming one file covers them all.

What WebmasterID can help detect

WebmasterID records crawler hits per hostname, so you can see whether each subdomain's robots.txt is actually shaping crawler behavior the way you intended.

Common mistakes

Assuming the apex robots.txt automatically governs every subdomain.
Forgetting that http and https are separate origins for robots.txt.
Leaving a staging subdomain without its own restrictive robots.txt.

Privacy and accuracy notes

robots.txt scope concerns your own hostnames. It involves no visitor data and is not an access-control mechanism.


Sources and verification notes

Google, "How Google interprets robots.txt" (file location and scope): documents that robots.txt applies per scheme, host, and port.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.