How crawlers handle a redirected robots.txt
When /robots.txt returns a 3xx redirect, crawlers must decide whether to follow it. This page explains how Google follows robots.txt redirects, the hop limit, and why redirecting the file (especially cross-host) can lead to unexpected crawl behavior.
How Google follows robots.txt redirects
Google documents that it follows at least five redirect hops for robots.txt. If the chain resolves to a 200 response with a robots.txt, Google uses that file. If the redirects do not resolve to a file within the hop limit, Google stops following them and treats the result as a 404 for robots.txt, which it handles as allow-all (no crawl restrictions).
So a robots.txt that redirects to a valid file generally works, but a long or looping chain can cause Google to give up and assume open crawling.
- Google follows at least five redirect hops
- A resolved 200 robots.txt is used
- Unresolved redirects are treated like a 404 (allow-all)
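The behavior summarized above can be sketched in a few lines of Python. This is an illustrative model only, not Google's implementation: the function name resolve_robots, the fetch callable (a hypothetical stand-in for an HTTP client that does not auto-follow redirects), and the fixed limit of five hops are all assumptions made for the example.

```python
from urllib.parse import urljoin

MAX_HOPS = 5  # assumption: models the documented "at least five hops"


def resolve_robots(start_url, fetch, max_hops=MAX_HOPS):
    """Follow robots.txt redirects up to max_hops.

    fetch is a hypothetical callable: url -> (status_code, location_or_None).
    Returns (outcome, url), where outcome is one of:
      "use_file"      a 200 response was reached; use that file
      "treat_as_404"  hop limit exceeded; handled like a 404 (allow-all)
      "other_status"  a non-redirect, non-200 status ended the chain
    """
    url = start_url
    # One initial request plus up to max_hops redirected requests.
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status == 200:
            return ("use_file", url)
        if 300 <= status < 400 and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return ("other_status", url)
    # Still redirecting after max_hops hops: give up.
    return ("treat_as_404", url)
```

Passing fetch in as a parameter keeps the sketch testable without network access; in practice it would wrap whatever HTTP client you use, with automatic redirect following turned off so each hop can be counted.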
Cross-host and migration risks
robots.txt is per host, so redirecting https://old.example.com/robots.txt to https://new.example.com/robots.txt means the crawler may apply rules written for new.example.com to URLs on old.example.com. During migrations this can silently change which rules govern which hostname.
The safe pattern is to serve a real robots.txt directly at each host's root rather than relying on redirects. If a redirect is unavoidable, keep the chain short and confirm the resolved file contains the rules you intend for that host.
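One way to confirm where a redirect lands is to compare the scope of the requested URL with the scope of the resolved URL. The sketch below assumes scope means the combination of scheme, host, and port; the helper names scope_of and changes_scope are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def scope_of(url):
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases hostname
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def changes_scope(requested_url, resolved_url):
    """True when the redirect ends on a different scheme, host, or port."""
    return scope_of(requested_url) != scope_of(resolved_url)
```

For the migration example above, changes_scope("https://old.example.com/robots.txt", "https://new.example.com/robots.txt") returns True, which is a cue to review the resolved file against the rules intended for the old host.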
How it appears in analytics and logs
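If rules you expect are not being applied, a redirected /robots.txt is a likely cause to check: the crawler may have followed the redirect to a different file, or stopped following after too many hops. In server logs this shows up as a 3xx status on requests for /robots.txt instead of a 200.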
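A quick way to look for this in raw access logs is to filter for /robots.txt requests that were answered with a redirect status. The sketch below assumes logs in the common or combined log format, where the quoted request line is followed by the status code; the function name redirected_robots_hits is hypothetical, and other log formats would need a different pattern.

```python
import re

# Matches the quoted request line and the status code that follows it,
# e.g. "GET /robots.txt HTTP/1.1" 301
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)

# 304 (Not Modified) is deliberately excluded: it is not a redirect.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirected_robots_hits(log_lines):
    """Return (status, line) for each /robots.txt request answered with a redirect."""
    hits = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # line is not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore query string
        status = int(match.group("status"))
        if path == "/robots.txt" and status in REDIRECT_STATUSES:
            hits.append((status, line.strip()))
    return hits
```

Any hit is worth following up by requesting the file yourself and checking where the chain ends and how many hops it takes.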
Diagnostic use case
Avoid breaking crawl control when migrating sites or consolidating hosts, where /robots.txt may end up redirecting instead of serving rules directly.
What WebmasterID can help detect
WebmasterID records crawler requests to /robots.txt and the pages they then fetch, helping you spot when a redirect causes the wrong (or no) rules to apply.
Common mistakes
- Redirecting robots.txt across hosts and applying the wrong host's rules.
- Creating a long or looping redirect chain that resolves to allow-all.
- Assuming a redirected robots.txt always serves the rules you expect.
Privacy and accuracy notes
Redirect handling concerns the robots.txt request itself, not visitors. No personal data is involved in how a crawler resolves the file's location.
Related pages
- How robots.txt works across subdomains
robots.txt applies per host, so each subdomain needs its own file. This page explains how the robots.txt scope is defined by scheme, host, and port, why a root-domain file does not govern subdomains, and how to manage policy across many hostnames.
- What crawlers do when robots.txt returns 404 or 5xx
The HTTP status of /robots.txt changes crawl behavior. This page explains why a 404 means crawl everything, why a persistent 5xx can pause crawling, and how Google's handling shifts when a server error lasts a long time.
- How crawlers cache robots.txt
Crawlers do not re-fetch robots.txt on every request; they cache it. This page explains Google's caching window, why your edits take time to take effect, and how caching interacts with HTTP cache headers and fetch failures.
- Website observability
See robots.txt fetches and resulting crawl behavior per host.
Sources and verification notes
- Google, How Google interprets robots.txt (redirect handling). Documents following at least five hops and the 404-equivalent fallback.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.