Serving robots.txt behind a CDN
A CDN sits between crawlers and your origin, so it shapes how robots.txt is delivered. This page covers how a CDN caches robots.txt at the edge, how to make sure each hostname serves the right file, and how to avoid stale rules caused by aggressive caching.
Edge caching of robots.txt
A CDN may cache robots.txt at the edge like any other asset. That compounds with crawler-side caching: even after you change the file at origin, the CDN can keep serving an old copy until its own cache expires or is purged.
Give robots.txt a deliberately chosen cache TTL, short enough that an urgent fix does not stay stuck at the edge for long, and purge the CDN cache after important edits. Set that TTL with Cache-Control headers you control, and after a change verify the live file from multiple edge locations rather than checking only origin.
- CDN edge cache adds delay on top of crawler caching
- Purge the CDN after editing robots.txt
- Verify the served file, not just origin
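As a rough sketch of how the TTL you set translates into edge behavior, the hypothetical helpers `shared_cache_ttl` and `remaining_edge_freshness` below read a Cache-Control header the way a shared cache is expected to under HTTP caching rules (RFC 9111): `s-maxage` takes precedence over `max-age`, and `no-store`, `no-cache`, or `private` leave no fresh lifetime. This assumes the CDN honors origin headers; many CDNs let edge rules override them, so check your provider's configuration.

```python
from typing import Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Freshness lifetime in seconds that a shared cache (such as a CDN edge)
    would derive from a Cache-Control header, or None if the header sets none."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().lower().partition("=")
        if name:
            directives[name.strip()] = value.strip().strip('"')

    # A shared cache must not serve these responses from cache without revalidating.
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0

    # s-maxage applies only to shared caches and overrides max-age there.
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                return 0  # treat an unparseable value as already stale
    return None


def remaining_edge_freshness(cache_control: str, age: int) -> Optional[int]:
    """Seconds until an edge copy of the given Age (in seconds) becomes stale,
    or None when the lifetime is left to the CDN's own defaults."""
    ttl = shared_cache_ttl(cache_control)
    if ttl is None:
        return None
    return max(0, ttl - age)
```

With `public, max-age=3600, s-maxage=300`, a CDN that honors the header would treat its copy as fresh for five minutes before revalidating with origin. A `None` result means the header sets no lifetime, so the CDN falls back to its own default, which is provider-specific.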
Right file per hostname
Because robots.txt is per host, a CDN serving many hostnames must return the correct file for each. A misconfigured rule that maps several hosts to one robots.txt, or that caches one host's file under another host's cache key, can serve crawlers the wrong host's rules.
Confirm that each hostname's /robots.txt resolves to that host's intended rules at the edge. When a platform generates robots.txt dynamically, make sure the CDN cache key includes the host so versions are not crossed.
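To illustrate why the cache key must include the host, here is a toy in-memory edge cache, not any CDN's real implementation. The class `EdgeCache`, the key functions, and the `example.com` hostnames and rules are all hypothetical. Since robots.txt scope is also defined by scheme and port, a real key may need those as well.

```python
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, ...]
Origin = Callable[[str, str], str]


def path_only_key(host: str, path: str) -> CacheKey:
    # Misconfiguration: every hostname shares one cache entry per path.
    return (path,)


def host_and_path_key(host: str, path: str) -> CacheKey:
    # Host names are case-insensitive; a Host header with a non-default port keeps it.
    return (host.lower(), path)


class EdgeCache:
    """Toy edge cache that stores origin responses under a configurable key."""

    def __init__(self, key_fn: Callable[[str, str], CacheKey], origin: Origin) -> None:
        self.key_fn = key_fn
        self.origin = origin
        self.store: Dict[CacheKey, str] = {}

    def get(self, host: str, path: str) -> str:
        key = self.key_fn(host, path)
        if key not in self.store:
            # Cache miss: fetch from origin for the host that made this request.
            self.store[key] = self.origin(host, path)
        return self.store[key]


# Hypothetical per-host rules that a platform might generate dynamically.
ROBOTS = {
    "www.example.com": "User-agent: *\nAllow: /\n",
    "staging.example.com": "User-agent: *\nDisallow: /\n",
}


def origin(host: str, path: str) -> str:
    return ROBOTS[host] if path == "/robots.txt" else ""
```

With `path_only_key`, whichever host is requested first fills the shared entry: if staging.example.com is fetched first, www.example.com then receives `Disallow: /`. Keying on host and path keeps the two versions apart.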
How it appears in analytics and logs
If a robots.txt edit made at origin does not reach crawlers, the CDN edge cache is a prime suspect: it may still be serving the previous version from cache. Origin logs can mislead here, because requests the edge answers from cache never reach origin.
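One way to check for a stale edge copy is to compare what each edge location serves against the origin file. The sketch below assumes you have already collected each location's body as a string; how you fetch from specific edge locations depends on your provider and tooling, and the location labels (`fra`, `iad`, `sin`) are hypothetical. Line endings are normalized so CRLF versus LF alone does not count as a difference.

```python
import hashlib
from typing import Dict, List


def fingerprint(body: str) -> str:
    """Short SHA-256 fingerprint of a robots.txt body, with line endings
    normalized so CRLF vs LF alone does not count as a difference."""
    normalized = body.replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def stale_edges(origin_body: str, edge_bodies: Dict[str, str]) -> List[str]:
    """Edge locations whose served robots.txt differs from the origin file."""
    want = fingerprint(origin_body)
    return sorted(loc for loc, body in edge_bodies.items() if fingerprint(body) != want)
```

Short fingerprints are convenient to log alongside each check, so you can see when an edge location finally picks up a new version.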
Diagnostic use case
Make sure robots.txt is correct and fresh for every hostname when a CDN caches and serves it, especially after editing the file at origin.
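A minimal per-hostname check, under stated assumptions, could fetch `https://<host>/robots.txt` for every hostname and compare the result with that host's intended rules, flagging a body that matches another host's file as a likely crossed cache key. The function names and the expected-rules mapping are hypothetical; `fetch_robots` only sees the edge location your own network reaches, and the `fetch` parameter can be swapped for a fake in tests.

```python
import urllib.request
from typing import Callable, Dict, List, Tuple


def fetch_robots(host: str, timeout: float = 10.0) -> str:
    """Fetch https://<host>/robots.txt as the edge serves it (network required).
    HTTP errors such as 404 or 5xx raise urllib.error.HTTPError."""
    with urllib.request.urlopen(f"https://{host}/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _norm(body: str) -> str:
    return body.replace("\r\n", "\n")


def check_hosts(
    expected: Dict[str, str], fetch: Callable[[str], str] = fetch_robots
) -> List[Tuple[str, str]]:
    """Compare each hostname's served robots.txt with its intended rules and
    return (host, verdict) pairs for the hosts that do not match."""
    problems: List[Tuple[str, str]] = []
    for host, want in expected.items():
        got = _norm(fetch(host))
        if got == _norm(want):
            continue
        # A body identical to another host's intended file suggests crossed cache keys.
        owners = [o for o, body in expected.items() if o != host and _norm(body) == got]
        if owners:
            problems.append((host, f"serving rules intended for {owners[0]}"))
        else:
            problems.append((host, "differs from intended rules (stale or wrong)"))
    return problems
```

Running this right after a purge, and again once the TTL has elapsed, helps separate a crossed cache key (wrong host's file) from plain staleness (an old version of the right file).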
What WebmasterID can help detect
WebmasterID records what crawlers actually fetch at the edge, so you can detect when a CDN serves a stale or wrong robots.txt versus your origin file.
Common mistakes
- Editing robots.txt at origin but not purging the CDN cache.
- Caching one hostname's robots.txt under another host's key.
- Checking only origin and missing a stale edge copy.
Privacy and accuracy notes
CDN delivery of robots.txt concerns caching of a public file, not visitor data. robots.txt remains a crawl request, not an access-control or security layer.
Related pages
- How robots.txt works across subdomains
robots.txt applies per host, so each subdomain needs its own file. This page explains how the robots.txt scope is defined by scheme, host, and port, why a root-domain file does not govern subdomains, and how to manage policy across many hostnames.
- How crawlers cache robots.txt
Crawlers do not re-fetch robots.txt on every request — they cache it. This page explains Google's caching window, why your edits take time to take effect, and how caching interacts with HTTP cache headers and fetch failures.
- Monitoring robots.txt for changes and errors
robots.txt is a single file that can accidentally block an entire site. This page explains why monitoring it matters, which failure modes to watch (Disallow: /, 404, 5xx, unexpected diffs), and how crawl-behavior signals confirm a problem.
- Website observability
Detect a stale or wrong robots.txt served at the CDN edge.
Sources and verification notes
- Google: How Google interprets robots.txt (caching and per-host scope). Covers crawler caching and per-host scope; CDN behavior depends on your provider's configuration.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.