Serving robots.txt behind a CDN

A CDN sits between crawlers and your origin, so it shapes how robots.txt is delivered. This page explains edge caching of robots.txt, ensuring each hostname serves the right file, and avoiding stale rules from aggressive caching.

Partially verified

Edge caching of robots.txt

A CDN may cache robots.txt at the edge like any other asset. That compounds with crawler-side caching (Google, for example, generally caches robots.txt for up to 24 hours): even after you change the file at origin, the CDN can keep serving an old copy until its own cache expires or is purged.

Set a short, explicit cache TTL on robots.txt through Cache-Control headers you control, and purge the CDN cache after important edits. After a change, verify the live file from multiple edge locations rather than only checking origin.

CDN edge cache adds delay on top of crawler caching
Purge the CDN after editing robots.txt
Verify the served file, not just origin
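
To make the TTL advice concrete, here is a minimal Python sketch that estimates how long an edge copy of robots.txt may keep being served, based on the Cache-Control and Age response headers. It assumes the CDN honors origin headers (many providers let you override TTLs in their own configuration, in which case these numbers do not apply), and the function names are hypothetical.

```python
from typing import Dict, Optional


def freshness_lifetime(cache_control: str) -> Optional[int]:
    """Return the shared-cache freshness lifetime in seconds, or None if unspecified."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    # Simplification: treat no-store and no-cache as "never served without
    # going back to origin", i.e. a lifetime of zero.
    if "no-store" in directives or "no-cache" in directives:
        return 0
    # For shared caches such as a CDN, s-maxage takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return None


def seconds_until_stale(headers: Dict[str, str]) -> Optional[int]:
    """Estimate how many more seconds the cached copy may be served as fresh."""
    lifetime = freshness_lifetime(headers.get("Cache-Control", ""))
    if lifetime is None:
        return None  # no explicit TTL: the cache's own defaults decide
    age = headers.get("Age", "0")  # seconds the copy has already spent in cache
    return max(lifetime - (int(age) if age.isdigit() else 0), 0)
```

For example, a response with "Cache-Control: public, max-age=3600, s-maxage=300" and "Age: 120" has roughly 180 seconds left at the edge. A result of None means the headers set no explicit TTL, which is the case the advice above is meant to avoid.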

Right file per hostname

Because robots.txt is per host, a CDN serving many hostnames must return the correct file for each. A misconfigured rule that maps several hosts to one robots.txt, or that caches one host's file under another's key, can apply the wrong rules to a host.

Confirm that each hostname's /robots.txt resolves to that host's intended rules at the edge. When a platform generates robots.txt dynamically, make sure the CDN cache key includes the host so that one host's version is never served for another.
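
The cache-key issue can be illustrated with a toy in-memory cache in Python. This is a sketch of the failure mode, not a model of any particular CDN: the hostnames, the fake_origin function, and the include_host switch are all hypothetical.

```python
from typing import Callable, Dict


def cache_key(host: str, path: str, include_host: bool = True) -> str:
    """Build a cache key; without the host, every hostname shares one entry per path."""
    host = host.lower().rstrip(".")
    return f"{host}{path}" if include_host else path


def fetch_via_edge(
    cache: Dict[str, str],
    host: str,
    path: str,
    origin: Callable[[str, str], str],
    include_host: bool = True,
) -> str:
    """Serve from the cache when the key is present, otherwise fetch from origin and store."""
    key = cache_key(host, path, include_host)
    if key not in cache:
        cache[key] = origin(host, path)
    return cache[key]


def fake_origin(host: str, path: str) -> str:
    """Stand-in for a platform that generates robots.txt per host."""
    return f"# robots.txt for {host}\n"
```

With include_host=False, a request for www.example.com fills the entry for /robots.txt, and a later request for shop.example.com receives the www file. With the host in the key, each hostname gets its own entry.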

How it appears in analytics and logs

If crawlers keep following the old rules after a robots.txt edit at origin, the CDN edge cache is a prime suspect: it may still be serving the previous version.
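
One way to confirm that suspicion is to compare the body served at each edge location with the origin file. The Python sketch below assumes you have already fetched those bodies yourself (how you reach origin directly or target a specific edge location depends on your provider); the function names and location labels are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(body: bytes) -> str:
    """Hash a robots.txt body, ignoring line-ending and surrounding-whitespace differences."""
    normalized = body.replace(b"\r\n", b"\n").strip()
    return hashlib.sha256(normalized).hexdigest()


def mismatched_edges(origin_body: bytes, edge_bodies: Dict[str, bytes]) -> List[str]:
    """Return the labels of edge locations whose copy differs from the origin file."""
    expected = fingerprint(origin_body)
    return sorted(
        label for label, body in edge_bodies.items() if fingerprint(body) != expected
    )
```

An empty result means every sampled edge matches origin. Any label returned points to a location still serving a different copy, which is a candidate for a purge.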

Diagnostic use case

Make sure robots.txt is correct and fresh for every hostname when a CDN caches and serves it, especially after editing the file at origin.

What WebmasterID can help detect

WebmasterID records what crawlers actually fetch at the edge, so you can detect when a CDN serves a stale or wrong robots.txt versus your origin file.

Common mistakes

Editing robots.txt at origin but not purging the CDN cache.
Caching one hostname's robots.txt under another host's key.
Checking only origin and missing a stale edge copy.

Privacy and accuracy notes

CDN delivery of robots.txt concerns caching of a public file, not visitor data. robots.txt remains a request to crawlers, not an access-control or security layer.

Sources and verification notes

Google: How Google interprets robots.txt (caching and per-host scope). Covers crawler caching and per-host scope; CDN behavior depends on your provider configuration.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.