robots.txt and sitemap coordination

robots.txt and your XML sitemap work together: the Sitemap directive advertises your sitemap to crawlers, and Search Console submission gives Google a direct feed. The key is consistency — do not list URLs in a sitemap that your robots.txt disallows, or you send crawlers contradictory instructions.

Two ways to advertise a sitemap

First, the robots.txt Sitemap directive points crawlers at your sitemap with an absolute URL, and applies globally regardless of user-agent group:

Sitemap: https://example.com/sitemap.xml

Second, submit the sitemap in Google Search Console (and equivalent tools for other engines) for a direct feed and indexing reports. The two are complementary — the directive helps any crawler discover the sitemap, while submission gives you status feedback for that engine.
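
To check what the Sitemap directive advertises, you can parse robots.txt the way a crawler would. The following is a minimal sketch using Python's standard-library urllib.robotparser (site_maps() needs Python 3.8 or later). The robots.txt content, the second sitemap URL, and the helper name list_sitemaps are illustrative assumptions, inlined so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""


def list_sitemaps(robots_txt: str) -> list:
    """Return every URL advertised by a Sitemap directive, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


sitemaps = list_sitemaps(ROBOTS_TXT)
for url in sitemaps:
    print(url)
```

Both Sitemap lines are returned even though they follow a user-agent group, which matches the point above that the directive is not scoped to any group.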

Keep them consistent

The common failure is contradiction: a sitemap should list canonical, indexable URLs, so do not include URLs that your robots.txt disallows or that are noindexed. Listing a URL in a sitemap does not override a Disallow — the crawler still will not fetch it — so the entry just sends a mixed signal.

Keep the sitemap in sync with your crawl rules: when you disallow a section, remove its URLs from the sitemap; when you add indexable pages, make sure they are crawlable and present in the sitemap.
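
One way to catch contradictions before a crawler does is to test each sitemap URL against your robots.txt rules. The sketch below assumes a small inline robots.txt and sitemap (both hypothetical) and uses only the Python standard library. Note that urllib.robotparser does not necessarily resolve rule precedence or wildcards the way a given search engine does, so treat the result as a first-pass check rather than a definitive verdict.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical inputs, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
  <url><loc>https://example.com/private/report</loc></url>
</urlset>
"""

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list:
    """Extract the <loc> value of every <url> entry in a sitemap."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def disallowed_entries(robots_txt: str, sitemap_xml: str, user_agent: str = "*") -> list:
    """Return sitemap URLs that the given user agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in sitemap_urls(sitemap_xml) if not parser.can_fetch(user_agent, u)]


conflicts = disallowed_entries(ROBOTS_TXT, SITEMAP_XML)
print(conflicts)
```

Any URL this reports should either be removed from the sitemap or unblocked in robots.txt, depending on which of the two reflects your intent.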

How it appears in analytics and logs

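A URL listed in a sitemap but disallowed in robots.txt is contradictory: you are asking a crawler to discover a URL you also told it not to fetch. In server logs, the mismatch appears as sitemap-listed URLs that compliant crawlers never request. In Search Console, such URLs are typically reported as blocked by robots.txt in the indexing reports.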
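
A rough way to spot this pattern in your own data is to compare the paths crawlers actually requested with the paths your sitemap lists. The sketch below assumes you have already extracted crawler-requested paths from your logs. The sample data and the function name compare_sitemap_to_fetches are hypothetical. A listed URL that was never fetched is only a hint: it may be blocked, or it may simply not have been crawled yet.

```python
from urllib.parse import urlsplit

# Hypothetical sample data: URLs from a sitemap and paths seen in crawler log hits.
SITEMAP_URLS = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/private/report",
]
FETCHED_PATHS = {"/", "/blog/post-1", "/old-page"}


def compare_sitemap_to_fetches(sitemap_urls, fetched_paths):
    """Return paths listed but never fetched, and paths fetched but not listed."""
    listed = {urlsplit(u).path or "/" for u in sitemap_urls}
    fetched = set(fetched_paths)
    return {
        "listed_never_fetched": sorted(listed - fetched),
        "fetched_not_listed": sorted(fetched - listed),
    }


report = compare_sitemap_to_fetches(SITEMAP_URLS, FETCHED_PATHS)
print(report)
```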

Diagnostic use case

Advertise sitemaps via robots.txt and Search Console while keeping sitemap contents consistent with your allow/disallow rules.

What WebmasterID can help detect

WebmasterID shows which URLs crawlers actually fetch, so you can see whether sitemap discovery and your robots.txt rules are working together as intended.

Privacy and accuracy notes

Both robots.txt and the sitemaps it lists are public. Do not advertise sitemaps that expose paths you intend to keep private.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.