Robots & crawl control

robots.txt and sitemap coordination

robots.txt and your XML sitemap work together: the Sitemap directive advertises your sitemap to crawlers, and Search Console submission gives Google a direct feed. The key is consistency — do not list URLs in a sitemap that your robots.txt disallows, or you send crawlers contradictory instructions.

Verified against primary sources

Two ways to advertise a sitemap

First, the robots.txt Sitemap directive points crawlers at your sitemap with an absolute URL, and applies globally regardless of user-agent group:

Sitemap: https://example.com/sitemap.xml

Second, submit the sitemap in Google Search Console (and equivalent tools for other engines) for a direct feed and indexing reports. The two are complementary — the directive helps any crawler discover the sitemap, while submission gives you status feedback for that engine.

robots.txt Sitemap directive uses an absolute URL, applies globally
Search Console submission adds a direct feed and reports
Use both for discovery plus status feedback
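
To see how a crawler-side parser reads the directive, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the second sitemap URL are illustrative assumptions, not values from any real site. The Sitemap lines are placed after a specific user-agent group on purpose, to show that the parser still reports them globally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; domain, paths and bot name are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""


def list_sitemaps(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


sitemaps = list_sitemaps(ROBOTS_TXT)
print(sitemaps)
```

Both URLs are returned even though they follow the `ExampleBot` group, which matches the point above that the directive is not tied to any user-agent group.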

Keep them consistent

The common failure is contradiction. A sitemap should list canonical, indexable URLs, so do not include URLs that your robots.txt disallows or that carry a noindex directive. Listing a URL in a sitemap does not override a Disallow: a compliant crawler still will not fetch it, so the entry only sends a mixed signal.

Keep the sitemap in sync with your crawl rules: when you disallow a section, remove its URLs from the sitemap; when you add indexable pages, make sure they are crawlable and present in the sitemap.

Do not list disallowed or noindexed URLs in a sitemap
A sitemap entry does not override a Disallow
Keep sitemap contents in sync with robots.txt rules
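
One way to check this consistency automatically is to test every sitemap URL against the robots.txt rules. The sketch below uses only the Python standard library and works on inline strings; the robots.txt rules and the sitemap entries are hypothetical sample data, and a real check would load both files from your own site. It tests the rules for the generic `*` user agent only.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical sample data; replace with your own robots.txt and sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/private/report.html</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>
"""

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> value from a urlset sitemap."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def disallowed_sitemap_urls(robots_txt: str, sitemap_xml: str, user_agent: str = "*") -> list[str]:
    """Return sitemap URLs that the robots.txt rules block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls(sitemap_xml) if not parser.can_fetch(user_agent, url)]


conflicts = disallowed_sitemap_urls(ROBOTS_TXT, SITEMAP_XML)
print(conflicts)  # ['https://example.com/private/report.html']
```

Any URL this prints should either be removed from the sitemap or unblocked in robots.txt, depending on which of the two reflects your intent. The check does not cover noindex, which lives in page markup or response headers rather than in robots.txt.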

How it appears in analytics and logs

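A URL listed in a sitemap but disallowed in robots.txt is contradictory: you ask a crawler to discover a URL you have also told it not to fetch. In server logs, such a URL typically shows no requests from compliant crawlers even though the sitemap lists it, and search engine tools may report it as blocked by robots.txt.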
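
A rough way to spot this pattern in your own logs is to compare the URLs a sitemap lists with the paths a crawler actually requested. The sketch below assumes access logs in the common "combined" format and identifies the crawler by a user-agent substring. Both are assumptions: your log format may differ, and user-agent strings can be spoofed. The log lines, IP addresses, and URLs are invented sample data.

```python
import re
from urllib.parse import urlsplit

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format log line (an assumed format; adjust to your server).
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Hypothetical sample data.
SITEMAP_URLS = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/private/report.html",
]
LOG_LINES = [
    '192.0.2.10 - - [24/Jun/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [24/Jun/2026:10:00:05 +0000] "GET /blog/post-1 HTTP/1.1" 200 8100 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.55 - - [24/Jun/2026:10:01:00 +0000] "GET /private/report.html HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]


def crawler_paths(log_lines, ua_token):
    """Collect URL paths requested by user agents containing ua_token."""
    paths = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and ua_token.lower() in match.group(2).lower():
            paths.add(urlsplit(match.group(1)).path)
    return paths


def listed_but_not_crawled(sitemap_urls, log_lines, ua_token="Googlebot"):
    """Return sitemap URLs whose path the crawler never requested."""
    fetched = crawler_paths(log_lines, ua_token)
    return sorted(url for url in sitemap_urls if urlsplit(url).path not in fetched)


never_crawled = listed_but_not_crawled(SITEMAP_URLS, LOG_LINES)
print(never_crawled)  # ['https://example.com/private/report.html']
```

A URL in this list is not necessarily blocked, since the crawler may simply not have reached it yet, so treat the result as a list of candidates to check against your robots.txt rules.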

Diagnostic use case

Advertise sitemaps via robots.txt and Search Console while keeping sitemap contents consistent with your allow/disallow rules.

What WebmasterID can help detect

WebmasterID shows which URLs crawlers actually fetch, so you can see whether sitemap discovery and your robots.txt rules are working together as intended.

Common mistakes

Listing URLs in a sitemap that robots.txt disallows.
Using a relative path in the Sitemap directive instead of an absolute URL.
Including noindexed or non-canonical URLs in the sitemap.
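
The relative-path mistake is easy to catch with a small lint step. The following sketch scans robots.txt text for Sitemap lines whose value is not an absolute http or https URL. The function name and the sample file are hypothetical, and the comment handling is deliberately simple: everything after a `#` is dropped.

```python
from urllib.parse import urlsplit


def relative_sitemap_lines(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for Sitemap lines that are not absolute URLs."""
    problems = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        content = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = content.partition(":")
        if not sep or key.strip().lower() != "sitemap":
            continue
        value = value.strip()
        parts = urlsplit(value)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((number, value))
    return problems


# Hypothetical robots.txt with one relative and one absolute Sitemap line.
SAMPLE = """\
User-agent: *
Disallow: /tmp/
Sitemap: /sitemap.xml
Sitemap: https://example.com/sitemap.xml
"""

issues = relative_sitemap_lines(SAMPLE)
print(issues)  # [(3, '/sitemap.xml')]
```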

Privacy and accuracy notes

Both robots.txt and the sitemaps it lists are public. Do not advertise sitemaps that expose paths you intend to keep private.


Sources and verification notes

Google: How Google interprets robots.txt. Documents the Sitemap directive and absolute-URL requirement.
Google: Build and submit a sitemap.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.