robots.txt and sitemap coordination
robots.txt and your XML sitemap work together: the Sitemap directive advertises your sitemap to crawlers, and Search Console submission gives Google a direct feed. The key is consistency — do not list URLs in a sitemap that your robots.txt disallows, or you send crawlers contradictory instructions.
Two ways to advertise a sitemap
First, the robots.txt Sitemap directive points crawlers at your sitemap with an absolute URL, and applies globally regardless of user-agent group:
Sitemap: https://example.com/sitemap.xml
Second, submit the sitemap in Google Search Console (and equivalent tools for other engines) for a direct feed and indexing reports. The two are complementary — the directive helps any crawler discover the sitemap, while submission gives you status feedback for that engine.
- robots.txt Sitemap directive uses an absolute URL, applies globally
- Search Console submission adds a direct feed and reports
- Use both for discovery plus status feedback
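As a rough illustration of how a tool might read the directive, the sketch below uses Python's standard urllib.robotparser to list the sitemaps a robots.txt advertises. The robots.txt body and the example.com URL are made up for the example, and the helper name advertised_sitemaps is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The Sitemap line is read independently of
# the user-agent groups, so its position in the file does not matter.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""


def advertised_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs listed in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line is present.
    return parser.site_maps() or []


sitemaps = advertised_sitemaps(ROBOTS_TXT)
print(sitemaps)
```

Parsing a string rather than fetching a live file keeps the sketch offline; a real check would load the robots.txt from the site root first.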
Keep them consistent
The common failure is contradiction. A sitemap should list canonical, indexable URLs, so do not include URLs that your robots.txt disallows or that are noindexed. Listing a URL in a sitemap does not override a Disallow: a compliant crawler still will not fetch it, so the entry just sends a mixed signal.
Keep the sitemap in sync with your crawl rules: when you disallow a section, remove its URLs from the sitemap; when you add indexable pages, make sure they are crawlable and present in the sitemap.
- Do not list disallowed or noindexed URLs in a sitemap
- A sitemap entry does not override a Disallow
- Keep sitemap contents in sync with robots.txt rules
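One way to catch the contradiction before it ships is to test every sitemap URL against the robots.txt rules. The following is a minimal sketch, assuming a standard sitemaps.org urlset document and using only the Python standard library; the sample robots.txt, the sample sitemap, and the function name disallowed_sitemap_urls are illustrative. Note that urllib.robotparser uses simple prefix matching, so its verdict may differ from a particular engine's handling of wildcard rules.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by sitemaps.org urlset documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical inputs for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/drafts/post-1</loc></url>
</urlset>
"""


def disallowed_sitemap_urls(robots_txt, sitemap_xml, user_agent="*"):
    """Return sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # Keep only the URLs the parsed rules would block.
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


conflicts = disallowed_sitemap_urls(ROBOTS_TXT, SITEMAP_XML)
print(conflicts)
```

Any URL the function returns should either be removed from the sitemap or unblocked in robots.txt, depending on which signal you actually intend.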
How it appears in analytics and logs
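A URL that is listed in a sitemap but disallowed in robots.txt is contradictory: you are asking a crawler to discover a URL you have also told it not to fetch. In server logs, a compliant crawler typically requests robots.txt and the sitemap but never the disallowed URL itself. In search-engine reporting tools, such URLs tend to be flagged as blocked by robots.txt rather than crawled and indexed.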
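To look for this pattern in your own logs, you can compare the paths a crawler requested against the paths your sitemap lists. The sketch below assumes the Apache/Nginx "combined" access-log format and matches crawlers by a user-agent substring, which can be spoofed; the log lines, the IP address, and the function name unfetched_sitemap_paths are invented for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumes the "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def unfetched_sitemap_paths(sitemap_urls, log_lines, agent_token="Googlebot"):
    """Return sitemap paths that the named crawler never requested."""
    fetched = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and agent_token.lower() in match["agent"].lower():
            # Compare on the path only; query strings are ignored here.
            fetched.add(urlsplit(match["path"]).path)
    wanted = [urlsplit(url).path or "/" for url in sitemap_urls]
    return [path for path in wanted if path not in fetched]


# Made-up sample data: the crawler fetched robots.txt, the sitemap, and "/".
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'192.0.2.1 - - [24/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "{UA}"',
    f'192.0.2.1 - - [24/Jun/2026:10:00:01 +0000] "GET /sitemap.xml HTTP/1.1" 200 540 "-" "{UA}"',
    f'192.0.2.1 - - [24/Jun/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "{UA}"',
]
SITEMAP_URLS = ["https://example.com/", "https://example.com/drafts/post-1"]

missing = unfetched_sitemap_paths(SITEMAP_URLS, SAMPLE_LOG)
print(missing)
```

A path that stays on this list over a long window is a candidate for a robots.txt conflict, although a crawler may also skip a URL for unrelated reasons such as crawl scheduling.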
Diagnostic use case
Advertise sitemaps via robots.txt and Search Console while keeping sitemap contents consistent with your allow/disallow rules.
What WebmasterID can help detect
WebmasterID shows which URLs crawlers actually fetch, so you can see whether sitemap discovery and your robots.txt rules are working together as intended.
Common mistakes
- Listing URLs in a sitemap that robots.txt disallows.
- Using a relative path in the Sitemap directive instead of an absolute URL.
- Including noindexed or non-canonical URLs in the sitemap.
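The relative-path mistake is easy to lint for. The following sketch scans a robots.txt body and reports any Sitemap value that is not an absolute http or https URL; the function names and sample input are hypothetical, and treating "#" as the start of a comment is a simplification.

```python
from urllib.parse import urlsplit


def is_absolute_url(value: str) -> bool:
    """True when value has an http(s) scheme and a host."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def relative_sitemap_directives(robots_txt: str) -> list[str]:
    """Return Sitemap directive values that are not absolute URLs."""
    bad = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Key: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and not is_absolute_url(value):
            bad.append(value.strip())
    return bad


# Hypothetical robots.txt with one relative and one absolute directive.
SAMPLE = """\
Sitemap: /sitemap.xml
Sitemap: https://example.com/sitemap.xml
"""

problems = relative_sitemap_directives(SAMPLE)
print(problems)
```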
Privacy and accuracy notes
Both robots.txt and the sitemaps it lists are public. Do not advertise sitemaps that expose paths you intend to keep private.
Related pages
- The Sitemap directive in robots.txt
The Sitemap directive points crawlers at your XML sitemap. It uses an absolute URL, can appear multiple times to list several sitemaps, and works independently of your allow/disallow rules — it is a discovery hint, not a crawl-permission rule.
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- robots.txt common mistakes
Most robots.txt problems come from a handful of recurring mistakes. This page collects the big ones — blocking the CSS and JS crawlers need to render, trying to deindex with Disallow, advertising secret paths, and treating an advisory file as enforcement — with the correct approach for each.
- Website observability
See whether sitemap discovery and robots rules work together.
Sources and verification notes
- Google: How Google interprets robots.txt. Documents the Sitemap directive and absolute-URL requirement.
- Google: Build and submit a sitemap.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.