XML sitemap best practices

An XML sitemap lists URLs you want crawled, helping search engines discover pages they might miss through links alone. The format has firm limits — 50,000 URLs and 50MB uncompressed per file — and works best when it contains only canonical, indexable, 200-status URLs with accurate lastmod values. This page covers the documented rules and the common quality problems that make a sitemap less useful.

Verified against primary sources

What this means

An XML sitemap is a file that lists URLs along with optional metadata such as lastmod. It supplements link-based discovery, which matters most for large sites, new sites with few backlinks, and pages that are weakly linked internally.
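To make the file structure concrete, here is a minimal sketch that parses a small sitemap and reads each URL with its optional lastmod. The sample content and the example.com URLs are placeholders, and read_sitemap is a hypothetical helper name, not part of any tool mentioned on this page.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; the URLs are placeholders.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples; lastmod is None when absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


entries = read_sitemap(SAMPLE_SITEMAP)
```

Only loc is required for each entry; the second URL above simply omits lastmod.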

A sitemap does not force indexing. It tells search engines which URLs you consider worth crawling. The cleaner and more accurate it is, the more useful that signal becomes.

Format limits and URL hygiene

Each sitemap file is limited to 50,000 URLs and 50MB uncompressed. Beyond that, split into multiple sitemaps and reference them from a sitemap index file. Sitemaps may be gzip-compressed.
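The splitting rule can be sketched as follows. This is an illustrative outline, not a production generator: the function names and file URLs are hypothetical, and the byte limit assumes 50MB means 52,428,800 bytes.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000          # per-file URL limit
MAX_BYTES_PER_FILE = 50 * 1024 * 1024  # assumed byte value of the 50MB limit


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Serialize one sitemap file containing the given URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for u in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = u
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_index(sitemap_file_urls):
    """Serialize a sitemap index that references each sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_XMLNS)
    for u in sitemap_file_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = u
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def within_size_limit(xml_bytes):
    """Check the uncompressed size of a serialized sitemap."""
    return len(xml_bytes) <= MAX_BYTES_PER_FILE
```

A site with 120,001 URLs would produce three files (50,000, 50,000, and 20,001 URLs) plus one index. Check the size limit before gzip compression, since the limit applies to the uncompressed file.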

Include only canonical, indexable URLs that return HTTP 200. Exclude redirects, 404s, noindex pages, robots-blocked URLs, and non-canonical duplicates; Search Console flags many of these as sitemap issues. Use fully qualified absolute URLs on a single host, percent-encode special characters in the URL, and entity-escape characters that are reserved in XML (for example, write & as &amp;).
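The hygiene rules above can be expressed as a simple pre-publication check. This sketch assumes you already have crawl data for each candidate URL; the PageRecord fields and the sitemap_problems function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class PageRecord:
    """Hypothetical record of what a crawl observed for one URL."""
    url: str
    status: int           # HTTP status returned for the URL
    canonical: str        # canonical URL declared by the page
    noindex: bool         # True if the page carries a noindex directive
    robots_blocked: bool  # True if robots.txt disallows the URL


def sitemap_problems(page, host):
    """Return the reasons a URL should be left out of the sitemap."""
    problems = []
    parts = urlsplit(page.url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("not an absolute URL")
    elif parts.netloc != host:
        problems.append("different host")
    if page.status != 200:
        problems.append(f"status {page.status}")
    if page.noindex:
        problems.append("noindex")
    if page.robots_blocked:
        problems.append("blocked by robots.txt")
    if page.canonical != page.url:
        problems.append("non-canonical")
    return problems
```

A URL belongs in the sitemap only when the returned list is empty. Returning reasons rather than a boolean makes the audit output easier to act on.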

lastmod and priority/changefreq

The lastmod field should reflect the date the page content last meaningfully changed, in W3C datetime format. Google uses lastmod when it is consistently accurate; if every URL shows today's date, Google learns to ignore it.
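As a rough illustration, the helpers below format a modification time as a W3C datetime and measure how many entries share a single date. The function names are hypothetical, and any threshold you apply to the share is your own judgment call, not a documented Google rule.

```python
from collections import Counter
from datetime import datetime, timezone


def lastmod_value(changed_at):
    """Format a timezone-aware datetime as a W3C datetime with offset."""
    if changed_at.tzinfo is None:
        raise ValueError("lastmod needs a timezone-aware datetime")
    return changed_at.replace(microsecond=0).isoformat()


def lastmod_date(changed_at):
    """Format only the date part (YYYY-MM-DD), also valid W3C datetime."""
    return changed_at.date().isoformat()


def dominant_lastmod_share(lastmod_values):
    """Fraction of entries that share the single most common date."""
    if not lastmod_values:
        return 0.0
    counts = Counter(value[:10] for value in lastmod_values)
    return counts.most_common(1)[0][1] / len(lastmod_values)


changed = datetime(2026, 6, 1, 9, 30, 15, tzinfo=timezone.utc)
full = lastmod_value(changed)   # "2026-06-01T09:30:15+00:00"
short = lastmod_date(changed)   # "2026-06-01"
```

A share close to 1.0 suggests the generator stamps every URL with the build date instead of the real content change date.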

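Google ignores the priority and changefreq tags, so do not rely on them to influence crawling. Keep the sitemap fresh by regenerating it when content changes, and submit it through Search Console or reference it with a Sitemap line in robots.txt. Google retired its sitemap ping endpoint in 2023, so pinging is no longer a way to notify Google of updates.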

How it appears in analytics and logs

A clean sitemap is a discovery aid, not an indexing guarantee. Listing a URL asks search engines to crawl it; including non-canonical, redirected, blocked, or error URLs wastes the signal and can trigger sitemap warnings in Search Console.

Diagnostic use case

Audit a sitemap for size limits, URL hygiene, and lastmod accuracy so search engines can discover and prioritize your important pages efficiently.

What WebmasterID can help detect

WebmasterID shows which URLs crawlers actually fetched and the status codes returned, so you can compare real crawl coverage against what your sitemap advertises and spot listed URLs that crawlers never reach or that return errors.
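The comparison described above reduces to a set difference plus a status filter. This sketch assumes crawl activity is available as (url, status) pairs; that input shape and the compare_coverage name are assumptions for illustration and do not describe WebmasterID's actual export format.

```python
def compare_coverage(sitemap_urls, crawl_hits):
    """Compare sitemap URLs against observed crawler fetches.

    crawl_hits is an iterable of (url, status) pairs in time order
    (an assumed input shape). Returns (never_fetched, listed_with_errors).
    """
    listed = set(sitemap_urls)
    last_status = {}
    for url, status in crawl_hits:
        last_status[url] = status  # the most recent fetch wins

    never_fetched = sorted(listed - set(last_status))
    listed_with_errors = sorted(
        (url, status)
        for url, status in last_status.items()
        if url in listed and status != 200
    )
    return never_fetched, listed_with_errors


# Placeholder data for illustration only.
sitemap_urls = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/old-page",
]
crawl_hits = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 404),
    ("https://example.com/unlisted", 200),
]
never_fetched, listed_with_errors = compare_coverage(sitemap_urls, crawl_hits)
```

Here the pricing page is listed but was never fetched, and old-page is listed but returns 404; both are candidates for review.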

Common mistakes

Privacy and accuracy notes

Sitemaps describe public URLs and metadata only. They contain no visitor data. WebmasterID treats sitemap-driven crawl activity as bot events and never associates it with human profiles.

Frequently asked questions

How many URLs can one sitemap hold?
Up to 50,000 URLs and 50MB uncompressed per file. For larger sites, split into multiple sitemaps and list them in a sitemap index file.
Does a sitemap guarantee indexing?
No. A sitemap helps search engines discover URLs, but indexing decisions depend on many other signals. It is a request to crawl, not a command to index.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.