XML sitemap best practices
An XML sitemap lists URLs you want crawled, helping search engines discover pages they might miss through links alone. The format has firm limits — 50,000 URLs and 50MB uncompressed per file — and works best when it contains only canonical, indexable, 200-status URLs with accurate lastmod values. This page covers the documented rules and the common quality problems that make a sitemap less useful.
What this means
An XML sitemap is a file that lists URLs along with optional metadata such as lastmod. It supplements link-based discovery, which matters most for large sites, new sites with few backlinks, and pages that are weakly linked internally.
A sitemap does not force indexing. It tells search engines which URLs you consider worth crawling. The cleaner and more accurate it is, the more useful that signal becomes.
Format limits and URL hygiene
Each sitemap file is limited to 50,000 URLs and 50MB uncompressed. Beyond that, split into multiple sitemaps and reference them from a sitemap index file. Sitemaps may be gzip-compressed.
Include only canonical, indexable URLs that return HTTP 200. Exclude redirects, 404s, noindex pages, robots-blocked URLs, and non-canonical duplicates; Search Console flags many of these as sitemap issues. Use fully qualified absolute URLs on a single host, URL-encode non-ASCII characters, and entity-escape XML special characters in URL values (for example, write & as &amp;).
- Max 50,000 URLs and 50MB uncompressed per sitemap file
- List only canonical, indexable, HTTP 200 URLs
- Use absolute URLs; one host per sitemap
- Split large sets and reference via a sitemap index
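The splitting rule above can be sketched in Python. This is a minimal illustration, not a reference generator: the function names (chunk_urls, build_sitemap, build_index, within_size_limit) and the example.com URLs are assumptions made for the example, and it only emits loc elements.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000            # per-file URL limit
MAX_BYTES_PER_FILE = 50 * 1024 * 1024  # 50MB, measured uncompressed
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return a <urlset> document for one chunk of absolute URLs."""
    # escape() turns &, < and > into XML entities inside <loc>.
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
    )


def build_index(sitemap_urls):
    """Return a <sitemapindex> document that references each child sitemap."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )


def within_size_limit(document):
    """Check the uncompressed UTF-8 size of one generated file."""
    return len(document.encode("utf-8")) <= MAX_BYTES_PER_FILE


# Hypothetical usage: 120,000 URLs become three sitemap files plus one index.
urls = [f"https://www.example.com/p/{i}" for i in range(120_000)]
chunks = list(chunk_urls(urls))
index = build_index(
    f"https://www.example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
```

A file can stay under 50,000 URLs and still exceed 50MB if the URLs are long, so it is worth checking both limits on every generated file.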
lastmod and priority/changefreq
The lastmod field should reflect the date the page content last meaningfully changed, in W3C Datetime format: either a date such as 2025-01-15 or a full timestamp with a time zone such as 2025-01-15T09:30:00+00:00. Google uses lastmod when it is consistently accurate; if every URL shows today's date, Google learns to ignore it.
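As a small sketch of producing a valid lastmod value in Python, the helper below formats a stored modification timestamp. The name format_lastmod is hypothetical, and treating naive timestamps as UTC is an assumption that should be adjusted to match how your system stores dates.

```python
from datetime import datetime, timezone


def format_lastmod(modified_at: datetime) -> str:
    """Format a content-modification timestamp as a W3C Datetime string."""
    if modified_at.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        modified_at = modified_at.replace(tzinfo=timezone.utc)
    # timespec="seconds" drops microseconds, which the sitemap does not need.
    return modified_at.isoformat(timespec="seconds")


print(format_lastmod(datetime(2025, 1, 15, 9, 30)))
# prints 2025-01-15T09:30:00+00:00
```

The important part is the input, not the formatting: pass the time the content last changed, not the time the sitemap was generated.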
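Google ignores the priority and changefreq tags, so do not rely on them to influence crawling. Keep the sitemap fresh by regenerating it when content changes, and submit it through Search Console or reference it with a Sitemap line in robots.txt. Google retired its separate sitemap ping endpoint in 2023, so pinging no longer works for Google.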
How it appears in analytics and logs
A clean sitemap is a discovery aid, not an indexing guarantee. Listing a URL asks search engines to crawl it; including non-canonical, redirected, blocked, or error URLs wastes the signal and can trigger sitemap warnings in Search Console.
Diagnostic use case
Audit a sitemap for size limits, URL hygiene, and lastmod accuracy so search engines can discover and prioritize your important pages efficiently.
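One way to start such an audit is a short Python check over a single uncompressed sitemap file. This is a sketch under stated assumptions: audit_sitemap and its warning strings are invented for the example, and it cannot tell whether a URL is canonical, indexable, or returns 200, which requires fetching the pages.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed


def audit_sitemap(xml_bytes: bytes) -> list[str]:
    """Return human-readable warnings for one uncompressed sitemap file."""
    warnings = []
    if len(xml_bytes) > MAX_BYTES:
        warnings.append("file exceeds 50MB uncompressed")

    root = ET.fromstring(xml_bytes)
    urls = root.findall("sm:url", NS)
    if len(urls) > MAX_URLS:
        warnings.append("more than 50,000 URLs in one file")

    hosts = set()
    lastmods = []
    for url in urls:
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        parts = urlsplit(loc)
        if not parts.scheme or not parts.netloc:
            warnings.append(f"not an absolute URL: {loc!r}")
        else:
            hosts.add(parts.netloc.lower())
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod:
            lastmods.append(lastmod.strip())

    if len(hosts) > 1:
        warnings.append(f"multiple hosts in one sitemap: {sorted(hosts)}")
    # Identical lastmod on every URL usually means the generation date was used.
    if len(urls) > 1 and len(lastmods) == len(urls) and len(set(lastmods)) == 1:
        warnings.append("every URL shares the same lastmod value")
    return warnings
```

An empty result only means the file passed these structural checks; status codes and canonical tags still need to be verified against the live pages.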
What WebmasterID can help detect
WebmasterID shows which URLs crawlers actually fetched and the status codes returned, so you can compare real crawl coverage against what your sitemap advertises and spot listed URLs that crawlers never reach or that return errors.
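The comparison described above reduces to a few set operations. The Python sketch below assumes you have already exported crawl data as a mapping of URL to the last status code returned; that data shape and the compare_coverage name are hypothetical and do not describe a WebmasterID export format.

```python
def compare_coverage(sitemap_urls, crawled):
    """Compare URLs listed in a sitemap with URLs crawlers actually fetched.

    sitemap_urls: iterable of URLs taken from the sitemap.
    crawled: dict mapping each fetched URL to the status code it returned
             (hypothetical shape, assumed for this example).
    """
    listed = set(sitemap_urls)
    fetched = set(crawled)
    return {
        # Listed in the sitemap but never requested by a crawler.
        "never_fetched": sorted(listed - fetched),
        # Listed and fetched, but the response was not HTTP 200.
        "listed_with_errors": sorted(
            (url, status)
            for url, status in crawled.items()
            if url in listed and status != 200
        ),
        # Fetched by crawlers but missing from the sitemap.
        "crawled_not_listed": sorted(fetched - listed),
    }


report = compare_coverage(
    ["https://www.example.com/a", "https://www.example.com/b", "https://www.example.com/c"],
    {"https://www.example.com/a": 200, "https://www.example.com/b": 404, "https://www.example.com/x": 200},
)
```

URLs in never_fetched may be too new or too weakly linked to have been crawled yet, while URLs in listed_with_errors should be fixed or removed from the sitemap.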
Common mistakes
- Listing redirected, 404, noindex, or non-canonical URLs in the sitemap.
- Setting lastmod to the generation date for every URL, which makes Google distrust it.
- Exceeding 50,000 URLs or 50MB in a single file instead of splitting.
- Relying on priority and changefreq, which Google ignores.
Privacy and accuracy notes
Sitemaps describe public URLs and metadata only. They contain no visitor data. WebmasterID treats sitemap-driven crawl activity as bot events and never associates it with human profiles.
Frequently asked questions
- How many URLs can one sitemap hold?
- Up to 50,000 URLs and 50MB uncompressed per file. For larger sites, split into multiple sitemaps and list them in a sitemap index file.
- Does a sitemap guarantee indexing?
- No. A sitemap is a strong discovery aid, but indexing decisions depend on many other signals. It is a request to crawl, not a command to index.
Related pages
- Sitemap index files
A sitemap index file is a sitemap that lists other sitemaps, letting large sites stay within the 50,000-URL and 50MB-per-file limits while exposing all URLs through one submitted entry point. This page explains the sitemapindex format, the same per-file limits that apply to the index itself, and best practices for organizing and submitting multiple sitemaps.
- Sitemap lastmod accuracy
The lastmod element in a sitemap reports when a URL's content last changed. Google uses lastmod to prioritize recrawling only when the value is consistently accurate; if every URL shows the generation date or the homepage date, Google learns to distrust and ignore it. This page explains correct lastmod semantics, format, and the consequences of inaccuracy.
- Diagnosing XML sitemap errors
An XML sitemap helps search engines discover and prioritise your URLs, but a sitemap full of the wrong URLs sends mixed signals. Common errors include listing redirecting or non-200 URLs, including noindex or canonicalised-away pages, exceeding the 50,000-URL or 50 MB limits, or referencing the wrong protocol/host. A clean sitemap lists only canonical, indexable, 200-returning URLs.
- Website observability
Compare advertised sitemap URLs against the URLs crawlers actually fetch.
Sources and verification notes
- Google Search Central: Build and submit a sitemap. Size limits, lastmod use, and ignored priority/changefreq.
- sitemaps.org: Protocol
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.