
Using robots.txt to protect crawl budget

On large sites, crawlers spend a finite amount of effort — often called crawl budget — and can waste it on low-value or near-duplicate URLs. robots.txt can steer them away from those paths so they reach your important pages more often. This matters mostly for big sites; small sites rarely need it.

Verified against primary sources

When crawl budget matters

Google describes crawl budget as a concern primarily for large sites — many thousands of URLs — or sites that generate many URL variations. On a small site, Google generally crawls efficiently and you do not need to manage budget. The problem appears when low-value URLs (endless filter combinations, session parameters, near-duplicates) consume crawl effort that would be better spent on your real content.

What to disallow

Use robots.txt to disallow patterns that produce low-value crawling, for example parameter URLs that do not change content meaningfully:

User-agent: *
Disallow: /*?sort=
Disallow: /*?sessionid=

Take care not to block resources crawlers need to render pages (CSS, JS) or pages you actually want indexed. Disallowing a URL prevents crawling, but it does not remove an already-indexed URL from the index. To deindex a page, allow crawling and serve noindex, because a crawler can only see noindex on a page it is allowed to fetch. robots.txt steers crawl effort; it is not a deindexing tool.
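Before deploying wildcard rules, it can help to check which URLs they would actually catch. The following is a minimal Python sketch, not a full robots.txt parser: it assumes the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL) and ignores Allow rules, rule precedence, and percent-encoding. The function names and sample paths are hypothetical.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(url_path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(url_path) for p in disallow_patterns)

# The two Disallow patterns from the example rules.
RULES = ["/*?sort=", "/*?sessionid="]

# Hypothetical paths, chosen only to illustrate the matching behaviour.
for path in ["/shoes?sort=price", "/shoes?page=2&sort=price", "/shoes", "/cart?sessionid=abc"]:
    print(path, "blocked" if is_disallowed(path, RULES) else "crawlable")
```

Under these assumptions, `/shoes?page=2&sort=price` is not caught, because the pattern requires `sort` to directly follow the `?`. If your URLs can carry the parameter in a later position, an additional pattern such as `/*&sort=` may be needed; verify against your own URL scheme before relying on it.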

How it appears in analytics and logs

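In server logs and crawler analytics, the symptom is a large share of crawler requests going to parameter or other low-value URLs while the pages you care about are fetched less often. Disallowing those paths redirects crawl effort toward higher-value content, which should show up as that share falling after the change.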
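As a rough way to see this in raw access logs, the sketch below counts how many requests from one crawler went to URLs with a query string versus clean URLs. It assumes the combined log format and identifies the crawler by a user-agent substring, which can be spoofed, so treat the result as an estimate. The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Pulls the request target and user agent out of a combined-log-format line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawl_share(log_lines, bot_token="Googlebot"):
    """Count a crawler's requests to parameter URLs versus clean URLs."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m or bot_token not in m.group("agent"):
            continue  # unparseable line, or a different client
        has_query = bool(urlsplit(m.group("target")).query)
        counts["parameter" if has_query else "clean"] += 1
    return counts

# Invented sample lines (documentation IP range), not real traffic.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /shoes?sort=price HTTP/1.1" 200 512 "-" "{BOT}"',
    f'203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /shoes?sessionid=abc HTTP/1.1" 200 512 "-" "{BOT}"',
    f'203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /shoes HTTP/1.1" 200 2048 "-" "{BOT}"',
    '198.51.100.7 - - [01/Jan/2025:00:00:04 +0000] "GET /shoes?sort=name HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(dict(crawl_share(SAMPLE)))
```

Running the same count on logs from before and after a robots.txt change gives a simple check on whether the parameter share actually dropped.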

Diagnostic use case

On a large site, stop crawlers from spending effort on faceted-navigation, parameter, or other low-value URLs so important pages are crawled more reliably.

What WebmasterID can help detect

WebmasterID shows which paths crawlers spend requests on, so you can see whether crawl effort is going to low-value URLs and confirm a robots.txt change redirected it.

Common mistakes

Privacy and accuracy notes

Crawl-budget rules live in robots.txt, which anyone can read. They involve no visitor data, but do not list sensitive paths expecting them to stay hidden: a Disallow line advertises the path rather than protecting it.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.