robots.txt and URL query parameters

Query-string URLs (?sort=, ?utm_source=, ?sessionid=) can multiply crawlable URLs. This page explains how robots.txt wildcards match parameters, when blocking helps, and why canonical or noindex is often better than a Disallow for duplicates.

Verified against primary sources

Matching parameters with wildcards

Google supports * (any sequence) and $ (end of URL) in robots.txt paths. To block crawling of any URL containing a specific parameter, match the parameter pattern:

User-agent: *
Disallow: /*?*sort=

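That blocks any URL whose query string contains sort=, which also catches parameter names that merely end in "sort" (for example resort=). To block all query strings on a path, use Disallow: /search?* (equivalent to Disallow: /search?, because rules match as prefixes). Be careful: over-broad parameter blocks can also hide useful pages. Test patterns before deploying them.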
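One way to test a pattern before deploying it is to translate it into a regular expression and run it against sample URLs. The sketch below is a minimal approximation of the wildcard rules described above (* as any sequence, a trailing $ as end of URL, prefix matching otherwise). The function names are hypothetical, and it does not model Allow/Disallow precedence or percent-encoding.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the URL. Everything else is literal, and the rule matches
    from the start of the path (prefix match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal segments and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path_and_query: str) -> bool:
    """Return True if a single Disallow pattern matches the given URL path."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    samples = [
        "/shoes?sort=price",
        "/shoes?color=red&sort=price",
        "/shoes?resort=alps",  # also matched: the pattern is a substring test
        "/shoes",
    ]
    for url in samples:
        print(url, is_blocked("/*?*sort=", url))
```

Running the samples shows that /shoes?resort=alps is matched as well, which is the kind of over-broad block worth catching before a rule goes live.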

Block vs canonical vs noindex

Blocking parameter URLs in robots.txt stops crawling but, like any Disallow, prevents the crawler from seeing a canonical or noindex on those URLs. For duplicate content (sort/filter variants of the same content), a rel=canonical to the clean URL usually consolidates signals better than a block.
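As an illustration of the canonical approach, the sketch below derives a clean URL by dropping parameters that do not change page content and renders the matching link element. The parameter list (sort, sessionid, utm_*) and the function names are assumptions chosen for this example; which parameters are safe to drop depends on the site.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only reorder or track, not change content.
NON_CANONICAL_PARAMS = {"sort", "sessionid"}
NON_CANONICAL_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Return the URL with non-canonical query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
        and not key.startswith(NON_CANONICAL_PREFIXES)
    ]
    # The fragment is dropped; an empty query leaves no trailing '?'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Render a rel=canonical link element pointing at the clean URL."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/shoes?color=red&sort=price"))
```

The crawler can only read this tag if the parameter URL itself is not disallowed in robots.txt.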

Use a Disallow when the parameter URLs are genuinely worthless to crawl (session IDs, infinite calendars). Use canonical/noindex when the variants should still pass signals or be discoverable. Google retired the Search Console URL Parameters tool in 2022, so robots.txt and on-page signals are the remaining levers.

How it appears in analytics and logs

Lots of crawler hits on ?-parameter variants of the same page mean crawlers are exploring parameter space — often a crawl-budget drain rather than valuable indexing.
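A rough way to spot this pattern in logs is to count how many distinct query strings crawlers requested for each path. The sketch below assumes the requested URLs have already been extracted from the access log and filtered to crawler traffic; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parameter_variants(requested_urls):
    """Map each path to the set of distinct query strings requested for it."""
    variants = defaultdict(set)
    for url in requested_urls:
        parts = urlsplit(url)
        if parts.query:  # ignore requests without a query string
            variants[parts.path].add(parts.query)
    return variants


def top_parameter_paths(requested_urls, limit=10):
    """Return (path, distinct_variant_count) pairs, most variants first."""
    counts = {
        path: len(queries)
        for path, queries in parameter_variants(requested_urls).items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


if __name__ == "__main__":
    hits = [
        "/shoes?sort=price",
        "/shoes?sort=name",
        "/shoes?sort=price",  # repeat of an already-seen variant
        "/shoes",
        "/cal?month=1",
    ]
    for path, count in top_parameter_paths(hits):
        print(path, count)
```

Paths with a high variant count relative to their real content are the first candidates for a canonical tag or a Disallow rule.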

Diagnostic use case

Stop crawlers wasting crawl budget on infinite parameter combinations (faceted navigation, session IDs) while keeping canonical pages indexable.

What WebmasterID can help detect

WebmasterID shows which parameterized URLs crawlers hit, so you can tell whether parameter-handling rules are actually curbing wasteful crawling.

Common mistakes

Privacy and accuracy notes

Parameter rules concern URL patterns, not visitors. Avoid relying on robots.txt to hide parameters that carry sensitive values — keep secrets out of URLs entirely.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.