robots.txt and URL query parameters

Query-string URLs (?sort=, ?utm_source=, ?sessionid=) can multiply crawlable URLs. This page explains how robots.txt wildcards match parameters, when blocking helps, and why canonical or noindex is often better than a Disallow for duplicates.

Verified against primary sources

Matching parameters with wildcards

Google supports * (any sequence) and $ (end of URL) in robots.txt paths. To block crawling of any URL containing a specific parameter, match the parameter pattern:

User-agent: *
Disallow: /*?*sort=

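That blocks any URL whose query string contains sort=, including parameters whose names merely end in "sort" (for example resort=). To block all query strings on a path, use Disallow: /search? (rules match as prefixes, so a trailing * is redundant). Be careful: over-broad parameter blocks can also hide useful pages. Test patterns against real URLs before deploying them.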

* matches any sequence of characters
$ anchors the end of the URL
Disallow: /*?*param= blocks URLs containing that parameter
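
To try the wildcard rules above before deploying them, a pattern can be translated into a regular expression and checked against sample URLs. The following is a minimal sketch, not Google's implementation: the function names are hypothetical, and it only tests a single Disallow pattern against a path plus query string. It does not model Allow rules, rule precedence, or user-agent groups.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters, and a trailing '$' anchors
    the end of the URL. Everything else is treated as a literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path_and_query: str) -> bool:
    """Return True if the Disallow pattern matches from the start of the URL path."""
    # re.match anchors at the beginning, mirroring prefix-style rule matching.
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    for url in ["/shoes?sort=price", "/shoes?color=red&sort=price",
                "/shoes?resort=alps", "/shoes"]:
        print(url, is_blocked("/*?*sort=", url))
```

Under these assumptions, /shoes?sort=price and /shoes?color=red&sort=price are blocked and /shoes is not. The sample /shoes?resort=alps is also blocked, which illustrates how a parameter pattern can match more than intended.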

Block vs canonical vs noindex

Blocking parameter URLs in robots.txt stops crawling but, like any Disallow, prevents the crawler from seeing a canonical or noindex on those URLs. For duplicate content (sort/filter variants of the same content), a rel=canonical to the clean URL usually consolidates signals better than a block.

Use a Disallow when the parameter URLs are genuinely worthless to crawl (session IDs, infinite calendars). Use canonical/noindex when the variants should still pass signals or be discoverable. Google has retired its URL Parameters tool, so robots.txt and on-page signals are the remaining levers.
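
As an illustration of the canonical approach, the clean URL for a sort or tracking variant can be derived by dropping parameters that do not change the content. This is a sketch under stated assumptions: the parameter list (sort, sessionid, and anything starting with utm_) and the function names are hypothetical, and each site has to decide for itself which parameters are safe to drop.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed not to change the page content.
NON_CONTENT_PARAMS = {"sort", "sessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with non-content parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in NON_CONTENT_PARAMS and not name.startswith("utm_")
    ]
    # An empty query string yields a URL without a trailing '?'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/shoes?sort=price&utm_source=news"))
```

The tag only helps if crawlers can fetch the variant URL, so the same parameters should not also be disallowed in robots.txt.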

How it appears in analytics and logs

Lots of crawler hits on ?-parameter variants of the same page mean crawlers are exploring parameter space — often a crawl-budget drain rather than valuable indexing.
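
One way to spot this pattern in raw logs is to count crawler requests per query parameter name. The sketch below assumes access logs in the common "combined" format and uses a short, hypothetical list of crawler markers. User-agent strings can be spoofed, so the counts are a hint rather than proof of who is crawling.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Assumed combined log format:
# ... "GET /path?query HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical markers; extend for the crawlers that matter to your site.
CRAWLER_MARKERS = ("googlebot", "bingbot")


def crawler_param_hits(lines):
    """Count crawler requests per query parameter name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # Not a GET/HEAD line in the assumed format.
        agent = match["agent"].lower()
        if not any(marker in agent for marker in CRAWLER_MARKERS):
            continue  # Skip requests that do not look like a known crawler.
        query = urlsplit(match["target"]).query
        # Count each parameter name once per request.
        for name in {key for key, _ in parse_qsl(query, keep_blank_values=True)}:
            counts[name] += 1
    return counts
```

Calling crawler_param_hits(open("access.log")) and inspecting the most_common() entries shows which parameters attract the most crawler requests; the file name here is only an example.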

Diagnostic use case

Stop crawlers wasting crawl budget on infinite parameter combinations (faceted navigation, session IDs) while keeping canonical pages indexable.

What WebmasterID can help detect

WebmasterID shows which parameterized URLs crawlers hit, so you can tell whether parameter-handling rules are actually curbing wasteful crawling.

Common mistakes

Blocking parameter URLs that carry a canonical you wanted crawlers to read.
Writing an over-broad pattern that also blocks important pages.
Putting sensitive values in query strings and trusting robots.txt to hide them.

Privacy and accuracy notes

Parameter rules concern URL patterns, not visitors. Avoid relying on robots.txt to hide parameters that carry sensitive values — keep secrets out of URLs entirely.

Sources and verification notes

Google, How Google interprets robots.txt (path matching, wildcards): documents * and $ wildcard support in robots.txt paths.
Google, Consolidate duplicate URLs (canonicalization): canonical signals for parameter duplicates.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.