WebmasterID
Robots & crawl control

Wildcards and path matching in robots.txt

Although the original protocol used simple prefix matching, major crawlers support two wildcards in path rules: * matches any sequence of characters, and $ anchors the end of the URL. This page covers how they behave, useful patterns, and the mistakes that make a rule too broad.

Verified against primary sources

How the wildcards work

Google documents that Googlebot and other major crawlers support two special characters in robots.txt path values:

* matches any sequence of zero or more characters, anywhere in the path.
$ at the end of a pattern anchors the match to the end of the URL.

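For example, Disallow: /*.pdf$ blocks URLs that end in .pdf, so /report.pdf is blocked but /report.pdf?download=1 is not. Disallow: /search? blocks URLs beginning with /search? (rules are compared against the path together with its query string). Without $, a pattern matches as a prefix, so Disallow: /private blocks /private, /private/, and /privatedata alike.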
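The matching behaviour described above can be sketched by translating a pattern into a regular expression. This is a minimal illustration, not any crawler's actual implementation: the helper names robots_pattern_to_regex and rule_matches are hypothetical, and the sketch ignores percent-encoding normalisation.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an equivalent regex (illustrative helper)."""
    # A trailing '$' anchors the end of the URL; anywhere else it is treated literally here.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' in place of each '*'.
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""), re.DOTALL)


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if the pattern matches the URL path (including any query string)."""
    # re.match anchors at the start only, which gives the prefix behaviour.
    return robots_pattern_to_regex(pattern).match(url_path) is not None


print(rule_matches("/*.pdf$", "/docs/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False
print(rule_matches("/search?", "/search?q=robots"))            # True
print(rule_matches("/private", "/privatedata"))                # True
```

The last line shows the prefix behaviour: with no $ and no trailing slash, /private also covers /privatedata.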

Patterns and pitfalls

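Useful patterns include blocking query parameters (Disallow: /*?sort=, which matches only when sort is the first parameter after the ?), blocking a file type (Disallow: /*.json$), and carving out exceptions with Allow plus a more specific pattern. Remember that when both an Allow and a Disallow rule match a URL, the most specific (longest) rule wins. Google documents that when the conflicting rules are equally specific, the least restrictive one (Allow) is used.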
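The precedence rule can be sketched as follows. This is an assumption-laden illustration: is_allowed and matches are hypothetical helpers, rule length is measured in characters of the pattern as written, ties go to Allow, and a real crawler may differ in details such as encoding.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Wildcard match: '*' is any sequence, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(chunk) for chunk in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (directive, pattern) pairs, directive being 'allow' or 'disallow'."""
    best = None  # (pattern length, is_allow) of the strongest matching rule so far
    for directive, pattern in rules:
        if not pattern or not matches(pattern, path):
            continue
        candidate = (len(pattern), directive == "allow")
        # Longer patterns win; on equal length True > False, so Allow wins the tie.
        if best is None or candidate > best:
            best = candidate
    # With no matching rule, the URL is allowed.
    return True if best is None else best[1]


rules = [
    ("disallow", "/downloads/"),
    ("allow", "/downloads/public/"),
    ("disallow", "/*.json$"),
]
print(is_allowed(rules, "/downloads/secret.zip"))        # False
print(is_allowed(rules, "/downloads/public/guide.html")) # True
print(is_allowed(rules, "/downloads/public/data.json"))  # True
print(is_allowed(rules, "/api/data.json"))               # False
```

The third call is worth a second look: the longer Allow: /downloads/public/ outranks the shorter Disallow: /*.json$, so an exception can unintentionally re-open a file type you meant to block everywhere.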

The common pitfall is over-matching: a bare Disallow: /news catches /newsletter too, because it is a prefix match. Choose the narrower form that fits your intent: Disallow: /news$ blocks only the exact URL /news, while Disallow: /news/ blocks /news/ and everything beneath it but not /news itself. Support for these wildcards is broad among major crawlers but not universal, so do not assume every minor crawler implements them.
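To make the over-matching pitfall concrete, the sketch below runs three variants of the rule against the same handful of paths. The matches helper and the sample paths are hypothetical and exist only for illustration.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Wildcard match: '*' is any sequence, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(chunk) for chunk in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


# Hypothetical sample paths.
paths = ["/news", "/news/", "/news/2024/story", "/newsletter"]

# Map each candidate pattern to the sample paths it would block.
blocked_by = {
    pattern: [p for p in paths if matches(pattern, p)]
    for pattern in ("/news", "/news/", "/news$")
}

for pattern, blocked in blocked_by.items():
    print(f"Disallow: {pattern:<7} blocks {blocked}")
# Disallow: /news   blocks ['/news', '/news/', '/news/2024/story', '/newsletter']
# Disallow: /news/  blocks ['/news/', '/news/2024/story']
# Disallow: /news$  blocks ['/news']
```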

How it appears in analytics and logs

When a wildcard rule blocks URLs you did not expect, the pattern is usually broader than you intended. Checking which URLs a crawler still fetches after the change shows whether the pattern is correct.
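One way to run that check is to test a rule against a sample of paths taken from your logs and compare the result with the set you meant to block. The sketch below assumes you already have such a sample; audit_rule, matches, and the example paths are all hypothetical.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Wildcard match: '*' is any sequence, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(chunk) for chunk in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def audit_rule(pattern: str, sample_paths, intended):
    """Compare what a Disallow pattern blocks with what you intended to block."""
    blocked = {p for p in sample_paths if matches(pattern, p)}
    return {
        "unexpected": sorted(blocked - set(intended)),  # blocked but not intended
        "missed": sorted(set(intended) - blocked),      # intended but still crawlable
    }


# Hypothetical paths, e.g. extracted from crawler requests in an access log.
sample = [
    "/shop",
    "/shop?sort=price",
    "/shop?page=2&sort=price",
    "/blog/how-to?sort=of",
]
intended = {"/shop?sort=price", "/shop?page=2&sort=price"}

report = audit_rule("/*?sort=", sample, intended)
print(report["unexpected"])  # ['/blog/how-to?sort=of']
print(report["missed"])      # ['/shop?page=2&sort=price']
```

Here the pattern is both too broad (it catches a blog URL) and too narrow (it misses sort when it is not the first parameter), which is the kind of mismatch a log check is meant to surface.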

Diagnostic use case

Write precise Allow/Disallow patterns — for query strings, file extensions, or path segments — without accidentally blocking more than intended.

What WebmasterID can help detect

WebmasterID shows which paths crawlers fetch, so after a wildcard change you can confirm the intended URLs are affected and no others.

Common mistakes

Privacy and accuracy notes

Path patterns are public configuration: anyone can read your robots.txt file. Do not use them to 'hide' sensitive paths, because listing them only advertises their existence.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.