Wildcards and path matching in robots.txt
Although the original protocol used simple prefix matching, major crawlers support two wildcards in path rules: * matches any sequence of characters, and $ anchors the end of the URL. This page covers how they behave, useful patterns, and the mistakes that make a rule too broad.
How the wildcards work
Google documents that Googlebot and other major crawlers support two special characters in robots.txt path values:
* matches zero or more of any character, anywhere in the path. $ matches the end of the URL.
For example, Disallow: /*.pdf$ blocks URLs ending in .pdf, while Disallow: /search? blocks URLs beginning with /search? (the ? is a literal character here, not a wildcard). Without $, a pattern matches as a prefix, so Disallow: /private blocks /private, /private/, and /privatedata alike.
- * — matches any sequence of characters
- $ — anchors the match to the end of the URL
- No trailing $ means prefix matching
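The matching behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on this page, not any crawler's actual implementation; the function names robots_pattern_to_regex and rule_matches are hypothetical, and the sketch ignores details such as percent-encoding normalisation.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    Assumption for this sketch: only * and a trailing $ are special;
    every other character (including ?) is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with ".*" for every *.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start of the string, which gives
    # prefix matching whenever the pattern has no trailing $.
    return robots_pattern_to_regex(pattern).match(url_path) is not None

print(rule_matches("/*.pdf$", "/docs/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False
print(rule_matches("/private", "/privatedata"))                # True
print(rule_matches("/search?", "/search?q=shoes"))             # True
```

The last two calls show the prefix behaviour: with no trailing $, anything that starts with the pattern matches.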
Patterns and pitfalls
Useful patterns include blocking query parameters (Disallow: /*?sort=), blocking a file type (Disallow: /*.json$), and carving out exceptions with Allow plus a more specific pattern. Remember that for a given URL the most specific (longest) matching rule wins between Allow and Disallow. Google documents that when conflicting rules are equally specific, the least restrictive one (Allow) is used.
The common pitfall is over-matching: a bare Disallow: /news catches /newsletter too, because it is a prefix match. To scope a rule to one segment, add a trailing slash (Disallow: /news/), which blocks everything under /news/ but not /news itself, or anchor with $ (Disallow: /news$), which blocks only the exact URL /news. Support for these wildcards is broad among major crawlers but not universal, so do not assume every minor crawler implements them.
- Prefix matching over-catches — /news also blocks /newsletter
- Use $ or a trailing slash to scope a rule
- Longest matching rule wins between Allow and Disallow
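As a rough illustration of longest-match precedence, the sketch below picks the winning rule for a URL path. It assumes rules are given as (directive, pattern) pairs for a single user-agent group, measures specificity by pattern length, and breaks ties in favour of Allow, as Google's documentation describes. The names pattern_matches and is_allowed are hypothetical, not part of any library.

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt pattern: * is a wildcard, trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(body + ("$" if anchored else ""), path) is not None

def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be fetched under `rules`.

    `rules` is a list of (directive, pattern) tuples, for example
    ("disallow", "/news/"). The longest matching pattern wins; on a tie
    Allow wins because (n, True) sorts above (n, False).
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern or not pattern_matches(pattern, path):
            continue  # empty patterns and non-matching rules are ignored
        candidate = (len(pattern), directive.lower() == "allow")
        if best is None or candidate > best:
            best = candidate
    # No matching rule means the path is not restricted.
    return True if best is None else best[1]

rules = [("disallow", "/news/"), ("allow", "/news/public/")]
print(is_allowed(rules, "/news/public/a.html"))  # True
print(is_allowed(rules, "/news/internal"))       # False
print(is_allowed(rules, "/about"))               # True
```

The Allow rule wins for /news/public/a.html only because its pattern is longer than the Disallow pattern that also matches.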
How it appears in analytics and logs
When a wildcard rule blocks URLs you did not expect, the pattern usually matches more broadly than intended. Checking which URLs a crawler still fetches after the change shows whether the pattern is correct.
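One way to run that check offline is to test a candidate pattern against a list of paths taken from your logs. The sketch below is a minimal example under stated assumptions: the paths list is made-up sample data, and pattern_matches and blocked_paths are hypothetical helper names, not functions from any tool.

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt pattern: * is a wildcard, trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(body + ("$" if anchored else ""), path) is not None

def blocked_paths(pattern: str, paths):
    """Return the paths that a Disallow rule with `pattern` would cover."""
    return [p for p in paths if pattern_matches(pattern, p)]

# Hypothetical sample of paths seen in crawler logs.
paths = ["/news", "/news/today", "/newsletter", "/about"]

print(blocked_paths("/news", paths))   # ['/news', '/news/today', '/newsletter']
print(blocked_paths("/news/", paths))  # ['/news/today']
print(blocked_paths("/news$", paths))  # ['/news']
```

Comparing the three results side by side makes the over-match visible: only the bare /news pattern covers /newsletter.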
Diagnostic use case
Write precise Allow/Disallow patterns — for query strings, file extensions, or path segments — without accidentally blocking more than intended.
What WebmasterID can help detect
WebmasterID shows which paths crawlers fetch, so after a wildcard change you can confirm the intended URLs are affected and no others.
Common mistakes
- Writing Disallow: /news and accidentally blocking /newsletter.
- Forgetting $ so a file-extension rule matches more than intended (Disallow: /*.pdf also matches /report.pdf?download=1).
- Assuming every crawler supports * and $ — major ones do, not all.
Privacy and accuracy notes
Path patterns are public configuration. Do not use them to 'hide' sensitive paths — listing them only advertises their existence.
Related pages
- User-agent groups and matching in robots.txt
robots.txt rules are organised into user-agent groups. A crawler does not combine every group — it selects the single most specific group whose token matches its name, falling back to the * group only when no named group matches. Understanding this prevents rules that never apply.
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- Website observability
Confirm a wildcard rule affects the URLs you intended.
Sources and verification notes
- Google: How Google interprets robots.txt. Documents * and $ wildcard support and longest-match precedence.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.