Wildcards and path matching in robots.txt

Although the original protocol used simple prefix matching, major crawlers support two wildcards in path rules: * matches any sequence of characters, and $ anchors the end of the URL. This page covers how they behave, useful patterns, and the mistakes that make a rule too broad.

Verified against primary sources

How the wildcards work

Google documents that Googlebot and other major crawlers support two special characters in robots.txt path values:

* matches zero or more of any character, anywhere in the path. $ matches the end of the URL.

For example, Disallow: /*.pdf$ blocks URLs ending in .pdf (but not /report.pdf?download=1, because $ anchors the end of the full URL, query string included), while Disallow: /search? blocks URLs beginning with /search?. Without $, a pattern matches as a prefix, so Disallow: /private blocks /private, /private/, and /privatedata alike.

* — matches any sequence of characters
$ — anchors the match to the end of the URL
No trailing $ means prefix matching
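The matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not any crawler's actual implementation: the function names robots_pattern_to_regex and rule_matches are hypothetical, and the sketch assumes that only * and a trailing $ are special while every other character is literal.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape each literal chunk, then join the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(chunk) for chunk in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def rule_matches(pattern, url_path):
    # re.match anchors at the start only, so an unanchored pattern
    # behaves as a prefix match.
    return robots_pattern_to_regex(pattern).match(url_path) is not None

print(rule_matches("/*.pdf$", "/docs/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False
print(rule_matches("/search?", "/search?q=robots"))            # True
print(rule_matches("/private", "/privatedata"))                # True
```

Translating to a regular expression keeps the sketch short; the ? in /search? is escaped, so it stays a literal question mark rather than a regex quantifier.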

Patterns and pitfalls

Useful patterns include blocking query parameters (Disallow: /*?sort=), blocking a file type (Disallow: /*.json$), and carving out exceptions with Allow plus a more specific pattern. Remember that for a given URL the most specific (longest) matching rule wins between Allow and Disallow. When an Allow and a Disallow rule are equally specific, Google documents that it uses the least restrictive one, which is the Allow.

The common pitfall is over-matching: a bare Disallow: /news catches /newsletter too, because it is a prefix match. Scope the rule to what you mean: Disallow: /news$ blocks only the exact URL /news, while Disallow: /news/ blocks everything under the /news/ directory but not /news itself. Support for these wildcards is broad among major crawlers but not universal, so do not assume every minor crawler implements them.

Prefix matching over-catches — /news also blocks /newsletter
Use $ or a trailing slash to scope a rule
Longest matching rule wins between Allow and Disallow
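The precedence rule can be sketched as follows. This is an illustrative model only: is_allowed and the (kind, pattern) rule format are hypothetical, specificity is approximated by pattern length in characters, and ties are resolved in favour of Allow, which is an assumption based on Google's documented "least restrictive" behaviour rather than something every crawler guarantees.

```python
import re

def _to_regex(pattern):
    # Same assumed semantics as the wildcards described above:
    # '*' = any characters, trailing '$' = end of URL, rest literal.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(chunk) for chunk in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, url_path):
    """rules: list of ("allow" | "disallow", pattern) for one user-agent group."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _to_regex(pattern).match(url_path):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; on equal length True > False, so Allow wins.
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the URL is crawlable.
    return True if best is None else best[1]

rules = [("disallow", "/news/"), ("allow", "/news/public/")]
print(is_allowed(rules, "/news/public/story"))  # True: the longer Allow wins
print(is_allowed(rules, "/news/archive"))       # False: only Disallow matches
print(is_allowed(rules, "/newsletter"))         # True: nothing matches
```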

How it appears in analytics and logs

If a wildcard rule blocks URLs you did not expect, the pattern is usually broader than you intended. Checking which URLs a crawler still fetches after the change shows whether the pattern behaves as planned.

Diagnostic use case

Write precise Allow/Disallow patterns — for query strings, file extensions, or path segments — without accidentally blocking more than intended.
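One way to check a pattern before deploying it is to test it against paths you expect to be blocked and paths that must stay crawlable. The sketch below assumes the wildcard semantics described earlier on this page; audit_rule and the sample paths are hypothetical and only illustrate the idea.

```python
import re

def matches(pattern, url_path):
    # Assumed semantics: '*' = any characters, trailing '$' = end of URL.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(chunk) for chunk in core.split("*"))
    regex = re.compile(body + (r"\Z" if anchored else ""))
    return regex.match(url_path) is not None

def audit_rule(pattern, should_block, should_allow):
    """Return (missed, over_blocked) for a single Disallow pattern."""
    missed = [p for p in should_block if not matches(pattern, p)]
    over_blocked = [p for p in should_allow if matches(pattern, p)]
    return missed, over_blocked

# Hypothetical sample paths for a site with a news section and a newsletter.
block = ["/news/", "/news/2024/story"]
keep = ["/newsletter", "/about"]

for candidate in ("/news", "/news/"):
    missed, over_blocked = audit_rule(candidate, block, keep)
    print(candidate, "missed:", missed, "over-blocked:", over_blocked)
```

Here the bare /news pattern reports /newsletter as over-blocked, while /news/ reports nothing in either list.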

What WebmasterID can help detect

WebmasterID shows which paths crawlers fetch, so after a wildcard change you can confirm the intended URLs are affected and no others.

Common mistakes

Writing Disallow: /news and accidentally blocking /newsletter.
Forgetting $ so a file-extension rule matches more than intended.
Assuming every crawler supports * and $ — major ones do, not all.

Privacy and accuracy notes

Path patterns are public configuration. Do not use them to 'hide' sensitive paths — listing them only advertises their existence.

Sources and verification notes

Google: How Google interprets robots.txt. Documents * and $ wildcard support and longest-match precedence.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.