User-agent groups and matching in robots.txt
robots.txt rules are organised into user-agent groups. A crawler does not combine every group. It selects the single most specific group whose token matches its name, and falls back to the * group only when no named group matches. Understanding this helps you avoid writing rules that never apply to the crawler you intended.
How matching works
Each group begins with one or more User-agent lines, followed by Allow/Disallow rules. When a crawler reads robots.txt, it looks for the group whose user-agent token most specifically matches its own name; token matching is case-insensitive. Per RFC 9309 and Google's documentation, the crawler obeys only the rules for that best-matching token and does not merge them with the * group. If a file declares several groups for the same token, those groups are combined and treated as one.
The * token is the default group: it applies only to crawlers that have no more specific group of their own. So if you have both a Googlebot group and a * group, Googlebot follows the Googlebot group and ignores the * group entirely.
- Most specific matching token wins
- * is the fallback, used only when no named group matches
- A crawler applies one group, not a merge of several
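The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not any crawler's actual implementation: the function name select_group and the groups dictionary are hypothetical, and "most specific" is modelled here as the longest declared token that the crawler's product token starts with (case-insensitive). Real crawlers decide for themselves which tokens they answer to.

```python
def select_group(groups, product_token):
    """Return (token, rules) for the one group a crawler would obey, or None.

    Assumption: `groups` maps each declared user-agent token to its rules,
    with same-token groups already combined.
    """
    name = product_token.lower()
    # Named candidates: declared tokens (other than "*") that the crawler name starts with.
    candidates = [t for t in groups if t != "*" and name.startswith(t.lower())]
    if candidates:
        best = max(candidates, key=len)  # the longest match is treated as most specific
        return best, groups[best]
    if "*" in groups:
        return "*", groups["*"]  # fallback, used only when no named group matches
    return None  # no matching group and no "*" group: no rules apply


# Hypothetical example data.
groups = {
    "*": ["Disallow: /private/"],
    "googlebot": ["Disallow: /tmp/"],
    "googlebot-news": ["Disallow: /archive/"],
}

for crawler in ("Googlebot-News", "Googlebot-Image", "Bingbot"):
    token, rules = select_group(groups, crawler)
    print(crawler, "->", token, rules)
```

In this sketch exactly one group is returned for each crawler, so the rules in * never reach a crawler that matched a named group.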
A common pitfall
Because a named group fully replaces the * group for that crawler, rules you put only in * will not apply to a crawler that has its own group. For example, if your * group has Disallow: /private/ but you also add an empty Googlebot group, Googlebot may no longer be subject to that Disallow. Repeat the rules you need inside each named group rather than assuming they inherit.
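The pitfall can be reproduced with Python's standard-library urllib.robotparser. This is only an illustration under stated assumptions: the robots.txt content, the example.com URL and the ExampleBot name are hypothetical, and the standard-library parser uses its own simple matching, so it may not mirror every crawler. The Googlebot group carries one unrelated rule rather than being empty, because parsers can differ in how they treat a group with no rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /private/ is blocked only in the * group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/private/report.html"

# A crawler with no named group falls back to the * group.
other_allowed = parser.can_fetch("ExampleBot", url)
# Googlebot has its own group, which does not mention /private/.
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("ExampleBot allowed:", other_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With this file, the parser should report that ExampleBot is blocked from /private/ while Googlebot is allowed, because the Googlebot group replaces the * group instead of inheriting from it. Adding Disallow: /private/ to the Googlebot group closes the gap.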
How it appears in analytics and logs
If a crawler ignores a rule you expected, often the wrong group matched. Either a more specific group for that crawler took precedence over your * group, or the crawler fell back to the * group because the named group you wrote for it did not match its token.
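When checking which group governs a crawler, it can help to list the groups a file actually declares. The sketch below is a simplified reader, not a full RFC 9309 parser: parse_groups and governing_group are hypothetical helper names, only User-agent, Allow and Disallow lines are handled, and the lookup uses an exact case-insensitive token match.

```python
def parse_groups(text):
    """Map each lower-cased user-agent token to its combined (directive, path) rules."""
    groups = {}
    current_tokens = []
    reading_agents = True  # True while consecutive User-agent lines are being read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not reading_agents:
                current_tokens = []  # a rule was seen earlier, so a new group starts here
                reading_agents = True
            current_tokens.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            reading_agents = False
            for token in current_tokens:
                groups[token].append((key, value))
    return groups


def governing_group(groups, product_token):
    """Return the token whose group applies: the named one, else "*", else None."""
    token = product_token.lower()
    if token in groups:
        return token
    return "*" if "*" in groups else None


# Hypothetical robots.txt content.
sample = """
User-agent: *
Disallow: /private/

User-agent: Googlebot
User-agent: Bingbot
Disallow: /tmp/
"""

parsed = parse_groups(sample)
for crawler in ("Googlebot", "DuckDuckBot"):
    group = governing_group(parsed, crawler)
    print(crawler, "is governed by", group, parsed.get(group))
```

Comparing this output with the paths a crawler actually requests in your logs shows whether the group you expected is the one in effect.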
Diagnostic use case
Structure robots.txt so each crawler gets the rules you intend. Avoid the trap of a named group silently replacing the * rules you assumed still applied, or of a crawler falling back to * because its named group never matched.
What WebmasterID can help detect
WebmasterID shows which crawlers reach which paths, helping you confirm that the group you intended for a crawler is the one actually governing its behaviour.
Common mistakes
- Assuming a crawler with its own named group also obeys the * group's rules. It does not.
- Adding an empty named group that accidentally exempts a crawler from * rules.
- Expecting rules from a named group and the * group to be merged.
Privacy and accuracy notes
User-agent grouping is a public configuration choice. It involves no visitor data.
Related pages
- Wildcards and path matching in robots.txt
Although the original protocol used simple prefix matching, major crawlers support two wildcards in path rules: * matches any sequence of characters, and $ anchors the end of the URL. This page covers how they behave, useful patterns, and the mistakes that make a rule too broad.
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- Bot intelligence
Confirm which crawlers your groups actually govern.
Sources and verification notes
- RFC 9309: Robots Exclusion Protocol
Defines most-specific-group matching.
- Google: How Google interprets robots.txt
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.