Allow only specific bots, block the rest
Sometimes you want a few named crawlers to access your site and every other crawler kept out. Because each crawler obeys only its single most specific matching group, you build this by giving the allowed crawlers their own permissive groups and putting a blanket Disallow in the * group. The pattern binds only crawlers that honour robots.txt, and it depends on spelling each token correctly.
The pattern
Give each allowed crawler its own group with an empty Disallow, then disallow everything for everyone else in the * group:
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
Googlebot and bingbot match their own groups (empty Disallow = allowed everywhere), while every other crawler falls back to the * group and is disallowed.
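One way to sanity-check the pattern before deploying it is to run it through a parser. The sketch below uses Python's standard-library urllib.robotparser; the helper name can_fetch and the sample user-agent strings are illustrative assumptions, and this parser's matching is simpler than what production crawlers implement, so treat the result as a rough check rather than proof of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The allow-list pattern from this page, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def can_fetch(agent, path="/"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    Hypothetical helper: parses the text in memory, no network access.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


# Named crawlers use their own permissive group; anything else
# (here an arbitrary example token) falls back to the * group.
for agent in ("Googlebot", "bingbot", "ExampleBot"):
    print(agent, can_fetch(agent, "/any/page"))
```

The two named crawlers should report True and the unnamed one False, because the unnamed crawler has no group of its own and lands in the * group.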
Caveats and ordering
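Two things matter. First, you must spell each allowed crawler's product token correctly (matching is case-insensitive, but a misspelled token never matches); a crawler with no named group falls into the * block. Second, do not assume rules merge: the allowed crawlers do not inherit the * group's Disallow, which is exactly what makes this work. The order of the groups in the file, by contrast, does not change which group a compliant crawler selects.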
This only governs compliant crawlers: anything that ignores robots.txt will not be stopped. And blocking the * group affects every other crawler, including SEO and AI tools, so confirm you really want that breadth.
- Each allowed crawler needs its own named group
- Named groups replace, not merge with, the * group
- Only compliant crawlers honour the rules
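The replace-not-merge behaviour summarised above can be sketched as a small group selector. This is a minimal illustration, not a full RFC 9309 parser: parse_groups and rules_for are hypothetical helper names, token matching is simplified to a case-insensitive exact comparison, and path matching is left out entirely.

```python
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def parse_groups(text):
    """Split robots.txt text into a list of (tokens, rules) groups."""
    groups = []
    tokens, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line has already closed the previous group
                groups.append((tokens, rules))
                tokens, rules = [], []
            tokens.append(value.lower())
        elif field in ("allow", "disallow") and tokens:
            rules.append((field, value))
    if tokens:
        groups.append((tokens, rules))
    return groups


def rules_for(product_token, groups):
    """Rules one crawler applies: its named group(s), else the * group."""
    name = product_token.lower()
    matched = [rules for tokens, rules in groups if name in tokens]
    if not matched:  # no named group: fall back to *
        matched = [rules for tokens, rules in groups if "*" in tokens]
    return [rule for rules in matched for rule in rules]


GROUPS = parse_groups(ROBOTS_TXT)
print(rules_for("Googlebot", GROUPS))   # only the named group's rules
print(rules_for("ExampleBot", GROUPS))  # only the * group's rules
```

Note that rules_for never concatenates a named group with the * group: a crawler gets one or the other, which is why the allowed crawlers are untouched by the blanket Disallow.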
How it appears in analytics and logs
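If a crawler you meant to allow is still blocked, or one you meant to block still crawls, the most likely cause is that a different group matched than the one you expected. Each crawler applies only its most specific group, never a merge. A crawler that keeps fetching despite a matching Disallow may simply be ignoring robots.txt.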
Diagnostic use case
Permit one or two named crawlers (for example a search engine you care about) while asking all other compliant crawlers to stay off the site.
What WebmasterID can help detect
WebmasterID shows which crawlers actually reach your pages, so you can confirm only the tokens you intended to allow are still fetching content.
Common mistakes
- Forgetting that a named group replaces the * group rather than adding to it: allowed crawlers are not blocked by the * Disallow, and they inherit none of its other rules either.
- Misspelling an allowed crawler's token, dropping it into the * Disallow.
- Assuming the * Disallow stops non-compliant clients — it cannot.
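The misspelled-token mistake is easy to catch mechanically. The sketch below compares the tokens named in a robots.txt file against the list you intended to allow; named_tokens and check_allow_list are hypothetical helper names, and the deliberately misspelled "Googelbot" in the sample text is an invented example.

```python
def named_tokens(robots_txt):
    """Collect every User-agent token in the file except *, lowercased."""
    tokens = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            token = value.strip().lower()
            if token and token != "*":
                tokens.add(token)
    return tokens


def check_allow_list(robots_txt, intended):
    """Report intended tokens missing from the file and unexpected extras."""
    wanted = {token.lower() for token in intended}
    found = named_tokens(robots_txt)
    return {
        "missing": sorted(wanted - found),
        "unexpected": sorted(found - wanted),
    }


# Sample file with a deliberate typo in the first token.
TYPO_ROBOTS_TXT = """\
User-agent: Googelbot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""

print(check_allow_list(TYPO_ROBOTS_TXT, ["Googlebot", "bingbot"]))
```

A token that shows up under both "missing" and, in a near-identical spelling, "unexpected" is a strong hint that the allowed crawler is actually falling into the * Disallow.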
Privacy and accuracy notes
Selective allow/deny is a public configuration choice. Non-compliant clients ignore it, so it is not access control; private content needs authentication.
Related pages
- User-agent groups and matching in robots.txt
robots.txt rules are organised into user-agent groups. A crawler does not combine every group — it selects the single most specific group whose token matches its name, falling back to the * group only when no named group matches. Understanding this prevents rules that never apply.
- How to block all bots in robots.txt
A single robots.txt group can ask every compliant crawler to stay off your whole site. This page gives the exact rule and is blunt about the caveats: robots.txt is advisory rather than enforced, blocking search crawlers can remove you from results, and it is not a security boundary.
- Multiple user-agent groups and precedence
A robots.txt file usually has several user-agent groups. A crawler does not combine them: it selects the one most specific group whose token matches its name, per RFC 9309. This page explains how that precedence works, how multiple User-agent lines share one group, and the merging rules that surprise people.
- Bot intelligence
Confirm only the crawlers you allowed still reach your site.
Sources and verification notes
- RFC 9309 — Robots Exclusion Protocol
Defines most-specific-group matching, not a merge across groups.
- Google — How Google interprets robots.txt
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.