Robots & crawl control

Allow only specific bots, block the rest

Sometimes you want only a few named crawlers to access your site and everyone else kept out. Because each crawler obeys only its single most specific matching group, you build this by giving the allowed crawlers their own permissive groups and putting a blanket Disallow in the * group — with important caveats.

Verified against primary sources

The pattern

Give each crawler you want to allow its own group with an empty Disallow, then disallow everyone else in the * group:

User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /

Googlebot and bingbot match their own groups (empty Disallow = allowed everywhere), while every other crawler falls back to the * group and is disallowed.
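One way to sanity-check the pattern before deploying it is to feed the file to a parser and ask what each token may fetch. The sketch below uses Python's standard-library urllib.robotparser; the token ExampleBot and the path /any/page are placeholders invented for illustration, and this parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt pattern from this page, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text from memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler with no named group of its own.
for token in ("Googlebot", "bingbot", "ExampleBot"):
    verdict = "allowed" if parser.can_fetch(token, "/any/page") else "disallowed"
    print(f"{token}: {verdict}")
```

Under these assumptions the two named crawlers are reported as allowed and the unnamed one as disallowed, which matches the behaviour described above.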

Caveats and ordering

Two things matter. First, you must spell each allowed crawler's product token correctly (matching is case-insensitive, but a typo is not forgiven); a crawler with no named group falls into the * block. Second, do not assume a named group merges with the * group: the allowed crawlers do not inherit the * group's Disallow, which is exactly what makes this work.

This only governs compliant crawlers: anything that ignores robots.txt will not be stopped. And a blanket Disallow in the * group applies to every other compliant crawler, including SEO and AI tools, so confirm you really want that breadth.

Each allowed crawler needs its own named group
Named groups replace, not merge with, the * group
Only compliant crawlers honour the rules
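The first two caveats can be reproduced with a short experiment. This is a minimal sketch using Python's standard-library urllib.robotparser as a stand-in for a compliant crawler; the misspelled token Googelbot, the /private/ path, and the helper name allowed are all made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, token: str, path: str) -> bool:
    """Return True if `token` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)


# Caveat 1: the group is named "Googelbot" (a typo), so the real token
# "Googlebot" has no named group and falls into the * block.
MISSPELLED = """\
User-agent: Googelbot
Disallow:

User-agent: *
Disallow: /
"""

# Caveat 2: the named group carries only its own rules and inherits
# nothing from the * group's blanket Disallow.
NO_INHERITANCE = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

print(allowed(MISSPELLED, "Googlebot", "/page"))           # False
print(allowed(NO_INHERITANCE, "Googlebot", "/page"))       # True
print(allowed(NO_INHERITANCE, "Googlebot", "/private/x"))  # False
print(allowed(NO_INHERITANCE, "googlebot", "/page"))       # True (case-insensitive)
```

The third caveat cannot be tested this way: a client that never reads robots.txt is unaffected by anything in it.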

How it appears in analytics and logs

If a crawler you meant to allow is still blocked, or one you meant to block still crawls, the wrong group matched. Each crawler applies only its most specific matching group; a named group is never merged with the * group.
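When debugging, it can help to work out by hand which group a given token would select. The function below is a deliberately simplified model of that selection (case-insensitive match on the product token, otherwise fall back to *); select_group and the sample tokens are hypothetical names, and individual crawlers may apply additional matching rules of their own.

```python
from typing import Optional, Sequence


def select_group(product_token: str, group_tokens: Sequence[str]) -> Optional[str]:
    """Return the User-agent value of the group a crawler would obey.

    Simplified model: a case-insensitive match on the product token wins;
    otherwise the * group applies if present; otherwise no group applies.
    """
    wanted = product_token.lower()
    for token in group_tokens:
        if token != "*" and token.lower() == wanted:
            return token
    return "*" if "*" in group_tokens else None


# User-agent values of the groups in the robots.txt pattern on this page.
GROUPS = ["Googlebot", "bingbot", "*"]

for crawler in ("Googlebot", "BINGBOT", "Googelbot", "ExampleBot"):
    print(f"{crawler} -> {select_group(crawler, GROUPS)}")
```

A token that lands on * when you expected a named group usually points to a typo in the User-agent line or a missing group.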

Diagnostic use case

Permit one or two named crawlers (for example a search engine you care about) while asking all other compliant crawlers to stay off the site.

What WebmasterID can help detect

WebmasterID shows which crawlers actually reach your pages, so you can confirm only the tokens you intended to allow are still fetching content.

Common mistakes

Forgetting that a named group replaces the * group: allowed crawlers inherit nothing from *, so any rule they should follow must be repeated in their own group.
Misspelling an allowed crawler's token, which drops that crawler into the * group and blocks it.
Assuming the * Disallow stops non-compliant clients — it cannot.

Privacy and accuracy notes

Selective allow/deny is a public configuration choice. Non-compliant clients ignore it, so it is not access control; private content needs authentication.


Sources and verification notes

RFC 9309, Robots Exclusion Protocol. Specifies that a crawler obeys the group matching its product token and falls back to the * group only when no named group matches.
Google — How Google interprets robots.txt

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.