robots.txt vs the meta robots tag

robots.txt and the meta robots tag solve different problems. robots.txt asks crawlers not to fetch a path; the meta robots tag, embedded in a page's HTML, tells search engines whether to index it. The classic mistake is using Disallow to remove a page from search — which can backfire.

Verified against primary sources

Two different controls

robots.txt operates at the crawl layer: a Disallow rule asks a crawler not to request a URL at all. The meta robots tag operates at the index layer: placed in the page's <head>, it tells search engines whether to index the page and follow its links, for example <meta name="robots" content="noindex, nofollow">.

Because they act at different layers, they are not interchangeable.
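
The two layers can be checked separately. The following is a minimal sketch using only the Python standard library, not a definitive implementation. The robots.txt content, the page HTML, the ExampleBot user agent, and the helper names crawl_allowed and meta_directives are all assumptions made for illustration. Note also that urllib.robotparser applies its own matching rules, which may differ in detail from those of any particular search engine.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page HTML carrying a meta robots tag.
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, follow">
</head><body>Hello</body></html>
"""


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def crawl_allowed(robots_txt, user_agent, url):
    """Crawl layer: may this user agent request the URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def meta_directives(html):
    """Index layer: which directives does the page's meta robots tag carry?"""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives
```

The first helper answers a question about robots.txt alone, and the second answers a question about the page's HTML alone. Neither result can be derived from the other, which is the sense in which the two controls are not interchangeable.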

Why Disallow does not deindex

If you Disallow a URL, compliant crawlers never fetch it — which means they never see a noindex tag inside it. A disallowed URL that is linked from other sites can still appear in search results, often without a snippet, precisely because the crawler was told not to read the page.

To remove a page from search, do the opposite: allow crawling so the crawler can fetch the page, and serve a noindex signal. Once it has been recrawled and dropped, you can disallow it again if you wish.

robots.txt Disallow = do not crawl
meta robots noindex = do not index
A disallowed page's noindex is never seen
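
The three rules above can be written down as a small decision function. This is a deliberately simplified model, assuming a fully compliant crawler. The names Outcome and predict are hypothetical, and the linked_elsewhere flag is a stand-in for "the URL can be discovered without fetching the page".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    crawled: bool               # did the crawler fetch the page?
    noindex_seen: bool          # could the crawler read the noindex tag?
    may_appear_in_search: bool  # can the URL still show up in results?


def predict(disallowed, has_noindex, linked_elsewhere=True):
    """Simplified model of how Disallow and noindex interact."""
    crawled = not disallowed
    # A noindex tag only counts if the page was actually fetched.
    noindex_seen = crawled and has_noindex
    if noindex_seen:
        may_appear = False             # the crawler read noindex and can drop the page
    elif crawled:
        may_appear = True              # crawlable with no noindex: ordinary indexing
    else:
        may_appear = linked_elsewhere  # disallowed: may be indexed from links alone
    return Outcome(crawled, noindex_seen, may_appear)
```

In this model, predict(disallowed=True, has_noindex=True) reports noindex_seen as False, which is the case the section warns about: the tag is present but the crawler is never allowed to read it.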

How it appears in analytics and logs

A page can be disallowed yet still indexed (if linked elsewhere), or crawlable yet noindexed. Seeing a 'blocked' URL in search usually means Disallow was used where noindex was needed.
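
To see whether a crawler is still fetching a URL, one option is to scan the server's access log. The sketch below assumes logs in the common "combined" format and matches crawlers by a substring of the user agent. The sample lines, the crawler_fetches name, and the default Googlebot token are illustrative assumptions. User agent strings can be spoofed, so treat the result as a hint rather than proof.

```python
import re

# Matches the request, status, and user agent of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical log lines, used only for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jun/2026:12:00:00 +0000] "GET /old-page HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Jun/2026:12:01:00 +0000] "GET /old-page HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
    '203.0.113.5 - - [10/Jun/2026:12:02:00 +0000] "GET /other HTTP/1.1" 404 0 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]


def crawler_fetches(log_lines, path, agent_token="Googlebot"):
    """Return the HTTP status of each request for `path` made by the crawler."""
    statuses = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if match.group("path") != path:
            continue
        if agent_token.lower() in match.group("agent").lower():
            statuses.append(int(match.group("status")))
    return statuses
```

An empty result for a URL that carries a noindex tag suggests the crawler has not fetched the page in the logged period, so the tag cannot have been read yet.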

Diagnostic use case

Choose the right tool: block crawling with robots.txt, or keep a page out of search with a noindex meta tag. Combining both on the same URL usually fails, because the Disallow rule stops the crawler from ever reading the noindex.

What WebmasterID can help detect

WebmasterID shows whether crawlers are still fetching a page, which helps you reason about whether a noindex tag can even be seen — a disallowed page's noindex is never read.

Common mistakes

Using Disallow to remove a page from search instead of noindex.
Disallowing a page and adding a noindex tag to it at the same time, so the noindex is never read.
Assuming a meta robots tag also stops crawling — it does not.

Privacy and accuracy notes

Neither robots.txt nor the meta robots tag hides content from people. Both are public signals that anyone can read. Truly private content needs authentication.

Sources and verification notes

Google: Block search indexing with noindex. Explains noindex and why disallowed pages cannot be deindexed by robots.txt.
Google: Introduction to robots.txt.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.