robots.txt vs the meta robots tag
robots.txt and the meta robots tag solve different problems. robots.txt asks crawlers not to fetch a path; the meta robots tag, embedded in a page's HTML, tells search engines whether to index it. The classic mistake is using Disallow to remove a page from search — which can backfire.
Two different controls
robots.txt operates at the crawl layer: Disallow asks a crawler not to request a URL at all. The meta robots tag operates at the index layer: placed in the page's <head>, it tells search engines whether to index the page and follow its links, for example:
<meta name="robots" content="noindex">
Because they act at different layers, they are not interchangeable.
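The two layers can be inspected separately. The following is a minimal sketch using only the Python standard library, not a definitive checker; the helper names `crawl_allowed` and `index_directives`, the `ExampleBot` user agent, and the example.com URLs are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def crawl_allowed(robots_txt, user_agent, url):
    """Crawl layer: does robots.txt let this user agent request the URL?"""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def index_directives(html):
    """Index layer: which meta robots directives does the page carry?"""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical inputs for illustration only.
robots_txt = "User-agent: *\nDisallow: /private/\n"
page_html = '<html><head><meta name="robots" content="noindex"></head></html>'

print(crawl_allowed(robots_txt, "ExampleBot", "https://example.com/private/a.html"))
print(index_directives(page_html))
```

The two functions take different inputs on purpose: the crawl check only needs robots.txt and a URL, while the index check needs the page's HTML, which a compliant crawler only has if the crawl check passed.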
Why Disallow does not deindex
If you Disallow a URL, compliant crawlers never fetch it — which means they never see a noindex tag inside it. A disallowed URL that is linked from other sites can still appear in search results, often without a snippet, precisely because the crawler was told not to read the page.
To remove a page from search, do the opposite: allow crawling so the crawler can fetch the page, and serve a noindex signal. Once it has been recrawled and dropped, you can disallow it again if you wish.
- robots.txt Disallow = do not crawl
- meta robots noindex = do not index
- A disallowed page's noindex is never seen
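The removal order described above can be written down as a small decision function. This is a sketch of the reasoning only, under the assumption that you already know the three facts it takes as input; the function name `next_removal_step` and its return strings are hypothetical and not part of any tool.

```python
def next_removal_step(disallowed: bool, serves_noindex: bool, still_indexed: bool) -> str:
    """Suggest the next step for getting a URL out of search results."""
    if not still_indexed:
        # The page has been dropped; blocking crawling again is now optional.
        return "done: you may disallow the URL again if you wish"
    if disallowed:
        # A blocked crawler cannot read noindex, so unblock first.
        return "remove the Disallow rule so the crawler can fetch the page"
    if not serves_noindex:
        return "serve a noindex signal on the page"
    # Crawlable and noindexed: the rest is up to the next recrawl.
    return "wait for the page to be recrawled and dropped"


print(next_removal_step(disallowed=True, serves_noindex=True, still_indexed=True))
```

The order of the checks mirrors the text: unblocking comes before adding noindex, because a noindex behind a Disallow rule has no effect.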
How it appears in analytics and logs
A page can be disallowed yet still indexed (if linked elsewhere), or crawlable yet noindexed. Seeing a 'blocked' URL in search usually means Disallow was used where noindex was needed.
Diagnostic use case
Choose the right tool: block crawling with robots.txt, or keep a page out of search with a noindex meta tag, and know why combining both on the same URL usually defeats the noindex.
What WebmasterID can help detect
WebmasterID shows whether crawlers are still fetching a page, which helps you reason about whether a noindex tag can even be seen — a disallowed page's noindex is never read.
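If you only have raw server logs, the same question (is a crawler still fetching this page?) can be approximated by hand. The sketch below is not WebmasterID's API; it assumes access logs in the common "combined" format, and the function name `crawler_fetches`, the `Googlebot` token, and the sample log lines are illustrative assumptions. User-agent strings can be spoofed, so treat matches as a hint rather than proof.

```python
import re

# Assumed "combined" log format: request line, status, size, referrer, user agent.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_fetches(log_lines, path, agent_token="Googlebot"):
    """Return the log lines where a matching user agent requested the path."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that do not fit the assumed format.
        same_path = match.group("path") == path
        same_agent = agent_token.lower() in match.group("agent").lower()
        if same_path and same_agent:
            hits.append(line)
    return hits


# Hypothetical log lines for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Jun/2026:12:00:00 +0000] "GET /old-page HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Jun/2026:12:01:00 +0000] "GET /old-page HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(len(crawler_fetches(sample_log, "/old-page")))
```

If the crawler has stopped requesting a page you want deindexed, check for a Disallow rule before assuming the noindex has been read.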
Common mistakes
- Using Disallow to remove a page from search instead of noindex.
- Disallowing a page and adding a noindex tag to it at the same time, so the noindex is never read.
- Assuming a meta robots tag also stops crawling — it does not.
Privacy and accuracy notes
Neither control hides content from people. robots.txt and the meta robots tag are both public signals that anyone can read. Truly private content needs authentication.
Related pages
- The noindex meta tag
The noindex value of the meta robots tag tells search engines to keep a page out of their index. The catch trips people up constantly: for noindex to work, the crawler must be able to fetch the page — so you must not block the same URL in robots.txt.
- robots.txt vs the X-Robots-Tag header
X-Robots-Tag carries the same indexing directives as the meta robots tag, but in the HTTP response header instead of the HTML body. That makes it the way to apply noindex or nofollow to non-HTML resources like PDFs and images, where a meta tag has nowhere to live.
- Website observability
Check whether crawlers still fetch a page you want deindexed.
Sources and verification notes
- Google: Block search indexing with noindex. Explains noindex and why a robots.txt Disallow cannot deindex a page.
- Google: Introduction to robots.txt
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.