Fixing 'Indexed, though blocked by robots.txt'

A URL disallowed in robots.txt can still appear in Google's index if other pages link to it — Google may index the URL (often with no useful snippet) without crawling it. The trap is that a noindex tag on that page cannot be seen, because robots.txt stops Google fetching the page to read the tag. The fix is to allow crawling and use noindex, or to remove the link signals.

Verified against primary sources

Why a blocked page gets indexed

robots.txt governs crawling — whether Google may fetch a URL — not indexing, whether the URL may appear in results. If other pages link to a disallowed URL, Google can learn the URL exists and index it based on those external signals without ever fetching its content. Such results often show a sparse listing with no description, because Google never read the page.
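To see the crawl side of this distinction, the sketch below checks whether a URL is fetchable under a given robots.txt, using Python's standard-library parser. The robots.txt content, the example.com URLs, and the helper name is_crawl_allowed are illustrative assumptions. Python's parser is also not guaranteed to match Google's own matching rules in every edge case (wildcards, for instance), so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an example only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example URLs (assumed): one under the disallowed path, one outside it.
blocked = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html"
)
allowed = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://example.com/blog/post.html"
)
print("private page crawlable:", blocked)
print("blog page crawlable:", allowed)
```

A negative result here says only that the page will not be fetched. It says nothing about whether the URL can appear in results, which is the point of this section.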

Google reports this as 'Indexed, though blocked by robots.txt' in the Page Indexing report. It is a warning that your robots block is not achieving what you likely intended.

Why noindex does not help here

The instinctive fix — add a noindex meta tag or X-Robots-Tag — fails when the page is also blocked in robots.txt. Google cannot fetch the page, so it cannot see the noindex directive, and the URL stays indexed. The two directives conflict: a robots Disallow hides the very tag meant to remove the page.

This is the core trap. To remove a page from the index with noindex, Google must be allowed to crawl it. Blocking and noindexing the same URL is self-defeating.
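The conflict can be detected mechanically if you have the robots.txt text and a copy of the page as your own server produces it. The following is a minimal sketch under those assumptions. The function names, the sample inputs, and the returned messages are all hypothetical, and the noindex check is a simple substring match rather than a full parser for robots directives.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            self.directives.append((attr.get("content") or "").lower())


def has_noindex(html: str, headers: dict) -> bool:
    """True if noindex appears in a robots meta tag or an X-Robots-Tag header."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    in_header = "noindex" in lowered.get("x-robots-tag", "")
    in_meta = any("noindex" in d for d in finder.directives)
    return in_header or in_meta


def diagnose(robots_txt: str, url: str, html: str, headers: dict) -> str:
    """Classify a URL by its crawl permission and noindex state."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch("Googlebot", url)
    noindex = has_noindex(html, headers)
    if not crawlable and noindex:
        return "conflict: noindex is present but hidden by the robots.txt block"
    if not crawlable:
        return "blocked: can still be indexed from links"
    if noindex:
        return "ok: crawlable, so the noindex can be read"
    return "crawlable and indexable"


# Made-up inputs for illustration.
result = diagnose(
    "User-agent: *\nDisallow: /private/\n",
    "https://example.com/private/report.html",
    '<html><head><meta name="robots" content="noindex"></head></html>',
    {},
)
print(result)
```

The first branch is the self-defeating combination described above: both directives are present, and the Disallow prevents the noindex from ever being read.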

The correct fixes

Choose based on intent. To remove the page from the index: allow crawling of that URL in robots.txt and add a noindex directive (meta robots or X-Robots-Tag); Google will then crawl it, read the noindex, and drop it. For urgent removal, also use the Search Console Removals tool as a temporary measure while the noindex propagates.

If the page must stay uncrawlable and you only want it out of results, reduce the link signals pointing to it so Google is less likely to index it — though this is less reliable than noindex. To keep a page out of the index entirely, the durable answer is almost always: allow crawl plus noindex, not robots.txt Disallow.
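As one possible way to apply the "allow crawl plus noindex" fix, the sketch below wraps a WSGI application so that responses under chosen path prefixes carry an X-Robots-Tag: noindex header. The middleware name, the /private/ prefix, and the demo app are assumptions for illustration. Your server or framework may offer a simpler way to set the header, and the matching Disallow rule must also be removed from robots.txt for the header to be seen.

```python
def noindex_middleware(app, noindex_prefixes=("/private/",)):
    """Wrap a WSGI app so responses under the given prefixes send X-Robots-Tag: noindex."""
    prefixes = tuple(noindex_prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(prefixes):
                # Drop any existing X-Robots-Tag so only one value is sent.
                headers = [h for h in headers if h[0].lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical application that serves the same page for every path."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Internal report</h1>"]


application = noindex_middleware(demo_app)
```

A header is used here rather than a meta tag because it also covers non-HTML responses such as PDFs. Either form works only once Google is allowed to fetch the URL.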

How it appears in analytics and logs

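In Search Console, this status means Google indexed a URL it was not allowed to crawl, usually because links pointed to it. In server logs, the same URL typically shows no Googlebot fetches, because the robots.txt block is being obeyed. robots.txt controls crawling, not indexing, so blocking a page does not reliably keep it out of the index, and any noindex on it goes unseen.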

Diagnostic use case

Resolve the 'Indexed, though blocked by robots.txt' status correctly: unblock the URL so a noindex can be read, rather than relying on a robots Disallow to keep it out.

What WebmasterID can help detect

WebmasterID records whether crawlers actually fetched a URL server-side, helping confirm that a page is robots-blocked (not being fetched) even though it appears in the index, so you apply the right fix.
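If you only have raw access logs, a similar check can be approximated by hand. The sketch below counts requests for a path whose user-agent string contains "Googlebot". It assumes logs in the common combined format, and the sample lines and the function name googlebot_fetches are made up for illustration. It does not represent how WebmasterID stores or exposes its data. User-agent strings can also be spoofed, so a non-zero count is only a hint until the requester is verified.

```python
import re

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_fetches(log_lines, path):
    """Count requests for path whose user-agent string contains 'Googlebot'."""
    count = 0
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match["path"] == path and "Googlebot" in match["agent"]:
            count += 1
    return count


# Made-up sample lines using documentation-range IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [24/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 64 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [24/Jun/2026:10:05:00 +0000] "GET /private/report.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

page_fetches = googlebot_fetches(SAMPLE_LOG, "/private/report.html")
robots_fetches = googlebot_fetches(SAMPLE_LOG, "/robots.txt")
print("Googlebot fetches of the page:", page_fetches)
print("Googlebot fetches of robots.txt:", robots_fetches)
```

Zero fetches of the page alongside regular fetches of robots.txt is consistent with the block being obeyed, which points to the unblock-and-noindex fix rather than to a missing tag.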

Privacy and accuracy notes

This diagnosis uses crawl directives and indexing state, not visitor data. WebmasterID records crawler fetches without attaching them to any person.

Frequently asked questions

Why is my robots.txt-blocked page still showing in Google?
robots.txt blocks crawling, not indexing. If other pages link to the URL, Google can index it from those links without fetching it. To remove it, allow crawling and add a noindex directive so Google can read and honour it.

Can I just add noindex to the blocked page?
Not while it is blocked. Google must crawl the page to see the noindex, and robots.txt prevents that. Unblock the URL, add noindex, and let Google recrawl it to drop the page from the index.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.