Fixing 'Indexed, though blocked by robots.txt'
A URL disallowed in robots.txt can still appear in Google's index if other pages link to it: Google may index the URL (often with no useful snippet) without crawling it. The trap is that a noindex tag on that page cannot be seen, because robots.txt stops Google fetching the page to read the tag. The reliable fix is to allow crawling and use noindex; reducing the links that point to the URL is a weaker fallback.
Why a blocked page gets indexed
robots.txt governs crawling (whether Google may fetch a URL), not indexing (whether the URL may appear in results). If other pages link to a disallowed URL, Google can learn the URL exists and index it based on those external signals without ever fetching its content. Such results often show a sparse listing with no description, because Google never read the page.
Google reports this as 'Indexed, though blocked by robots.txt' in the Page Indexing report. It is a warning that your robots block is not achieving what you likely intended.
- robots.txt controls crawling, not indexing
- Links to a blocked URL can cause it to be indexed without being crawled
- The result often has no snippet because Google never read the page
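The crawl side of this distinction can be checked locally. The sketch below uses Python's standard `urllib.robotparser` to ask whether a given user agent may fetch a URL. The robots.txt content and the example.com URLs are made up for illustration, and the standard-library parser does not reproduce every detail of Google's own matching (wildcards, for instance), so treat the result as a rough indication only. It says nothing about whether the URL is indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
blocked_url = "https://example.com/private/report.html"
open_url = "https://example.com/blog/post.html"

print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", blocked_url))  # False
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", open_url))     # True
```

A `False` here means only that the crawler may not fetch the page. The URL can still be indexed from links, which is exactly the situation this status describes.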
Why noindex does not help here
The instinctive fix — add a noindex meta tag or X-Robots-Tag — fails when the page is also blocked in robots.txt. Google cannot fetch the page, so it cannot see the noindex directive, and the URL stays indexed. The two directives conflict: a robots Disallow hides the very tag meant to remove the page.
This is the core trap. To remove a page from the index with noindex, Google must be allowed to crawl it. Blocking and noindexing the same URL is self-defeating.
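The conflict described above can be detected mechanically: a page is in the trap when it carries a noindex directive and is also disallowed for the crawler. The following is a minimal sketch using only the Python standard library. The function names, the sample robots.txt, and the sample page are all hypothetical, and the parsing is deliberately simplified: it ignores user-agent-scoped `X-Robots-Tag` values and equivalent directives such as `none`, and it expects the header name in exactly the casing shown.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str, headers: dict) -> bool:
    """True if the page asks not to be indexed via meta tag or X-Robots-Tag."""
    meta = RobotsMetaParser()
    meta.feed(html)
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in meta.directives or "noindex" in header_directives


def noindex_is_hidden(robots_txt, url, html, headers, user_agent="Googlebot"):
    """True when the page has noindex but robots.txt stops the crawler reading it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    blocked = not rules.can_fetch(user_agent, url)
    return blocked and has_noindex(html, headers)


# Hypothetical inputs for illustration.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
URL = "https://example.com/private/report.html"

print(noindex_is_hidden(ROBOTS, URL, PAGE, {}))  # True: the tag exists but cannot be read
```

Note that the HTML has to come from your own copy of the page or your own request, since the whole point is that the crawler never gets to see it.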
The correct fixes
Choose based on intent. To remove the page from the index: allow crawling of that URL in robots.txt and add a noindex directive (meta robots or X-Robots-Tag); Google will then crawl it, read the noindex, and drop it. For urgent removal, also use the Search Console Removals tool as a temporary measure while the noindex propagates.
If the page must stay uncrawlable and you only want it out of results, reduce the link signals pointing to it so Google is less likely to index it — though this is less reliable than noindex. To keep a page out of the index entirely, the durable answer is almost always: allow crawl plus noindex, not robots.txt Disallow.
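As an illustration of the "allow crawl plus noindex" approach, here is a minimal sketch of a WSGI application that sends an `X-Robots-Tag: noindex` header for selected paths. The `/private/` prefix, the `app` function, and the `response_headers` helper are assumptions made for the example; a real site would set the header in its own framework or web server configuration. The sketch only helps if robots.txt no longer disallows those paths, so that the crawler can actually receive the header.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical path prefixes that should be crawlable but kept out of the index.
NOINDEX_PREFIXES = ("/private/",)


def app(environ, start_response):
    """Serve a placeholder page, adding X-Robots-Tag: noindex where configured."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]


def response_headers(path):
    """Call the app in-process and return its response headers as a dict."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    b"".join(app(environ, start_response))  # consume the body
    return captured["headers"]


print(response_headers("/private/report.html").get("X-Robots-Tag"))  # noindex
print(response_headers("/blog/post.html").get("X-Robots-Tag"))       # None
```

The header route is handy for non-HTML files such as PDFs, where a meta robots tag is not available. For ordinary HTML pages a meta robots tag achieves the same thing.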
How it appears in analytics and logs
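This status means Google indexed a URL it was not allowed to crawl, usually because links pointed to it. In server logs the pattern is an absence: Googlebot requests robots.txt but makes no requests for the blocked URL itself. Because robots.txt controls crawling, not indexing, blocking a page does not reliably keep it out of the index, and any noindex on it goes unseen.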
Diagnostic use case
Resolve the 'Indexed, though blocked by robots.txt' status correctly: unblock the URL so a noindex can be read, rather than relying on a robots Disallow to keep it out.
What WebmasterID can help detect
WebmasterID records server-side whether crawlers actually fetched a URL. That helps confirm that a page is robots-blocked (not being fetched) even though it appears in the index, so you can apply the right fix.
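Independent of any particular tool, the same question (did the crawler actually fetch this URL?) can be sketched against a raw access log. The example below assumes the common "combined" log format and matches on the user-agent string only. The log lines, IP addresses, and function name are made up for illustration, and this is not a description of how WebmasterID works. User-agent strings can be spoofed, so a real check would also verify that the requests genuinely come from Google.

```python
import re

# Assumes the "combined" access log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_fetches(log_lines, path):
    """Return the status codes of requests for path whose agent mentions Googlebot."""
    statuses = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("path") == path and "Googlebot" in match.group("agent"):
            statuses.append(int(match.group("status")))
    return statuses


# Made-up log lines using documentation-reserved IP addresses.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'192.0.2.10 - - [24/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [24/Jun/2026:10:00:05 +0000] "GET /blog/post.html HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [24/Jun/2026:10:01:00 +0000] "GET /private/report.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(googlebot_fetches(SAMPLE_LOG, "/private/report.html"))  # []
print(googlebot_fetches(SAMPLE_LOG, "/blog/post.html"))       # [200]
```

An empty list for a URL that Search Console reports as indexed is consistent with the robots-blocked case: the crawler knows the URL but has not fetched it.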
Common mistakes
- Adding noindex to a page that is also robots.txt-blocked, so the tag is never seen.
- Expecting robots.txt Disallow to keep a linked URL out of the index.
- Forgetting that blocked-but-indexed results show without a useful snippet.
- Skipping the Removals tool in urgent cases, which leaves the URL visible while the noindex propagates.
Privacy and accuracy notes
This diagnosis uses crawl directives and indexing state, not visitor data. WebmasterID records crawler fetches without attaching them to any person.
Frequently asked questions
- Why is my robots.txt-blocked page still showing in Google?
robots.txt blocks crawling, not indexing. If other pages link to the URL, Google can index it from those links without fetching it. To remove it, allow crawling and add a noindex directive so Google can read and honour it.
- Can I just add noindex to the blocked page?
Not while it is blocked. Google must crawl the page to see the noindex, and robots.txt prevents that. Unblock the URL, add noindex, and let Google recrawl it to drop the page from the index.
Related pages
- Noindex but heavily linked: a diagnosis
A noindex page that is still prominently linked across the site is a common, subtle conflict: you are telling search engines not to index a page while structurally treating it as important. Either the noindex is a mistake on a page you want indexed, or the heavy linking wastes internal link equity on a page you have chosen to keep out of the index. Diagnosis is about resolving the contradiction.
- Diagnosing a blocked crawler
When a crawler is not reaching your pages, the block can come from several layers: a robots.txt Disallow, a server-side 403, a WAF or bot-management rule, or an IP filter. Confirming which layer is responsible — rather than guessing — is the key to fixing it without opening doors you meant to keep shut.
- Reading the Page Indexing (Coverage) report
The Page Indexing report (formerly Index Coverage) in Google Search Console shows how many of your pages are indexed and groups the not-indexed pages by reason — such as crawled-not-indexed, discovered-not-indexed, duplicate without user-selected canonical, excluded by noindex, blocked by robots.txt, redirect, or soft 404. Each reason points to a distinct fix.
- Website observability
Confirm from server-side records that a URL is robots-blocked (not fetched) yet indexed.
Sources and verification notes
- Google Search Central — robots.txt introduction and limits
- Google Search Central — Block search indexing with noindex
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.