Robots & crawl control

robots.txt common mistakes

Most robots.txt problems come from a handful of recurring mistakes. This page collects the big ones — blocking the CSS and JS crawlers need to render, trying to deindex with Disallow, advertising secret paths, and treating an advisory file as enforcement — with the correct approach for each.

Verified against primary sources

The high-impact mistakes

A few errors cause most of the damage. Blocking CSS or JavaScript that crawlers need to render the page can make Google see a broken layout and misjudge the content. Using Disallow to remove a page from search does not work. A blocked URL that other pages link to can still be indexed and shown, usually without a snippet. Because the crawler never fetches a disallowed page, it also never sees any noindex placed on it. To deindex a page, put noindex on it and leave it crawlable.
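Both checks in that paragraph can be sketched with Python's standard-library urllib.robotparser. The robots.txt content, the asset URLs and the list of noindex pages below are hypothetical. urllib.robotparser does plain prefix matching and does not implement Google's * and $ wildcards, so treat this as an approximation of Googlebot's behaviour rather than a faithful replica.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that makes both high-impact mistakes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
Disallow: /old-promo/
"""

# Hypothetical inventory: resources pages need to render, and pages that carry noindex.
RENDER_RESOURCES = ["/assets/site.css", "/assets/app.js", "/static/logo.svg"]
NOINDEX_PAGES = ["/old-promo/", "/thank-you/"]


def audit(robots_txt: str, user_agent: str = "Googlebot") -> dict[str, list[str]]:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # CSS/JS the crawler may not fetch, so it cannot render the page as users see it.
        "blocked_resources": [u for u in RENDER_RESOURCES if not parser.can_fetch(user_agent, u)],
        # A noindex on a disallowed URL is never fetched, so the crawler never sees it.
        "unseen_noindex": [u for u in NOINDEX_PAGES if not parser.can_fetch(user_agent, u)],
    }


print(audit(ROBOTS_TXT))
# {'blocked_resources': ['/assets/site.css', '/assets/app.js'], 'unseen_noindex': ['/old-promo/']}
```

Fixing both problems here would mean letting crawlers fetch /assets/ and removing the Disallow on /old-promo/ so its noindex can be read.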

Listing sensitive paths in robots.txt to keep them private does the opposite: the file is public, so you advertise exactly what you meant to hide.
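Anyone can pull the Disallow paths out of the public file, which is why this backfires. This minimal Python sketch does exactly that and flags paths containing "sensitive-looking" keywords. The sample file and the keyword list are illustrative assumptions, not a real scanner.

```python
import re

# Hypothetical robots.txt that tries to "hide" private areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /admin-panel-2024/
Disallow: /internal/backups/
"""

# Hypothetical hints; tune them for your own site.
SENSITIVE_HINTS = ("admin", "backup", "internal", "private", "secret", "staging")


def advertised_paths(robots_txt: str) -> list[str]:
    """Return every Disallow path, exactly as any visitor can read it."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = re.match(r"(?i)disallow\s*:\s*(\S+)", line)
        if match:
            paths.append(match.group(1))
    return paths


def sensitive_looking(robots_txt: str) -> list[str]:
    """Disallow paths whose names hint at content someone wanted kept private."""
    return [p for p in advertised_paths(robots_txt)
            if any(hint in p.lower() for hint in SENSITIVE_HINTS)]


print(sensitive_looking(ROBOTS_TXT))  # ['/admin-panel-2024/', '/internal/backups/']
```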

Blocking CSS/JS needed for rendering
Using Disallow to deindex instead of noindex
Listing secret paths in a public file

More pitfalls and the fixes

Other frequent issues: assuming robots.txt enforces anything (it is advisory, and non-compliant clients simply ignore it); adding a group for a named crawler and expecting it to also obey the rules in the * group (a crawler follows only the most specific group that matches its user agent); over-broad prefix patterns (Disallow: /news also blocks /newsletter); and serving a 5xx for robots.txt, which Google temporarily treats as if the whole site were disallowed, so crawling can stop entirely.
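Two of these pitfalls can be reproduced with Python's urllib.robotparser. In this simple case it behaves as Google describes: a crawler with its own group uses only that group, other crawlers fall back to *, and Disallow values match as path prefixes. The file and the crawler name OtherBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: the /news rule lives only in the * group, and a Googlebot group exists.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news

User-agent: Googlebot
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so the * rules do not apply to it.
print(parser.can_fetch("Googlebot", "/news/today"))  # True
# Other crawlers fall back to *, where the prefix /news also catches /newsletter.
print(parser.can_fetch("OtherBot", "/news/today"))   # False
print(parser.can_fetch("OtherBot", "/newsletter"))   # False
```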

Fixes: keep CSS/JS crawlable; deindex with noindex on crawlable pages; protect private content with authentication; repeat rules in each named group that needs them; anchor patterns with a trailing slash or $; and ensure robots.txt returns a clean 200.
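Pattern anchoring is easier to reason about with a small matcher. The sketch below is a hypothetical, simplified model of the matching Google documents for a single group: * matches any run of characters, a trailing $ anchors the end of the path, the longest matching pattern wins, and allow wins a tie. It skips group selection and URL normalisation, so it is not a complete robots.txt parser.

```python
import re


def _to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern: * matches any run of characters, a trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules holds (directive, pattern) pairs from one group, e.g. ("disallow", "/news/").

    The longest matching pattern decides; on a tie, allow wins; no match means allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow value matches nothing
        if _to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


print(is_allowed([("disallow", "/news")], "/newsletter"))   # False: the bare prefix over-matches
print(is_allowed([("disallow", "/news/")], "/newsletter"))  # True: the trailing slash anchors it
print(is_allowed([("disallow", "/news$")], "/news/today"))  # True: $ blocks only /news itself
print(is_allowed([("disallow", "/"), ("allow", "/assets/")], "/assets/site.css"))  # True
```

The last line shows the longest-match rule keeping CSS crawlable even under a site-wide Disallow.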

Repeat rules in named groups — they do not inherit from *
Anchor patterns to avoid over-matching
Serve robots.txt as a clean 200, not a 5xx
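The status-code point can be summarised as a lookup. This is a simplified sketch of the handling Google documents for robots.txt fetches. The function name is hypothetical, other crawlers may behave differently, and Google's longer-term fallbacks (such as reusing a previously cached copy) are left out.

```python
def google_robots_fetch_outcome(status: int) -> str:
    """Simplified model of how Google documents its handling of a robots.txt response."""
    if 200 <= status < 300:
        return "parse the rules"
    if status == 429 or 500 <= status < 600:
        # Initial reaction to server errors and rate limiting; later fallbacks are omitted here.
        return "treat the whole site as disallowed for now"
    if 400 <= status < 500:
        # A missing file (e.g. 404) is harmless: it means no crawl restrictions.
        return "crawl without restrictions"
    return "other (for example redirects); check the crawler's documentation"


for code in (200, 404, 429, 503):
    print(code, google_robots_fetch_outcome(code))
```

The asymmetry is the point: a 404 costs nothing, while a 5xx can halt crawling.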

How it appears in analytics and logs

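Typical symptoms and their usual causes: pages that render poorly in search often point to blocked CSS/JS; a "blocked" page that still shows up in results means Disallow was used where noindex was needed. In Google Search Console the latter typically appears as "Indexed, though blocked by robots.txt".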
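Server logs show the robots.txt side of the picture. The sketch below assumes the common Apache/Nginx "combined" log format and counts the status codes returned when a client identifying as Googlebot requested /robots.txt; any 5xx is worth investigating. The sample lines are made up and use documentation IP addresses, and user-agent strings can be spoofed, so treat the result as a rough signal.

```python
import re
from collections import Counter

# Assumes the "combined" log format; adjust the regex to match your server's format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_status_counts(lines: list[str], agent_hint: str = "Googlebot") -> Counter:
    """Count response statuses for /robots.txt requests whose user agent contains agent_hint."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["path"] == "/robots.txt" and agent_hint in m["agent"]:
            counts[m["status"]] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'192.0.2.10 - - [24/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [24/Jun/2026:11:00:00 +0000] "GET /robots.txt HTTP/1.1" 503 0 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.5 - - [24/Jun/2026:11:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "curl/8.0"',
]

counts = robots_status_counts(SAMPLE)
server_errors = sum(n for status, n in counts.items() if status.startswith("5"))
print(dict(counts), "5xx responses:", server_errors)
```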

Diagnostic use case

Audit an existing robots.txt against the most common, highest-impact errors before they cost you rendering, indexing, or privacy.

What WebmasterID can help detect

WebmasterID shows which paths crawlers fetch and which they are blocked from, helping you catch these mistakes by observing real crawl behaviour.

Common mistakes

Blocking CSS/JS and harming how search engines render your pages.
Using Disallow to deindex instead of a noindex on a crawlable page.
Listing secret paths in a public robots.txt.
Expecting an advisory file to enforce anything against non-compliant clients.

Privacy and accuracy notes

robots.txt is public. Listing secret paths to 'hide' them advertises them. Private content needs authentication, never a Disallow line.


Sources and verification notes

Google: How Google interprets robots.txt
Google: Block search indexing with noindex. Explains why Disallow does not deindex.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.