robots.txt common mistakes

Most robots.txt problems come from a handful of recurring mistakes. This page collects the most damaging ones: blocking the CSS and JavaScript that crawlers need for rendering, trying to deindex with Disallow, advertising secret paths, and treating an advisory file as enforcement. Each comes with the correct approach.

The high-impact mistakes

A few errors cause most of the damage. Blocking CSS or JavaScript that crawlers need to render the page can make Google see a broken layout and misjudge the content. Using Disallow to remove a page from search does not work: a blocked URL that other pages link to can still appear in results, just without a snippet. Use noindex on a crawlable page instead, because a crawler has to fetch the page to see the noindex directive.
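As a rough illustration of the first mistake, the sketch below uses Python's standard urllib.robotparser module to list which render-critical assets a given crawler would not be allowed to fetch. The robots.txt content, the example.com URLs, and the blocked_urls helper are assumptions made up for this example. The standard-library parser handles plain prefix rules but not wildcards, so treat the result as a first check rather than a statement of how any particular search engine evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a whole asset directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
"""

# Hypothetical URLs a page needs in order to render.
RENDER_URLS = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/index.html",
]


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that user_agent may not fetch under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


for url in blocked_urls(ROBOTS_TXT, "Googlebot", RENDER_URLS):
    print("blocked for rendering:", url)
```

With this sample file the stylesheet and the script are reported as blocked while the HTML page is not, which is the pattern that tends to produce a broken rendered layout.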

Listing sensitive paths in robots.txt to keep them private does the opposite: the file is public, so you advertise exactly what you meant to hide.

More pitfalls and the fixes

Other frequent issues: assuming robots.txt enforces anything (it is advisory, and non-compliant clients simply ignore it); putting rules only in the * group and expecting a crawler that has its own named group to obey them (a crawler follows the most specific group that matches it and ignores the rest); over-broad prefix patterns (Disallow: /news also blocks /newsletter); and serving a 5xx for robots.txt, which can make crawlers back off entirely.
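The group-selection pitfall is easy to reproduce. The following minimal sketch again relies on Python's urllib.robotparser; the robots.txt content, the ExampleBot name, the example.com URL, and the is_allowed helper are all hypothetical. It shows a rule in the * group that a crawler with its own named group does not inherit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a catch-all group plus a group for one named crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/

User-agent: Googlebot
Disallow: /drafts/
"""


def is_allowed(user_agent, url):
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/search/results"
print("ExampleBot:", is_allowed("ExampleBot", URL))  # falls back to the * group
print("Googlebot:", is_allowed("Googlebot", URL))    # uses only its own group
```

Here ExampleBot is refused /search/ while Googlebot is allowed, because the Googlebot group never repeats that rule.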

Fixes: keep CSS/JS crawlable; deindex with noindex on crawlable pages; protect private content with authentication; repeat rules in each named group that needs them; anchor patterns with a trailing slash (Disallow: /news/) or $ (Disallow: /news$); and ensure robots.txt returns a clean 200.
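To see what anchoring changes, here is a simplified matcher written for this page, not taken from any crawler. It assumes the common pattern syntax in which * matches any run of characters and a trailing $ anchors the end of the path, and it ignores details such as percent-encoding and Allow/Disallow precedence. The rule_matches name and the sample paths are hypothetical.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path.

    Simplified: '*' matches any run of characters, a trailing '$' anchors
    the end of the path, and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


PATHS = ("/news", "/news/2024/story", "/newsletter")
for pattern in ("/news", "/news/", "/news$"):
    hits = [path for path in PATHS if rule_matches(pattern, path)]
    print(pattern, "matches", hits)
```

Under these assumptions the bare /news pattern matches all three paths, /news/ matches only the story under the directory, and /news$ matches only /news itself. Note that /news/ no longer covers the bare /news URL, so a site that serves both may need both rules.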

How it appears in analytics and logs

Symptoms map to causes: pages rendering poorly in search often mean blocked CSS/JS; a 'blocked' page still in results means Disallow was used instead of noindex.

Diagnostic use case

Audit an existing robots.txt against the most common, highest-impact errors before they cost you rendering, indexing, or privacy.
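A first-pass audit of this kind can be sketched as a small lint script. The version below is a heuristic illustration only: the audit_robots function, the SENSITIVE_HINTS list, and the sample file are assumptions invented for this example, and the checks will produce both false positives and misses on real files.

```python
# Hypothetical substrings that suggest a path was listed to keep it hidden.
SENSITIVE_HINTS = ("admin", "backup", "secret", "private")

# Hypothetical robots.txt with one instance of each mistake.
SAMPLE = """\
User-agent: *
Disallow: /static/*.css
Disallow: /admin-backup/
Disallow: /news
Allow: /
"""


def audit_robots(robots_txt):
    """Return (line_number, warning) pairs for risky Disallow rules."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() != "disallow" or not value:
            continue
        path = value.lower()
        if path.rstrip("$").endswith((".css", ".js")):
            findings.append((number, "may block CSS or JS needed for rendering"))
        if any(hint in path for hint in SENSITIVE_HINTS):
            findings.append((number, "advertises a sensitive-looking path"))
        last_segment = path.rsplit("/", 1)[-1]
        if last_segment and "." not in last_segment and not path.endswith(("$", "*")):
            findings.append((number, "unanchored prefix may match sibling paths"))
    return findings


for number, warning in audit_robots(SAMPLE):
    print(f"line {number}: {warning}")
```

On the sample file this flags lines 2, 3, and 4, one for each of the rendering, privacy, and over-broad prefix mistakes. Each finding is a prompt for a human to check, not a verdict.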

What WebmasterID can help detect

WebmasterID shows which paths crawlers fetch and which they are blocked from, helping you catch these mistakes by observing real crawl behaviour.

Privacy and accuracy notes

robots.txt is public. Listing secret paths to 'hide' them advertises them. Private content needs authentication, never a Disallow line.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.