robots.txt common mistakes
Most robots.txt problems come from a handful of recurring mistakes. This page collects the big ones — blocking the CSS and JS crawlers need to render, trying to deindex with Disallow, advertising secret paths, and treating an advisory file as enforcement — with the correct approach for each.
The high-impact mistakes
A few errors cause most of the damage. Blocking CSS or JavaScript that crawlers need to render the page can make Google see a broken layout and misjudge the content. Using Disallow to remove a page from search does not work: a blocked-but-linked URL can still appear, without a snippet. Use noindex on a crawlable page instead, because a crawler has to fetch the page to see the noindex.
Listing sensitive paths in robots.txt to keep them private does the opposite: the file is public, so you advertise exactly what you meant to hide.
- Blocking CSS/JS needed for rendering
- Using Disallow to deindex instead of noindex
- Listing secret paths in a public file
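One way to check for the first mistake is to test render-critical URLs against the robots.txt rules. The sketch below uses Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the blocked_resources helper are all hypothetical. Note that this parser uses simple prefix matching and does not reproduce every detail of how Google evaluates rules, so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a whole asset directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
"""

# Hypothetical resources a page needs in order to render.
RENDER_RESOURCES = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/index.html",
]


def blocked_resources(robots_txt, user_agent, urls):
    """Return the URLs that the given user agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


blocked = blocked_resources(ROBOTS_TXT, "Googlebot", RENDER_RESOURCES)
for url in blocked:
    print("blocked:", url)
```

With these sample rules, the stylesheet and the script are reported as blocked while the HTML page is not, which is the pattern that tends to produce a broken render.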
More pitfalls and the fixes
Other frequent issues: assuming robots.txt enforces anything (it is advisory, and non-compliant clients ignore it); putting rules only in the * group and expecting crawlers that have their own named group to obey them (a crawler follows only the most specific group that matches it); over-broad prefix patterns (Disallow: /news also blocks /newsletter); and serving a 5xx for robots.txt, which can make crawlers back off entirely.
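The group pitfall is easy to reproduce. This minimal sketch, again using urllib.robotparser, assumes a hypothetical robots.txt in which the /private/ rule exists only in the * group while Googlebot has its own group. The URL and the OtherBot name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: /private/ is disallowed only in the * group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/private/report.html"

# Googlebot matches its own group, which has no /private/ rule.
googlebot_allowed = parser.can_fetch("Googlebot", url)

# A crawler without a named group falls back to the * group.
otherbot_allowed = parser.can_fetch("OtherBot", url)

print("Googlebot allowed:", googlebot_allowed)  # True
print("OtherBot allowed:", otherbot_allowed)    # False
```

Because the Googlebot group does not repeat the /private/ rule, Googlebot is allowed to fetch a URL that every other compliant crawler is asked to skip.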
Fixes: keep CSS/JS crawlable; deindex with noindex on crawlable pages; protect private content with authentication; repeat rules in each named group that needs them; anchor patterns with a trailing slash (Disallow: /news/) or $ (Disallow: /news$); and ensure robots.txt returns a clean 200.
- Repeat rules in named groups — they do not inherit from *
- Anchor patterns to avoid over-matching
- Serve robots.txt as a clean 200, not a 5xx
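To see why anchoring matters, here is a simplified sketch of robots.txt path matching: rules match as prefixes, * matches any run of characters, and a trailing $ pins the match to the end of the path. The rule_matches function is a hypothetical helper written for this illustration, not any crawler's actual implementation, and it ignores details such as percent-encoding and rule precedence.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Simplified: prefix match, '*' as wildcard, trailing '$' as end anchor.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters, then let each '*' match any sequence.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        # '$' means the pattern must cover the whole path.
        return re.fullmatch(regex, path) is not None
    # Without '$', the pattern only has to match the start of the path.
    return re.match(regex, path) is not None


print(rule_matches("/news", "/newsletter"))    # True: over-broad prefix
print(rule_matches("/news/", "/newsletter"))   # False: trailing slash narrows it
print(rule_matches("/news/", "/news/today"))   # True
print(rule_matches("/news$", "/news"))         # True: exact path only
print(rule_matches("/news$", "/newsletter"))   # False
```

One side effect worth noticing: a rule ending in a slash, such as /news/, no longer matches the bare path /news, so add a separate /news$ rule if that URL should be covered too.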
How it appears in analytics and logs
Symptoms map to causes: pages rendering poorly in search often mean blocked CSS/JS; a 'blocked' page still in results means Disallow was used instead of noindex.
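If you have raw access logs, a rough version of the first symptom check can be sketched as follows. This assumes logs in the combined log format; the sample lines, the 192.0.2.x addresses, and the crawler_fetch_counts helper are hypothetical. User-agent strings can be spoofed, and a crawler that never requests CSS or JS is only a hint that those files may be blocked, so confirm against the robots.txt rules themselves.

```python
import re
from collections import Counter

# Hypothetical access-log lines in combined log format.
LOG_LINES = [
    '192.0.2.10 - - [10/Jun/2026:10:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.10 - - [10/Jun/2026:10:00:05 +0000] "GET /about.html HTTP/1.1" '
    '200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.44 - - [10/Jun/2026:10:01:00 +0000] "GET /assets/site.css HTTP/1.1" '
    '200 900 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]

# Request line, status, size, referrer, then the user-agent string.
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def crawler_fetch_counts(lines, ua_token):
    """Count fetches by resource kind for user agents containing ua_token."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match or ua_token.lower() not in match.group("ua").lower():
            continue
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path.endswith(".css"):
            counts["css"] += 1
        elif path.endswith(".js"):
            counts["js"] += 1
        else:
            counts["other"] += 1
    return counts


counts = crawler_fetch_counts(LOG_LINES, "Googlebot")
if counts["other"] and not (counts["css"] or counts["js"]):
    print("Crawler fetched pages but no CSS/JS: check for blocked assets")
```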
Diagnostic use case
Audit an existing robots.txt against the most common, highest-impact errors before they cost you rendering, indexing, or privacy.
What WebmasterID can help detect
WebmasterID shows which paths crawlers fetch and which they are blocked from, helping you catch these mistakes by observing real crawl behaviour.
Common mistakes
- Blocking CSS/JS and harming how search engines render your pages.
- Using Disallow to deindex instead of a noindex on a crawlable page.
- Listing secret paths in a public robots.txt.
- Expecting an advisory file to enforce anything against non-compliant clients.
Privacy and accuracy notes
robots.txt is public. Listing secret paths to 'hide' them advertises them. Private content needs authentication, never a Disallow line.
Related pages
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- robots.txt vs the meta robots tag
robots.txt and the meta robots tag solve different problems. robots.txt asks crawlers not to fetch a path; the meta robots tag, embedded in a page's HTML, tells search engines whether to index it. The classic mistake is using Disallow to remove a page from search — which can backfire.
- How to test your robots.txt
A robots.txt rule is only useful if it does what you think. This page covers how to test it — checking the live file, using Google Search Console's robots.txt report and URL Inspection, and confirming in your own logs that the intended crawlers are or are not fetching the affected URLs.
- Website observability
Catch robots.txt mistakes by observing real crawl behaviour.
Sources and verification notes
- Google — How Google interprets robots.txt
- Google — Block search indexing with noindex. Explains why Disallow does not deindex.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.