
robots.txt comments and encoding

robots.txt supports comments introduced by the hash character (#) and is expected to be a UTF-8 plain-text file. Getting the encoding or comment placement wrong can cause crawlers to misread or ignore rules: typical culprits are a stray byte order mark, a non-UTF-8 charset, or a # placed in the middle of a directive. This page covers comment syntax and the encoding requirements that keep a file valid.


Comment syntax

A comment starts with a hash character and runs to the end of the line. Crawlers ignore everything from the hash onward, so you can document rules inline or on their own line.

# Block our staging crawler
User-agent: *
Disallow: /staging/ # not for indexing

Keep comments on their own lines or trailing a directive. Do not place a # in the middle of a directive, because everything after it on that line is discarded. Placed correctly, a comment is purely for humans and never changes how a rule is applied.

Comments begin with # and run to end of line
They can be on their own line or trail a directive
Comments never affect parsing of the rules themselves
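
The comment rule above can be illustrated with a minimal Python sketch. It is not how any particular crawler is implemented; the helper names strip_comment and parse_line are hypothetical, and the sketch only assumes that everything from the first # to the end of the line is discarded before a directive is read.

```python
def strip_comment(line: str) -> str:
    """Return the line with any # comment and surrounding whitespace removed."""
    # Everything from the first hash to the end of the line is a comment.
    return line.split("#", 1)[0].strip()


def parse_line(line: str):
    """Split a line into (field, value), or return None if it holds no directive."""
    content = strip_comment(line)
    if not content or ":" not in content:
        return None
    field, value = content.split(":", 1)
    return field.strip().lower(), value.strip()


sample = """# Block our staging crawler
User-agent: *
Disallow: /staging/ # not for indexing
"""

rules = [parsed for parsed in map(parse_line, sample.splitlines()) if parsed]
print(rules)  # [('user-agent', '*'), ('disallow', '/staging/')]
```

Under the same assumption, a line such as "Disallow: # /private" parses to an empty Disallow value, which is why a # should not be placed in the middle of a directive.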

Encoding requirements

Google's specification states that robots.txt must be a UTF-8 encoded text file (which includes ASCII), and that Google may ignore characters that are not part of the UTF-8 range, potentially rendering rules invalid. A common trap is a UTF-8 byte order mark (BOM) saved at the start of the file by some editors: an unaware parser can treat the BOM as part of the first line, breaking the first directive.

Google's parser specifically tolerates a leading BOM, but not every crawler does, so the safe practice is to save the file as UTF-8 without a BOM. Unix (LF) line endings are a safe default, although RFC 9309 also accepts CR and CRLF. Avoid smart-quote substitution from word processors, and serve the file with a text/plain content type.

robots.txt must be UTF-8
Save without a byte order mark for broad compatibility
Serve as text/plain with plain straight characters, not smart quotes
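
One way to check a file against the requirements above is to inspect its raw bytes before publishing. The following Python sketch is an illustration only: check_robots_bytes is a hypothetical helper, and the choice of which problems to report (BOM, invalid UTF-8, smart quotes, carriage returns) is an assumption based on the points in this section. It does not check the served content type.

```python
import codecs

# Typographic quotes that word processors commonly substitute.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def check_robots_bytes(raw: bytes) -> list[str]:
    """Return a list of encoding problems found in the raw file bytes."""
    problems = []
    body = raw
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 byte order mark")
        body = raw[len(codecs.BOM_UTF8):]
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Without valid UTF-8 the remaining text checks are meaningless.
        problems.append("not valid UTF-8")
        return problems
    if any(ch in text for ch in SMART_QUOTES):
        problems.append("contains smart quotes")
    if "\r" in text:
        problems.append("contains carriage returns")
    return problems


print(check_robots_bytes(codecs.BOM_UTF8 + b"User-agent: *\n"))
```

A clean file returns an empty list. To check a file on disk, read it in binary mode, for example with pathlib.Path("robots.txt").read_bytes(), so that no decoding happens before the check.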

How it appears in analytics and logs

If rules that look correct are being ignored, the cause is often an encoding or comment-placement problem rather than the rule logic. Treat ignored rules as a parsing signal first, not as a sign of crawler misbehaviour.

Diagnostic use case

Annotate a robots.txt file safely and avoid encoding pitfalls — like a UTF-8 BOM — that can make a crawler skip the first directive.

What WebmasterID can help detect

WebmasterID records crawler fetches of your robots.txt, so if a malformed or wrongly encoded file is causing crawlers to behave unexpectedly, you can see the fetch pattern alongside the bot activity it governs.

Common mistakes

Saving robots.txt with a UTF-8 BOM that breaks the first directive for strict parsers.
Pasting from a word processor and introducing smart quotes or non-UTF-8 characters.
Assuming a comment can disable just part of a directive. A # discards everything from that point to the end of the line.
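
The first two mistakes can often be repaired mechanically. The sketch below is one possible approach, not a prescribed tool: clean_robots_bytes and QUOTE_MAP are hypothetical names, the input is assumed to be valid UTF-8 already (otherwise decoding raises UnicodeDecodeError), and replacing smart quotes with straight ones is an illustrative choice.

```python
import codecs

# Map typographic quotes to their plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def clean_robots_bytes(raw: bytes) -> bytes:
    """Return BOM-free UTF-8 bytes with straight quotes and LF line endings."""
    # The utf-8-sig codec decodes UTF-8 and drops a leading BOM if present.
    text = raw.decode("utf-8-sig")
    text = text.translate(QUOTE_MAP)
    # Normalise CRLF first, then any remaining lone CR.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


dirty = codecs.BOM_UTF8 + "User-agent: *\r\nDisallow: /staging/\r\n".encode("utf-8")
print(clean_robots_bytes(dirty))  # b'User-agent: *\nDisallow: /staging/\n'
```

Review the cleaned output before deploying it, since an automated pass cannot tell whether a stray character was part of an intended path.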

Privacy and accuracy notes

Comments and encoding concern the file's text only and never involve visitor data. WebmasterID treats robots.txt fetches by crawlers as bot events, separate from human analytics.


Sources and verification notes

Google, robots.txt specification (file format and encoding): states the UTF-8 requirement, BOM handling, and comment syntax.
Robots Exclusion Protocol (RFC 9309), file format: defines comments and UTF-8 file encoding.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.