robots.txt comments and encoding

robots.txt supports comments introduced by the hash character (#) and is parsed as a UTF-8 plain-text file. Encoding and syntax mistakes, such as a stray byte order mark, a non-UTF-8 charset, or a comment that interrupts a directive, can cause crawlers to misread or ignore rules. This page covers comment syntax and the encoding requirements that keep a file valid.

Verified against primary sources

Comment syntax

A comment starts with a hash character and runs to the end of the line. Crawlers ignore everything from the hash onward, so you can document rules inline or on their own line.

# Block our staging crawler
User-agent: *
Disallow: /staging/ # not for indexing

Put each comment on its own line or at the end of a directive. Do not place a comment in the middle of a directive, because everything after the hash is discarded and the rest of the rule is lost. A comment is purely for humans: it never changes how a rule is applied.
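The comment rule above can be sketched in a few lines of Python. This is a minimal illustration, not any crawler's actual parser; the function names strip_comment and parse_directive are hypothetical and chosen only for this example.

```python
def strip_comment(line: str) -> str:
    """Return the line with any '#' comment and surrounding whitespace removed."""
    # Everything from the first '#' to the end of the line is a comment.
    return line.split("#", 1)[0].strip()


def parse_directive(line: str):
    """Return (field, value) for a directive line, or None if there is no directive."""
    content = strip_comment(line)
    if not content or ":" not in content:
        return None  # blank line, comment-only line, or not a field: value pair
    field, value = content.split(":", 1)
    # Field names are compared case-insensitively in this sketch.
    return field.strip().lower(), value.strip()


for raw_line in [
    "# Block our staging crawler",
    "User-agent: *",
    "Disallow: /staging/ # not for indexing",
]:
    print(repr(raw_line), "->", parse_directive(raw_line))
```

The comment-only line yields no directive, and the trailing comment on the Disallow line is dropped before the value is read, which is why a hash placed mid-directive would truncate the rule.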

Encoding requirements

Google's specification states that robots.txt must be a UTF-8 encoded text file, and that Google may ignore characters outside the UTF-8 range, which can render rules invalid. A common trap is the UTF-8 byte order mark (BOM) that some editors write at the start of the file: a parser that does not expect it can treat the BOM as part of the first line, breaking the first directive.
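One way to check a file for both problems is to inspect its raw bytes. The sketch below assumes you already have the file contents as bytes; check_encoding is a hypothetical helper name, and the naive first-line comparison only illustrates how a BOM-unaware parser might fail.

```python
import codecs


def check_encoding(raw: bytes) -> dict:
    """Report whether the bytes start with a UTF-8 BOM and decode as valid UTF-8."""
    has_bom = raw.startswith(codecs.BOM_UTF8)  # the three bytes EF BB BF
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"has_bom": has_bom, "valid_utf8": valid_utf8}


sample = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /staging/\n"
print(check_encoding(sample))

# A parser that decodes as plain UTF-8 keeps the BOM as U+FEFF, so the first
# field name no longer matches "user-agent".
first_field = sample.decode("utf-8").splitlines()[0].split(":", 1)[0].lower()
print(first_field == "user-agent")
```

In this sketch the BOM survives decoding as an invisible character glued to the first field name, which is how the first directive can be lost while the rest of the file parses normally.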

Google's parser specifically tolerates a leading BOM, but not every crawler does, so the safe practice is to save the file as UTF-8 without a BOM. Use Unix line endings, avoid smart-quote substitution from word processors, and serve the file with a text/plain content type.
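The clean-up steps above can be sketched as a small normalising function. This is an illustration under stated assumptions rather than a recommended tool: normalise_robots_txt and SMART_QUOTES are hypothetical names, the input is assumed to be valid UTF-8 (otherwise decoding raises UnicodeDecodeError), and setting the text/plain content type is a server configuration step not covered here.

```python
# Typographic quotes that word processors commonly substitute for ASCII quotes.
SMART_QUOTES = {
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
}


def normalise_robots_txt(raw: bytes) -> bytes:
    """Return UTF-8 bytes with no BOM, Unix line endings, and ASCII quotes."""
    # The "utf-8-sig" codec drops a leading BOM if one is present.
    text = raw.decode("utf-8-sig")
    # Convert Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace smart quotes with their plain ASCII equivalents.
    text = text.translate(str.maketrans(SMART_QUOTES))
    # Encoding with "utf-8" (not "utf-8-sig") writes no BOM.
    return text.encode("utf-8")


dirty = b"\xef\xbb\xbfUser-agent: *\r\nDisallow: /staging/\r\n"
print(normalise_robots_txt(dirty))
```

Review the output before deploying it: replacing quote characters changes the bytes of a path, so a rule that genuinely needs a non-ASCII character should be edited by hand instead.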

How it appears in analytics and logs

If rules that look correct are being ignored, the cause is often an encoding or comment-placement problem rather than the rule logic. Treat ignored rules as a sign that the file is being parsed differently from how it reads, not as a sign of crawler misbehaviour.

Diagnostic use case

Annotate a robots.txt file safely and avoid encoding pitfalls — like a UTF-8 BOM — that can make a crawler skip the first directive.

What WebmasterID can help detect

WebmasterID records crawler fetches of your robots.txt, so if a malformed or wrongly encoded file is causing crawlers to behave unexpectedly, you can see the fetch pattern alongside the bot activity it governs.

Common mistakes

Privacy and accuracy notes

Comments and encoding concern the file's text only and never involve visitor data. WebmasterID treats robots.txt fetches by crawlers as bot events, separate from human analytics.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.