robots.txt size limits and parsing

A robots.txt file cannot grow without limit. Google documents a maximum parsed size of 500 KiB and ignores anything beyond it, so rules at the bottom of a bloated file can be silently dropped. This page covers the size limit and how parsing precedence (the most specific rule wins) interacts with it.

The size limit

Google documents that it enforces a maximum robots.txt file size of 500 kibibytes (KiB) and that content beyond that limit is ignored. A file padded with thousands of rules, comments, or generated lines can therefore have its trailing rules silently dropped.

Keep robots.txt lean: consolidate patterns with wildcards where supported, remove dead rules, and avoid machine-generating an enormous file when a few patterns would do.

Google parses up to 500 KiB; beyond that is ignored
Trailing rules in an oversized file can be dropped
Consolidate with patterns rather than listing every URL
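
As a rough illustration of the size limit, the sketch below takes the raw bytes of a robots.txt file and lists the rule lines that extend past 500 KiB. The function name lines_past_limit and the constant GOOGLE_LIMIT_BYTES are made up for this example, and treating a line that straddles the boundary as "at risk" is an assumption; Google only documents that content after the limit is ignored.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit Google documents


def lines_past_limit(body: bytes, limit: int = GOOGLE_LIMIT_BYTES) -> list[str]:
    """Return non-empty, non-comment lines that end beyond `limit` bytes.

    Assumption: a line that straddles the limit is reported too, because
    only part of it would fall inside the parsed portion.
    """
    at_risk = []
    offset = 0  # byte offset of the start of the current line
    for raw in body.splitlines(keepends=True):
        content = raw.rstrip(b"\r\n")
        end = offset + len(content)
        text = content.decode("utf-8", errors="replace").strip()
        if end > limit and text and not text.startswith("#"):
            at_risk.append(text)
        offset += len(raw)
    return at_risk
```

To check a real file, pass it the bytes you get from fetching or reading robots.txt (for example pathlib.Path("robots.txt").read_bytes()). An empty list means every rule fits within the limit.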

Parsing precedence

When multiple rules match a URL, Google applies the most specific rule (the one with the longest matching path) between Allow and Disallow, not simply the first one listed. Specificity, not order, decides the outcome. If an Allow and a Disallow rule are equally specific, Google documents that it uses the least restrictive one, which is Allow. A rule that never gets parsed because it sits beyond the size limit cannot win at all.

Because crawlers may differ in edge-case parsing, test the rules that matter rather than assuming identical behaviour everywhere.

Most specific (longest match) wins between Allow and Disallow
Specificity, not file order, decides the match
A rule beyond the size limit is never evaluated
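
The precedence rule can be sketched in a few lines. This is a simplified model of the documented behaviour, not Google's parser: it assumes the rules for one user-agent group have already been extracted as (directive, path) pairs, it supports only the * and $ wildcards, and it skips details such as percent-encoding normalisation. The names is_allowed and _pattern_to_regex are hypothetical.

```python
import re


def _pattern_to_regex(path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = path.endswith("$")
    core = path[:-1] if anchored else path
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(url_path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply longest-match precedence between Allow and Disallow rules."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, path in rules:
        if not path:
            continue  # an empty path matches nothing
        if _pattern_to_regex(path).match(url_path):
            candidate = (len(path), directive.lower() == "allow")
            # Longer pattern wins; on equal length True > False, so Allow wins.
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the URL is allowed.
    return True if best is None else best[1]
```

With the rules Disallow: /shop/ and Allow: /shop/sale/, the path /shop/sale/item is allowed and /shop/cart is not, whichever order the two rules appear in.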

How it appears in analytics and logs

If a rule near the end of a very large robots.txt seems ignored, it may fall beyond the parsed size limit. Confirming which URLs crawlers still fetch reveals whether your rules are actually taking effect.
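
One way to confirm this from server logs is sketched below. It assumes access logs in the common "combined" format and uses plain prefix matching against the paths you expect to be blocked; both are assumptions, as are the names LOG_LINE and fetched_despite_disallow. Matching on the user-agent string does not prove a request really came from Google, since the string can be spoofed.

```python
import re

# Assumed "combined" log format: "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def fetched_despite_disallow(log_lines, disallowed_prefixes, agent_token="Googlebot"):
    """Return paths fetched by the given crawler that start with a disallowed prefix."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or agent_token not in match.group("agent"):
            continue  # unparseable line or a different client
        path = match.group("path")
        if any(path.startswith(prefix) for prefix in disallowed_prefixes):
            hits.append(path)
    return hits
```

Any path this returns is worth checking against the live robots.txt: the matching rule may sit past the size limit, or a more specific Allow rule may be overriding it.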

Diagnostic use case

Keep robots.txt within the parsed size limit and understand which rule applies when several match, so no important rule is silently ignored.

What WebmasterID can help detect

WebmasterID shows which paths crawlers fetch, so you can detect when expected rules are not being applied — for instance because they sit past the size limit.

Common mistakes

Letting a generated robots.txt exceed 500 KiB, so that rules past the limit are silently ignored.
Assuming the first matching rule wins rather than the most specific.
Padding the file with redundant per-URL rules instead of patterns.

Privacy and accuracy notes

robots.txt size and parsing are public configuration details. They involve no visitor data.

Sources and verification notes

Google: How Google interprets robots.txt. Documents the 500 KiB size limit and most-specific-rule precedence.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.