robots.txt size limits and parsing
A robots.txt file cannot grow indefinitely. Google documents a maximum parsed size of 500 KiB and ignores anything beyond it, which can silently drop rules at the bottom of a bloated file. This page covers the size limit and how it interacts with parsing precedence, where the most specific rule wins.
The size limit
Google documents that it enforces a maximum robots.txt file size of 500 kibibytes (KiB), which is 512,000 bytes, and that content beyond that limit is ignored. A file padded with thousands of rules, comments, or generated lines can therefore have its trailing rules silently dropped.
Keep robots.txt lean: consolidate patterns with wildcards where supported, remove dead rules, and avoid machine-generating an enormous file when a few patterns would do.
- Google parses up to 500 KiB; beyond that is ignored
- Trailing rules in an oversized file can be dropped
- Consolidate with patterns rather than listing every URL
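A quick way to see whether a file is at risk is to measure it in bytes and list the rule lines that do not fit inside the limit. The following is a minimal sketch, not Google's parser: the function name `lines_beyond_limit` is hypothetical, and treating a line that straddles the limit as dropped is an assumption made here for safety.

```python
# Google's documented parse limit: 500 KiB = 500 * 1024 = 512,000 bytes.
GOOGLE_LIMIT_BYTES = 500 * 1024


def lines_beyond_limit(raw: bytes, limit: int = GOOGLE_LIMIT_BYTES) -> list[str]:
    """Return rule lines that are not fully inside the first `limit` bytes.

    Blank lines and comments are skipped. A line cut in half by the limit
    is reported as dropped (an assumption, chosen to err on the safe side).
    """
    dropped = []
    offset = 0
    for line in raw.splitlines(keepends=True):
        end = offset + len(line)  # byte offset just past this line
        text = line.decode("utf-8", errors="replace").strip()
        if end > limit and text and not text.startswith("#"):
            dropped.append(text)
        offset = end
    return dropped
```

Run it against the raw bytes of the served file, for example `lines_beyond_limit(open("robots.txt", "rb").read())`. An empty list means every rule line fits within the limit.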
Parsing precedence
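When multiple rules match a URL, Google applies the most specific rule, meaning the one with the longest matching path, whether it is an Allow or a Disallow. It does not simply apply the first rule listed. If conflicting rules are equally specific, Google documents that it uses the least restrictive one. File order alone does not decide the outcome; specificity does. A rule that is never parsed because it lies beyond the size limit cannot win at all.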
Because crawlers may differ in edge-case parsing, test the rules that matter rather than assuming identical behaviour everywhere.
- Most specific (longest match) wins between Allow and Disallow
- Specificity, not file order, decides the match
- A rule beyond the size limit is never evaluated
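The precedence rule can be illustrated with a small function. This is a simplified sketch under stated assumptions: it handles plain path prefixes only (no `*` or `$` wildcards), it considers a single user-agent group, and the name `is_allowed` is hypothetical. On a tie in length it picks Allow, following Google's documented preference for the least restrictive rule.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, path_prefix) pairs, e.g.
    ("Disallow", "/shop/"). The longest matching prefix wins,
    regardless of where it appears in the list.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, prefix in rules:
        # An empty path matches nothing in this sketch.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_won_by_allow = len(prefix) == best_len and is_allow
        if longer or tie_won_by_allow:
            best_len = len(prefix)
            allowed = is_allow
    return allowed
```

With `[("Disallow", "/shop/"), ("Allow", "/shop/sale/")]`, the path `/shop/sale/item` is allowed and `/shop/cart` is blocked, and reversing the order of the two rules gives the same result.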
How it appears in analytics and logs
If a rule near the end of a very large robots.txt seems ignored, it may fall beyond the parsed size limit. Confirming which URLs crawlers still fetch reveals whether your rules are actually taking effect.
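One way to confirm this from server logs is to look for crawler requests to paths you believe are disallowed. The sketch below assumes the common "combined" access log format and matches on the user-agent string only, which does not prove the request came from the real crawler. The names `LOG_LINE` and `crawler_hits_on_blocked_paths` are hypothetical, and the blocked prefixes are whatever you expect your rules to cover.

```python
import re

# Assumes combined log format:
# ... "GET /path HTTP/1.1" 200 512 "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_hits_on_blocked_paths(log_lines, blocked_prefixes, agent_token="Googlebot"):
    """Return requested paths that start with a blocked prefix and whose
    user-agent string contains `agent_token`."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or agent_token not in match.group("agent"):
            continue
        path = match.group("path")
        if any(path.startswith(prefix) for prefix in blocked_prefixes):
            hits.append(path)
    return hits
```

A non-empty result for a prefix you expected to be blocked is a prompt to check whether the corresponding rule sits beyond the parsed size limit or is being outranked by a more specific rule.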
Diagnostic use case
Keep robots.txt within the parsed size limit and understand which rule applies when several match, so no important rule is silently ignored.
What WebmasterID can help detect
WebmasterID shows which paths crawlers fetch, so you can detect when expected rules are not being applied — for instance because they sit past the size limit.
Common mistakes
- Letting a generated robots.txt exceed 500 KiB, so that trailing rules are silently dropped.
- Assuming the first matching rule wins rather than the most specific.
- Padding the file with redundant per-URL rules instead of patterns.
Privacy and accuracy notes
robots.txt size and parsing are public configuration details. They involve no visitor data.
Related pages
- robots.txt path matching and case sensitivity
robots.txt path rules are compared against the URL path, and that comparison is case-sensitive: /Page and /page are different. This page covers how Google matches paths, why case and encoding matter, and how trailing characters and wildcards change the rule that applies.
- Wildcards and path matching in robots.txt
Although the original protocol used simple prefix matching, major crawlers support two wildcards in path rules: * matches any sequence of characters, and $ anchors the end of the URL. This page covers how they behave, useful patterns, and the mistakes that make a rule too broad.
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- Website observability
Confirm your robots.txt rules are actually taking effect.
Sources and verification notes
- Google: How Google interprets robots.txt. Documents the 500 KiB size limit and most-specific-rule precedence.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.