The Clean-param directive in robots.txt explained
Clean-param is a Yandex-specific robots.txt directive that lists URL query parameters Yandex should ignore when crawling, helping consolidate duplicate URLs. This page explains its syntax, what it does, and why Google relies on different mechanisms.
What Clean-param does
Clean-param is a directive Yandex recognises in robots.txt. It lets you name query-string parameters that do not change page content — such as tracking tags or session identifiers — so Yandex treats URLs differing only in those parameters as the same resource. This reduces duplicate crawling and helps consolidate ranking signals for Yandex.
It is Yandex-specific. Other crawlers, including Googlebot, do not act on Clean-param, so it is not a general-purpose parameter-handling tool.
- Yandex-specific robots.txt directive
- Names parameters that do not change content
- Helps Yandex consolidate duplicate parameterised URLs
Syntax and alternatives
The directive takes a parameter list and an optional path prefix:
Clean-param: utm_source&utm_medium /catalog/
This tells Yandex to ignore utm_source and utm_medium on URLs whose path begins with /catalog/. Multiple parameters are joined with &, and you can repeat the directive for different paths. If the path is omitted, the rule applies to every URL on the site.
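To make the effect concrete, here is a minimal Python sketch of what ignoring the listed parameters means for a single URL. It illustrates the behaviour described above and is not Yandex's implementation. The function name apply_clean_param and the example.com URLs are hypothetical, and the sketch assumes the path is a plain prefix with no wildcard handling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_clean_param(url, params, path_prefix="/"):
    """Return url without the listed query parameters.

    params is the "&"-joined list from a Clean-param line.
    Assumption: path_prefix is matched as a plain string prefix.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(path_prefix):
        return url  # rule does not cover this path
    ignored = set(params.split("&"))
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two tracked variants collapse to the same URL.
a = apply_clean_param(
    "https://example.com/catalog/shoes?utm_source=news&id=7&utm_medium=email",
    "utm_source&utm_medium",
    "/catalog/",
)
b = apply_clean_param(
    "https://example.com/catalog/shoes?id=7&utm_source=ads",
    "utm_source&utm_medium",
    "/catalog/",
)
print(a)  # https://example.com/catalog/shoes?id=7
print(a == b)  # True
```

URLs outside the prefix, such as https://example.com/blog/post?utm_source=news, are returned unchanged, which mirrors how the optional path limits the rule's scope.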
For Google and most other engines, consolidate duplicate parameterised URLs with rel=canonical tags and consistent internal linking instead — Google retired its old URL parameters tool and relies on canonicalisation signals.
How it appears in analytics and logs
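Clean-param is an instruction Yandex reads from robots.txt, so the directive itself does not show up in analytics or server logs. Its effect is indirect: once a rule is in place, Yandex can make fewer redundant requests for parameterised variants of the same URL. Other crawlers' behavior does not change.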
Diagnostic use case
Tell Yandex to ignore parameters that leave page content unchanged, such as utm_source tracking tags, session identifiers, or a sort order that only reorders the same items, so that parameterised duplicates are crawled and indexed as a single URL.
What WebmasterID can help detect
WebmasterID reports crawler hits per URL, so you can see whether parameterised variants are being crawled — context for deciding whether a Clean-param rule for Yandex would help.
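As a rough illustration of that check, the Python sketch below groups a list of crawled URLs by path and counts which parameter names appear on paths crawled under several query strings. The input is a hypothetical list of URLs already filtered to Yandex hits. It is not WebmasterID's export format, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def parameter_variants(hits):
    """Map each path to the set of distinct query strings crawled for it."""
    variants = defaultdict(set)
    for url in hits:
        parts = urlsplit(url)
        variants[parts.path].add(parts.query)
    return dict(variants)


def candidate_parameters(hits, min_variants=2):
    """Count parameter names on paths crawled with at least min_variants query strings."""
    counts = defaultdict(int)
    for queries in parameter_variants(hits).values():
        if len(queries) < min_variants:
            continue  # only one variant crawled, nothing to consolidate
        for query in queries:
            for name, _value in parse_qsl(query, keep_blank_values=True):
                counts[name] += 1
    return dict(counts)


# Hypothetical crawler hits for illustration only.
hits = [
    "/catalog/shoes?utm_source=a",
    "/catalog/shoes?utm_source=b",
    "/catalog/shoes",
    "/about",
]
print(candidate_parameters(hits))  # {'utm_source': 2}
```

A parameter that shows up here is only a candidate. Before adding it to a Clean-param rule, confirm that it does not change the page content.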
Common mistakes
- Expecting Google to honour Clean-param — it is Yandex-specific.
- Listing parameters that actually change content, causing distinct pages to be merged.
- Forgetting the optional path prefix and applying it more broadly than intended.
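The last mistake can be caught with a small lint step. The Python sketch below pulls Clean-param lines out of a robots.txt string and reports rules that have no path, so you can confirm each site-wide rule is intentional. The function names are hypothetical, and the parsing is deliberately simplified: it assumes one directive per line and "#" comments, and it does not validate parameter names or path patterns.

```python
def parse_clean_param(robots_txt):
    """Return a list of (params, path) tuples; path is None when omitted."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "clean-param":
            continue
        fields = value.split()
        if not fields:
            continue  # directive with no parameter list
        params = fields[0].split("&")
        path = fields[1] if len(fields) > 1 else None
        rules.append((params, path))
    return rules


def rules_without_path(robots_txt):
    """Return the parameter lists of rules that apply to the whole site."""
    return [params for params, path in parse_clean_param(robots_txt) if path is None]


robots = """User-agent: Yandex
Clean-param: utm_source&utm_medium /catalog/
Clean-param: sessionid
"""
print(rules_without_path(robots))  # [['sessionid']]
```

A rule without a path is not necessarily wrong. The check only surfaces the rules whose scope deserves a second look.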
Privacy and accuracy notes
Clean-param concerns how your own URLs are crawled. It involves no visitor data and is not an access-control mechanism.
Related pages
- The Host directive in robots.txt explained
Host was a non-standard robots.txt directive, used mainly by Yandex, to indicate a site's preferred mirror or hostname. This page explains what it did, why it is not part of the robots.txt standard, and what to use instead for hostname canonicalisation today.
- How to control YandexBot in robots.txt
YandexBot is the crawler for Yandex, a major search engine in Russia and nearby markets. You can target it in robots.txt with the YandexBot token. Yandex documents its robots.txt handling, has historically honoured crawl-delay, and provides additional crawl controls in Yandex.Webmaster.
- Wildcards and path matching in robots.txt
Although the original protocol used simple prefix matching, major crawlers support two wildcards in path rules: * matches any sequence of characters, and $ anchors the end of the URL. This page covers how they behave, useful patterns, and the mistakes that make a rule too broad.
- WebmasterID docs
How WebmasterID reports crawler hits per parameterised URL.
Sources and verification notes
- Yandex: Clean-param directive documentation. Documents the Clean-param syntax and behavior for Yandex.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.