The Host directive in robots.txt explained
Host was a non-standard robots.txt directive, used mainly by Yandex, to indicate a site's preferred mirror or hostname. This page explains what it did, why it is not part of the robots.txt standard, and what to use instead for hostname canonicalisation today.
What Host did
When a site was reachable under several hostnames or mirrors, Host let the operator tell Yandex which hostname to treat as the main one for indexing. It was written as a single line containing the preferred hostname, for example:
Host: example.com
It was never part of the original robots.txt specification, and Google has stated it does not support a Host directive. Adding it for non-Yandex engines therefore had no effect.
- Non-standard directive, used mainly by Yandex
- Indicated a preferred mirror/hostname for indexing
- Not supported by Google; ignored by most crawlers
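To check whether an existing robots.txt still carries a Host line, a small parser is enough. The following is a minimal sketch, not a reference implementation: the function name `find_host_directives` and the sample file are hypothetical, and the case-insensitive field matching and `#` comment handling are assumptions of this sketch rather than documented Yandex behaviour.

```python
def find_host_directives(robots_txt: str) -> list:
    """Return the values of all Host lines found in a robots.txt body."""
    hosts = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after '#' and trim surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values like URLs stay intact.
        field, value = line.split(":", 1)
        # Assumption: field names are compared case-insensitively.
        if field.strip().lower() == "host":
            hosts.append(value.strip())
    return hosts


# Hypothetical robots.txt body used only for illustration.
sample = """\
User-agent: *
Disallow: /private/
Host: example.com  # legacy preferred-mirror hint
Sitemap: https://example.com/sitemap.xml
"""

print(find_host_directives(sample))
```

For the sample above this prints `['example.com']`. An empty list means the file contains no Host line to review.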
What to use instead
Modern hostname canonicalisation does not rely on Host. Use 301 redirects from non-preferred hostnames to your canonical host, and rel=canonical tags to indicate the preferred URL. These are widely supported across search engines.
Yandex itself has moved guidance toward standard canonicalisation signals over time, so even for Yandex you should prefer redirects and canonical tags. Treat any Host line in an existing robots.txt as legacy and verify whether it still has any effect.
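The redirect and canonical-tag approach can be sketched as follows. This is an illustration under stated assumptions, not a drop-in server configuration: `CANONICAL_HOST`, `redirect_target`, and `canonical_link_tag` are hypothetical names, the sketch drops any explicit port when swapping the hostname, and in practice the redirect is usually configured in the web server or CDN rather than in application code.

```python
import html
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "example.com"  # hypothetical preferred hostname


def redirect_target(url: str, canonical_host: str = CANONICAL_HOST) -> Optional[str]:
    """Return the URL a 301 redirect should point to, or None if no redirect is needed."""
    parts = urlsplit(url)
    # urlsplit lowercases the hostname, so compare against a lowercased canonical host.
    if parts.hostname is None or parts.hostname == canonical_host.lower():
        return None
    # Keep scheme, path, query, and fragment; swap only the host (any port is dropped).
    return urlunsplit((parts.scheme, canonical_host, parts.path, parts.query, parts.fragment))


def canonical_link_tag(url: str, canonical_host: str = CANONICAL_HOST) -> str:
    """Build a rel=canonical link element pointing at the canonical-host version of url."""
    target = redirect_target(url, canonical_host) or url
    return '<link rel="canonical" href="{}">'.format(html.escape(target, quote=True))


print(redirect_target("https://www.example.com/page?x=1"))
print(redirect_target("https://example.com/page"))
print(canonical_link_tag("https://www.example.com/page"))
```

The first call returns `https://example.com/page?x=1`, the second returns `None` because the host is already canonical, and the third yields `<link rel="canonical" href="https://example.com/page">`. A server would send the first value in the `Location` header of a 301 response.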
How it appears in analytics and logs
A Host directive in a robots.txt file is a historical preferred-mirror hint aimed mainly at Yandex. Most crawlers, including Googlebot, ignore it, so its presence rarely changes how non-Yandex crawlers behave.
Diagnostic use case
Understand legacy robots.txt files that still contain a Host line, and choose modern canonicalisation methods (redirects and canonical tags) instead of relying on Host.
What WebmasterID can help detect
WebmasterID reports which crawlers reach each of your hostnames, which helps you see whether legacy directives like Host correlate with any real change in crawler behaviour across mirrors.
Common mistakes
- Expecting Google or most crawlers to honour a Host directive — they do not.
- Relying on Host instead of 301 redirects and canonical tags.
- Leaving a stale Host line in robots.txt assuming it still controls mirrors.
Privacy and accuracy notes
The Host directive concerns your own site's preferred hostname. It involves no visitor data and is not an access-control mechanism.
Related pages
- The Clean-param directive in robots.txt explained
Clean-param is a Yandex-specific robots.txt directive that lists URL query parameters Yandex should ignore when crawling, helping consolidate duplicate URLs. This page explains its syntax, what it does, and why Google relies on different mechanisms.
- How to control YandexBot in robots.txt
YandexBot is the crawler for Yandex, a major search engine in Russia and nearby markets. You can target it in robots.txt with the YandexBot token. Yandex documents its robots.txt handling, has historically honoured crawl-delay, and provides additional crawl controls in Yandex.Webmaster.
- Canonical vs noindex: which to use
rel=canonical and noindex are often confused. Canonical tells search engines which of several similar URLs to treat as the primary, consolidating signals onto it. noindex removes a page from the index entirely. This page explains when each is right and why combining them on one URL sends conflicting signals.
- WebmasterID docs
How WebmasterID reports crawler activity across hostnames.
Sources and verification notes
- Google: How Google interprets robots.txt (unsupported directives)
Google documents that it does not support the Host directive.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.