The Host directive in robots.txt explained

Host was a non-standard robots.txt directive, used mainly by Yandex, to indicate a site's preferred mirror or hostname. This page explains what it did, why it is not part of the robots.txt standard, and what to use instead for hostname canonicalisation today.

Verified against primary sources

What Host did

When a site was reachable under several hostnames or mirrors, such as example.com and www.example.com, the Host directive let the operator tell Yandex which one to treat as the main hostname for indexing. It was written as a single line in robots.txt, for example:

Host: example.com

Host was never part of the original robots.txt specification, and it is not defined in RFC 9309, the standardised Robots Exclusion Protocol. Google has stated that it does not support a Host directive, so relying on it for non-Yandex engines was ineffective.

Non-standard directive, used mainly by Yandex
Indicated a preferred mirror/hostname for indexing
Not supported by Google; ignored by most crawlers
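
As a rough illustration of the single-line format described above, the following Python sketch scans a robots.txt body and reports any Host lines it finds. The function name find_host_directives and the sample file are hypothetical, and the parsing rules (case-insensitive field name, "#" starts a comment) are assumptions of this sketch rather than any crawler's actual behaviour.

```python
def find_host_directives(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for every Host line in a robots.txt body."""
    found = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop a trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Assumption: field names are matched case-insensitively.
        if field.strip().lower() == "host":
            found.append((number, value.strip()))
    return found


# Hypothetical legacy robots.txt used only for illustration.
sample = """User-agent: *
Disallow: /private/
Host: example.com  # legacy line
Sitemap: https://example.com/sitemap.xml
"""

print(find_host_directives(sample))  # [(3, 'example.com')]
```

A non-empty result only shows that a legacy Host line exists; it says nothing about whether any crawler still acts on it.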

What to use instead

Modern hostname canonicalisation does not rely on Host. Use 301 redirects from non-preferred hostnames to your canonical host, and rel=canonical tags to indicate the preferred URL. These are widely supported across search engines.

Yandex itself has moved guidance toward standard canonicalisation signals over time, so even for Yandex you should prefer redirects and canonical tags. Treat any Host line in an existing robots.txt as legacy and verify whether it still has any effect.
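
The two signals above can be sketched in Python as follows. This is a minimal illustration, not a production redirect layer: the constants CANONICAL_SCHEME and CANONICAL_HOST and the function names are hypothetical, and a real site would normally configure the 301 redirect in its web server or CDN rather than in application code.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "https"          # assumption for this sketch
CANONICAL_HOST = "www.example.com"  # assumption: the preferred hostname


def canonical_url(url: str) -> str:
    """Rewrite url onto the canonical scheme and host, keeping path and query."""
    parts = urlsplit(url)
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )


def redirect_target(url: str) -> Optional[str]:
    """Return the Location for a 301 redirect, or None if url is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None
    return canonical_url(url)


def canonical_link_tag(url: str) -> str:
    """Build a rel=canonical link element for the page served at url."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(redirect_target("http://example.com/a?b=1"))
print(redirect_target("https://www.example.com/a"))
print(canonical_link_tag("http://example.com/a?b=1&c=2"))
```

A server would answer with status 301 and the returned value as the Location header whenever redirect_target is not None, and would emit the link element in the head of each page.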

How it appears in analytics and logs

A Host directive does not appear in analytics or access logs directly: it is a line inside robots.txt, a historical preferred-mirror hint aimed mainly at Yandex. Most crawlers, including Googlebot, ignore it, so its presence rarely changes the request patterns you see from non-Yandex crawlers.

Diagnostic use case

Understand legacy robots.txt files that still contain a Host line, and choose modern canonicalisation methods (redirects and canonical tags) instead of relying on Host.

What WebmasterID can help detect

WebmasterID reports which crawlers reach your hostnames, which helps you see whether legacy directives like Host correlate with any real change in crawler behaviour across mirrors.
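
To make the idea of comparing crawler activity across mirrors concrete, here is a small Python sketch that tallies requests per hostname and crawler. It is not how WebmasterID works internally; the record format (hostname, user-agent pairs), the function name crawler_hits_by_host, and the sample data are all hypothetical. User-agent strings can be spoofed, so the labels are approximate.

```python
from collections import Counter

# Substrings used to label crawlers in this sketch; matching on the
# user-agent alone is approximate because the header can be spoofed.
CRAWLER_TOKENS = ("Googlebot", "YandexBot", "bingbot")


def crawler_hits_by_host(records):
    """Count requests per (hostname, crawler) from (host, user_agent) pairs."""
    counts = Counter()
    for host, user_agent in records:
        lowered = user_agent.lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in lowered:
                counts[(host.lower(), token)] += 1
                break  # label each request at most once
    return counts


# Hypothetical, already-parsed log records used only for illustration.
records = [
    ("example.com", "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"),
    ("www.example.com", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("www.example.com", "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"),
    ("www.example.com", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]

for (host, crawler), hits in sorted(crawler_hits_by_host(records).items()):
    print(host, crawler, hits)
```

Comparing such tallies before and after removing a Host line, or after adding redirects, gives a rough view of whether crawler behaviour across mirrors actually changed.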

Common mistakes

Expecting Google or most other crawlers to honour a Host directive; they do not.
Relying on Host instead of 301 redirects and canonical tags.
Leaving a stale Host line in robots.txt assuming it still controls mirrors.

Privacy and accuracy notes

The Host directive concerns your own site's preferred hostname. It involves no visitor data and is not an access-control mechanism.

Sources and verification notes

Google: How Google interprets robots.txt (unsupported directives). Google documents that it does not support the Host directive.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.