Robots & crawl control

robots.txt for API endpoints

JSON APIs are sometimes added to robots.txt to keep crawlers out, but robots.txt only requests compliance from polite crawlers and does nothing to authenticate or hide an endpoint. This page covers when disallowing /api is reasonable, what it does not do, and why access control belongs at the application layer.

Verified against primary sources

When disallowing /api helps

If search engines are crawling JSON endpoints that should not appear in results — wasting crawl budget on responses that are not pages — a robots.txt Disallow on the API path is a reasonable signal to compliant crawlers to skip them.

User-agent: *
Disallow: /api/
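To see how a compliant crawler interprets the rule above, Python's standard-library urllib.robotparser can parse it locally and answer "may this user agent fetch this URL?". This is only a sketch: "ExampleBot" and the example.com URLs are placeholders, and real crawlers such as Googlebot use their own parsers, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The rule from above, one directive per line as robots.txt requires.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/api/v1/users",
                "https://example.com/blog/post",
                "https://example.com/api"):
        verdict = "allowed" if rp.can_fetch("ExampleBot", url) else "disallowed"
        print(url, "->", verdict)
```

Rules are prefix matches on the path, so /api/v1/users is disallowed while /api with no trailing slash is not matched by Disallow: /api/.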

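This reduces polite-crawler requests to those paths. It does not remove anything from an index by itself: a disallowed URL can still appear in results if other pages link to it. For API URLs that are already indexed, send noindex in an X-Robots-Tag response header and leave the path crawlable until they drop out, because a Disallow stops compliant crawlers from fetching the response and ever seeing the noindex.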
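The page does not name a server stack, so the following is a minimal sketch assuming a Python WSGI application. It wraps any WSGI app and appends X-Robots-Tag: noindex to responses whose path starts with /api/. The middleware name, the prefix parameter, and demo_app are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Headers = List[Tuple[str, str]]

def noindex_api_middleware(app: Callable, prefix: str = "/api/") -> Callable:
    """Wrap a WSGI app so every response under `prefix` carries X-Robots-Tag: noindex."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")

        def start_with_header(status: str, headers: Headers,
                              exc_info: Optional[tuple] = None):
            if path.startswith(prefix):
                # Append rather than replace, so the app's own headers are kept.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)
    return wrapped

def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in JSON endpoint used only for illustration."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"ok": true}']

app = noindex_api_middleware(demo_app)
```

Served with, for example, wsgiref.simple_server.make_server("", 8000, app).serve_forever(), requests to /api/... would carry the header. It only has an effect while the path is not disallowed, since a compliant crawler has to fetch the response to read it.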

Disallow: /api/ asks compliant crawlers to skip API responses
Reduces crawl waste on non-page JSON
Does not authenticate, hide, or secure the endpoint

robots.txt is not access control

robots.txt is a public file and a request to well-behaved crawlers. It cannot stop scrapers, abusive clients, or anyone who reads the file and calls the endpoint directly — in fact, listing a sensitive path advertises it.

Protect APIs with real controls: authentication tokens, rate limiting, and a WAF or firewall. Use robots.txt only to manage crawl behavior of compliant bots, and put security at the application and network layers where it can actually be enforced.
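As a rough illustration of enforcing access at the application layer, here is a stdlib-only sketch of two of those controls: a bearer-token check and a fixed-window rate limiter. VALID_TOKENS, FixedWindowLimiter, and handle_api_request are hypothetical names; a real deployment would use its framework's authentication, a shared store for rate-limit counters, and a WAF in front.

```python
import hmac
import time
from collections import defaultdict
from typing import Callable, Optional

# Hypothetical token store for illustration; a real service would check
# tokens against its identity provider or a hashed credential store.
VALID_TOKENS = {"example-token-123"}

def is_authorized(auth_header: Optional[str]) -> bool:
    """Accept only 'Bearer <token>' headers whose token is known."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):].encode()
    # compare_digest avoids leaking how many leading characters matched.
    return any(hmac.compare_digest(token, t.encode()) for t in VALID_TOKENS)

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # (client, window index) -> count. Old windows are never evicted here;
        # a production limiter would expire them or use a shared store.
        self._counts = defaultdict(int)

    def allow(self, client: str) -> bool:
        bucket = (client, int(self.clock() // self.window))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True

def handle_api_request(auth_header: Optional[str], client_ip: str,
                       limiter: FixedWindowLimiter) -> int:
    """Return the status an /api/ request would get: 401, 429, or 200."""
    if not is_authorized(auth_header):
        return 401
    if not limiter.allow(client_ip):
        return 429
    return 200
```

Unlike a robots.txt rule, these checks run on every request regardless of who sends it. The order here (authenticate, then rate-limit) is one choice; rate-limiting first also throttles token-guessing attempts.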

How it appears in analytics and logs

Crawler hits on /api endpoints usually mean a bot is following links or guessing paths. A robots.txt Disallow reduces polite-crawler requests but does not stop a determined or non-compliant client.
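One way to separate polite crawlers from clients that ignore the rule is to replay your access log against robots.txt. The sketch below assumes the common "combined" log format and counts, per claimed user agent, requests to paths that robots.txt disallows for that agent. The log format, regex, and function name are assumptions, and user-agent strings can be forged, so a hit attributed to a well-known crawler name is not proof of who sent it.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Assumes the common "combined" access-log format; adjust for your server.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def disallowed_hits(log_lines: Iterable[str], robots_text: str) -> Counter:
    """Count, per claimed user agent, requests to paths robots.txt disallows for it."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    counts: Counter = Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed format
        if not rp.can_fetch(m["ua"], m["path"]):
            counts[m["ua"]] += 1
    return counts
```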

Diagnostic use case

Decide whether to disallow API paths in robots.txt to reduce wasted crawl, while keeping real access control at the application layer where it belongs.

What WebmasterID can help detect

WebmasterID records which crawlers request API paths, so you can see whether compliant bots stopped after a Disallow and which automated clients keep hitting the endpoint regardless.
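The same before-and-after question can be sketched in a tool-agnostic way. The code below is not WebmasterID's API or export format; it assumes a hypothetical list of (timestamp, user_agent, path) records and the time the Disallow went live, and reports which agents kept requesting /api/ afterwards. Compliant crawlers cache robots.txt, so some hits shortly after the change are expected.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Record = Tuple[datetime, str, str]  # hypothetical (timestamp, user_agent, path)

def api_hits_before_after(records: Iterable[Record], deployed_at: datetime,
                          prefix: str = "/api/") -> Dict[str, List[int]]:
    """Count prefix hits per user agent as [before, after] the Disallow went live."""
    table: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for ts, ua, path in records:
        if not path.startswith(prefix):
            continue
        table[ua][0 if ts < deployed_at else 1] += 1
    return dict(table)

def still_hitting(table: Dict[str, List[int]]) -> List[str]:
    """User agents that kept requesting API paths after the Disallow went live."""
    return sorted(ua for ua, (before, after) in table.items() if after > 0)
```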

Common mistakes

Believing Disallow: /api/ secures an endpoint — it only requests crawler compliance.
Listing a sensitive internal path in robots.txt and advertising its existence.
Disallowing an API URL that is already indexed, which stops crawlers from fetching it and seeing the noindex that would remove it.
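The last mistake can be caught mechanically. This sketch assumes you already have a mapping of URLs to their response headers (how you collect it is up to you; the mapping and function name are hypothetical) and flags URLs that send X-Robots-Tag: noindex while robots.txt disallows them, meaning a compliant crawler would never see the header.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

def hidden_noindex(robots_text: str, responses: Dict[str, Dict[str, str]],
                   user_agent: str = "*") -> List[str]:
    """Return URLs that send X-Robots-Tag: noindex but are disallowed,
    so a compliant crawler never fetches them and never sees the header."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    flagged = []
    for url, headers in responses.items():
        # HTTP header names are case-insensitive.
        tag = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
        if "noindex" in tag.lower() and not rp.can_fetch(user_agent, url):
            flagged.append(url)
    return flagged
```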

Privacy and accuracy notes

robots.txt is a public file; listing API paths there reveals their existence. Securing an endpoint requires authentication, not robots.txt, and no visitor data is involved in the rule itself.


Sources and verification notes

Google — robots.txt is not an access-control mechanism: robots.txt requests compliance; it does not secure or hide URLs.
Google — block indexing with noindex (X-Robots-Tag): use the noindex header for removal; a Disallow can hide the noindex.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.