robots.txt for API endpoints
JSON API paths are sometimes listed in robots.txt to keep crawlers out, but robots.txt only asks polite crawlers to comply; it does nothing to authenticate requests or hide an endpoint. This page covers when disallowing /api is reasonable, what it does not do, and why access control belongs at the application layer.
When disallowing /api helps
If search engines are crawling JSON endpoints that should not appear in results — wasting crawl budget on responses that are not pages — a robots.txt Disallow on the API path is a reasonable signal to compliant crawlers to skip them.
User-agent: *
Disallow: /api/
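This reduces polite-crawler requests to those paths. It does not remove anything from an index by itself, and a disallowed URL can still be indexed without its content if other pages link to it. For API URLs that are already indexed, leave the path crawlable and send noindex in an X-Robots-Tag response header instead, because a Disallow stops the crawler from fetching the response and therefore from ever seeing the noindex.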
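To see how a compliant crawler interprets the two-line rule, here is a minimal sketch using Python's standard-library urllib.robotparser. The host example.com and the user-agent name ExampleBot are placeholders, and real search-engine crawlers use their own parsers, so treat this as an illustration of the matching logic rather than a guarantee of any particular bot's behavior.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this page, as a compliant crawler would read it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

# Parse the text directly instead of fetching it, so no network access is needed.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before each request; the answer is advisory only.
# example.com and ExampleBot are placeholders.
for url in (
    "https://example.com/api/v1/users",
    "https://example.com/api/",
    "https://example.com/api",
    "https://example.com/about",
):
    print(url, parser.can_fetch("ExampleBot", url))
```

Disallow values are prefix matches, so /api/ covers /api/v1/users but not a bare /api without the trailing slash. Nothing here prevents a client that never calls can_fetch from requesting any of these URLs.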
- Disallow: /api/ asks compliant crawlers to skip API responses
- Reduces crawl waste on non-page JSON
- Does not authenticate, hide, or secure the endpoint
robots.txt is not access control
robots.txt is a public file and a request to well-behaved crawlers. It cannot stop scrapers, abusive clients, or anyone who reads the file and calls the endpoint directly — in fact, listing a sensitive path advertises it.
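Reading the Disallow lines takes no special access, which is why listing a sensitive path advertises it. The sketch below pulls every Disallow value out of robots.txt text; the path /internal-admin-api/ is a hypothetical example, and the parser is deliberately simple (it ignores groups and user agents).

```python
from __future__ import annotations

# Hypothetical robots.txt content; anyone can fetch the real file from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Disallow: /internal-admin-api/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments, then split "Field: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(ROBOTS_TXT))
```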
Protect APIs with real controls: authentication tokens, rate limiting, and a WAF or firewall. Use robots.txt only to manage crawl behavior of compliant bots, and put security at the application and network layers where it can actually be enforced.
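As a rough illustration of enforcement living in the application rather than in robots.txt, the sketch below checks a bearer token and applies a per-client token-bucket rate limit before serving a request. The token set, the client_id key, and the limits (a burst of 5, refilled at 1 request per second) are all hypothetical; this in-memory, single-process version is only a sketch, and a production service would typically rely on its auth provider, a shared store, or a gateway or WAF.

```python
from __future__ import annotations

import hmac
import time
from dataclasses import dataclass, field

# Hypothetical token store for illustration; use your real auth system instead.
VALID_TOKENS = {"example-token-123"}


@dataclass
class TokenBucket:
    """Allow bursts of `capacity` requests, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def is_authorized(auth_header: str | None) -> bool:
    """Accept 'Authorization: Bearer <token>' when the token is known."""
    if not auth_header:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking token contents through timing differences.
    return any(hmac.compare_digest(token.encode(), t.encode()) for t in VALID_TOKENS)


# One bucket per client; client_id is a hypothetical key such as a token or IP.
buckets: dict[str, TokenBucket] = {}


def check_request(client_id: str, auth_header: str | None) -> int:
    """Return an HTTP status: 401 unauthenticated, 429 rate-limited, 200 allowed."""
    if not is_authorized(auth_header):
        return 401
    bucket = buckets.get(client_id)
    if bucket is None:
        bucket = buckets[client_id] = TokenBucket(capacity=5, rate=1.0)
    return 200 if bucket.allow() else 429
```

Unlike a Disallow line, these checks run on every request regardless of whether the client ever reads robots.txt.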
How it appears in analytics and logs
Crawler hits on /api endpoints usually mean a bot is following links or guessing paths. A robots.txt Disallow reduces polite-crawler requests but does not stop a determined or non-compliant client.
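One way to check this in your own logs is to tally /api/ requests by user agent and compare the counts from before and after adding the Disallow. The sketch below assumes access logs in the common "combined" format, and the sample lines, IP addresses, and agent names are made up. User-agent strings are self-reported and easy to spoof, so a count per agent is a hint, not proof of identity.

```python
import re
from collections import Counter

# Matches the request path and user agent in a combined-format access log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD|POST|PUT|DELETE|PATCH|OPTIONS) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def api_hits_by_agent(lines, prefix="/api/"):
    """Count requests whose path starts with `prefix`, grouped by user agent."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("path").startswith(prefix):
            counts[m.group("agent")] += 1
    return counts


# Hypothetical sample log lines for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [24/Jun/2026:10:00:00 +0000] "GET /api/v1/items HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [24/Jun/2026:10:00:01 +0000] "GET /api/v1/items?page=2 HTTP/1.1" 200 498 "-" "python-requests/2.31"',
    '198.51.100.7 - - [24/Jun/2026:10:00:02 +0000] "GET /api/v1/users HTTP/1.1" 401 0 "-" "python-requests/2.31"',
    '203.0.113.5 - - [24/Jun/2026:10:00:03 +0000] "GET /about HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
]

counts = api_hits_by_agent(SAMPLE_LOG)
print(counts.most_common())
```

A compliant bot's count should fall toward zero once it has re-fetched robots.txt; agents whose counts stay flat are ignoring the rule and need enforcement instead.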
Diagnostic use case
Decide whether to disallow API paths in robots.txt to reduce wasted crawl, while keeping real access control at the application layer where it belongs.
What WebmasterID can help detect
WebmasterID records which crawlers request API paths, so you can see whether compliant bots stopped after a Disallow and which automated clients keep hitting the endpoint regardless.
Common mistakes
- Believing Disallow: /api/ secures an endpoint — it only requests crawler compliance.
- Listing a sensitive internal path in robots.txt, which advertises its existence to anyone who reads the file.
- Disallowing an API URL that is already indexed, which prevents crawlers from seeing the noindex that would remove it.
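The usual fix for the last mistake is to keep the path crawlable and send noindex as a response header. Below is a minimal WSGI sketch that adds X-Robots-Tag: noindex to every response under /api/. The middleware name noindex_api and the placeholder api_app are hypothetical; most deployments would set the same header in their web server or framework configuration instead.

```python
def noindex_api(app, prefix="/api/"):
    """WSGI middleware: add X-Robots-Tag: noindex to responses under `prefix`."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            if environ.get("PATH_INFO", "").startswith(prefix):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


def api_app(environ, start_response):
    """Placeholder JSON endpoint standing in for a real API."""
    body = b'{"items": []}'
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = noindex_api(api_app)
```

For local testing, the wrapped app can be served with wsgiref.simple_server.make_server. The header only has an effect if robots.txt still allows crawlers to fetch the URL.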
Privacy and accuracy notes
robots.txt is a public file; listing API paths there reveals their existence. Securing an endpoint requires authentication, not robots.txt, and no visitor data is involved in the rule itself.
Related pages
- robots.txt vs a firewall/WAF
robots.txt and a firewall/WAF solve different problems: robots.txt politely asks compliant crawlers what to skip, while a firewall or WAF actually blocks requests at the network or edge layer. This page contrasts the two, explains when each is appropriate, and warns against using robots.txt for jobs only enforcement can do.
- X-Robots-Tag header examples
X-Robots-Tag carries indexing directives in the HTTP response header instead of the HTML body, which makes it the way to apply noindex or nofollow to PDFs, images, and other non-HTML files. This page gives concrete header examples and notes how server config applies them in bulk.
- robots.txt and URL query parameters
Query-string URLs (?sort=, ?utm_source=, ?sessionid=) can multiply crawlable URLs. This page explains how robots.txt wildcards match parameters, when blocking helps, and why canonical or noindex is often better than a Disallow for duplicates.
- WebmasterID docs
How crawler and bot traffic is recorded server-side.
Sources and verification notes
- Google: robots.txt is not an access-control mechanism. robots.txt requests compliance; it does not secure or hide URLs.
- Google: block indexing with noindex (X-Robots-Tag). Use a noindex header for removal; a Disallow can hide the noindex.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.