How to block AI2Bot in robots.txt
AI2Bot is a crawler associated with the Allen Institute for AI (AI2), which produces open AI research and datasets. This page gives the robots.txt rule to disallow its token and stays cautious where public documentation is limited, marking unverified specifics rather than guessing.
What AI2Bot is
AI2Bot is a crawler associated with the Allen Institute for AI (AI2), a research organisation known for open models and datasets. AI2's outputs are research-oriented and often openly published, so disallowing AI2Bot is a request to keep your pages out of the crawling that may feed them. Where AI2's public documentation is incomplete, this entry avoids asserting specifics that cannot be confidently sourced.
The rule
To disallow AI2Bot site-wide, target its token:
User-agent: AI2Bot
Disallow: /
Treat this as a forward-looking request: it applies to future fetches, so confirm in your logs that requests stop. robots.txt is honoured by compliant crawlers but cannot force compliance, and it is not an access-control boundary. Do not invent IP ranges to verify the crawler's identity.
- Token: AI2Bot
- Associated with the Allen Institute for AI
- Some specifics not fully documented — verify the effect in logs
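Before deploying the rule, it can be sanity-checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the example.com URL and the SomeOtherBot token are placeholders chosen for illustration. It shows only how one standard parser reads the rule, not how AI2Bot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, written as two separate lines.
ROBOTS_TXT = """\
User-agent: AI2Bot
Disallow: /
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the given crawler token may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


# example.com and SomeOtherBot are placeholders, not real targets.
print(is_allowed(ROBOTS_TXT, "AI2Bot", "https://example.com/page"))        # False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))  # True
```

A False result for AI2Bot means the parser reads the rule as a site-wide disallow for that token; whether the crawler complies still has to be confirmed in logs.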
How it appears in analytics and logs
A request whose user-agent string carries the AI2Bot token indicates a fetch by the crawler associated with the Allen Institute for AI. After adding a disallow rule, check your logs to confirm whether AI2Bot activity actually stops.
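One way to run that log check is to count AI2Bot requests per day and compare the days before and after the rule went live. The sketch below assumes an access log in the common "combined" format, where the user agent is the last quoted field; the sample lines, the user-agent strings, and the 203.0.113.x addresses are illustrative placeholders, not observed AI2Bot traffic or real AI2Bot IP ranges.

```python
import re
from collections import Counter

# Assumption: combined log format, with a timestamp such as
# [20/Jun/2026:10:15:32 +0000] and the user agent as the last quoted field.
LINE_RE = re.compile(
    r'\[(?P<day>\d{2}/[A-Za-z]{3}/\d{4}):[^\]]*\].*"(?P<ua>[^"]*)"\s*$'
)


def ai2bot_hits_per_day(lines):
    """Count requests per day whose user agent contains the AI2Bot token."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "ai2bot" in match.group("ua").lower():
            hits[match.group("day")] += 1
    return hits


# Hypothetical sample lines; addresses are documentation placeholders.
SAMPLE_LOG = [
    '203.0.113.5 - - [20/Jun/2026:10:15:32 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AI2Bot/1.0)"',
    '203.0.113.5 - - [20/Jun/2026:10:16:01 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; AI2Bot/1.0)"',
    '203.0.113.9 - - [21/Jun/2026:08:02:11 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for day, count in ai2bot_hits_per_day(SAMPLE_LOG).items():
    print(day, count)
```

If the per-day counts do not fall to zero some time after the rule is published, the disallow has not had the intended effect for that traffic.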
Diagnostic use case
Disallow AI2Bot to keep your content out of crawling associated with AI2's open research datasets.
What WebmasterID can help detect
WebmasterID classifies AI2Bot by its token, separate from human traffic, so you can see whether a disallow rule changed its activity.
Common mistakes
- Asserting documented behaviour where AI2's public docs are sparse.
- Inventing IP ranges to verify AI2Bot.
- Misspelling the token — it must be exactly AI2Bot.
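The misspelling mistake can be caught mechanically. The sketch below lists the User-agent values in a robots.txt file and flags any that resemble AI2Bot without matching it exactly. It uses difflib from the Python standard library; the 0.8 similarity threshold is an arbitrary assumption for illustration, and the sample file is hypothetical.

```python
from difflib import SequenceMatcher

EXPECTED_TOKEN = "AI2Bot"


def user_agent_values(robots_txt):
    """Return the value of every User-agent line, ignoring comments."""
    values = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            values.append(value.strip())
    return values


def find_near_misses(robots_txt, threshold=0.8):
    """Return User-agent values that look like AI2Bot but are not identical.

    threshold is an assumed cutoff on difflib's similarity ratio.
    """
    near = []
    for value in user_agent_values(robots_txt):
        if value == EXPECTED_TOKEN:
            continue
        score = SequenceMatcher(None, value.lower(), EXPECTED_TOKEN.lower()).ratio()
        if score >= threshold:
            near.append(value)
    return near


# Hypothetical file containing a misspelled token.
SAMPLE = "User-agent: AI2-Bot\nDisallow: /\n\nUser-agent: CCBot\nDisallow: /\n"
print(find_near_misses(SAMPLE))
```

An empty result does not prove the rule is present; it only means no near-miss spelling was found among the User-agent lines.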
Privacy and accuracy notes
Blocking AI2Bot is a publishing-policy choice in a public file. It involves no visitor data and is not access control.
Related pages
- AI2Bot — Allen Institute for AI crawler
AI2Bot is the crawler operated by the Allen Institute for AI (AI2) to gather web data for its datasets and research. AI2 documents the crawler and its robots.txt token. Where a specific is not clearly covered it is marked partially verified rather than guessed.
- How to block CCBot (Common Crawl)
CCBot is the crawler operated by Common Crawl, a non-profit that publishes a large open web-crawl dataset reused by many downstream projects, including some AI training pipelines. This page gives the robots.txt rule to disallow CCBot and explains why blocking it affects that dataset specifically.
- Writing an AI crawler policy for robots.txt
An AI crawler policy is a deliberate decision about which AI-related tokens you allow and which you disallow in robots.txt. This page offers a structured way to make and document those choices, while staying realistic: robots.txt is a request to compliant crawlers, not a legal or technical guarantee.
- Bot intelligence
See whether an AI2Bot disallow changed its activity.
Sources and verification notes
- Allen Institute for AI — crawler reference (token observed). Token AI2Bot is observed; comprehensive official docs are limited, so specifics are marked partially verified.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.