[ Research · Reference ]
AI Bot Allowlist for robots.txt (2026, maintained)
The definitive list of AI crawlers to allow in robots.txt for local business websites. Every bot, every owner, every recommended action, with links to the primary-source documentation that confirms the user agent string.
Last updated 2026-06-21 · Maintained by Formula Won Labs · Updates surfaced within 14 days via our ai_guidance_watcher cron against primary vendor docs.
[ TL;DR ]
For local business websites in 2026, allow all 13 Tier 1 AI bots (OpenAI x3, Anthropic x3, Perplexity x2, Google x2, Microsoft x1, Apple x2). Blocking AI crawlers does not protect your content from being used. It only removes your business from AI search results, where an increasing share of local-intent queries now resolve.
Remove the deprecated anthropic-ai user agent from any existing robots.txt. It does nothing now and signals you have not updated your config since 2024.
[ The List ]
Tier 1 — Must allow for AI search visibility
13 bots from OpenAI, Anthropic, Perplexity, Google, Microsoft, and Apple. Blocking any of these removes your site from a major AI search surface.
| User-Agent | Owner | Purpose | Action | Notes |
|---|---|---|---|---|
| OAI-SearchBot | OpenAI | Retrieval / search citation | ALLOW | Powers ChatGPT search results. The single most important AI bot for citation visibility. docs |
| GPTBot | OpenAI | Training data crawler | ALLOW | Allow unless your content is the product. Training inclusion strengthens AI familiarity with your brand entity. docs |
| ChatGPT-User | OpenAI | User-triggered URL fetch | ALLOW | Fires when a ChatGPT user pastes or asks about your specific URL. docs |
| ClaudeBot | Anthropic | Training data crawler | ALLOW | Replaces the deprecated anthropic-ai user agent. docs |
| Claude-SearchBot | Anthropic | Retrieval / search citation | ALLOW | Powers Claude search results. docs |
| Claude-User | Anthropic | User-triggered URL fetch | ALLOW | Fires when a Claude user references your specific URL. docs |
| PerplexityBot | Perplexity | Retrieval / search citation | ALLOW | Perplexity's citation rate has grown in 2026; allow without exception. docs |
| Perplexity-User | Perplexity | User-triggered URL fetch | ALLOW | Per-query user-triggered fetch. docs |
| GoogleOther | Google | AI-related product crawling, separate from Search | ALLOW | Google's bucket for AI/research-product crawling distinct from Googlebot. docs |
| Google-Extended | Google | Gemini training opt-out token | ALLOW | Blocking removes you from Gemini training. Allow unless content is the product. docs |
| Bingbot | Microsoft | Bing index + Copilot | ALLOW | Bing index gates ChatGPT search citation (~87% top-20 overlap, per Seer Interactive). Critical for AI visibility, not just Bing rank. docs |
| Applebot | Apple | Apple Intelligence + Siri | ALLOW | Apple Intelligence on iOS 18 and macOS Sequoia pulls from Apple's index and partner engines. docs |
| Applebot-Extended | Apple | Apple AI model training opt-out token | ALLOW | Blocking removes content from Apple Foundation Models training. docs |
Tier 2 — Secondary AI crawlers worth allowing
Useful inputs to broader AI ecosystems and emerging consumer AI products.
| User-Agent | Owner | Purpose | Action | Notes |
|---|---|---|---|---|
| CCBot | Common Crawl | Open web archive feeding many AI models | ALLOW | Common Crawl is upstream input for many smaller LLMs and academic systems. docs |
| Amazonbot | Amazon | Alexa + Amazon AI products | ALLOW | Powers Alexa answers and Amazon AI product surfaces. docs |
| Bytespider | ByteDance | TikTok AI search + Doubao | ALLOW | TikTok's AI search products + Doubao (China-region). docs |
| cohere-ai | Cohere | Cohere model training + retrieval | ALLOW | Cohere's enterprise AI models. docs |
| Diffbot | Diffbot | Knowledge graph + entity extraction | ALLOW | Builds Knowledge Graph used by enterprise customers and AI systems. docs |
| FacebookBot | Meta | Meta AI products + Instagram entity resolution | ALLOW | Distinct from FacebookExternalHit (link-preview only). docs |
| meta-externalagent | Meta | Meta AI training and indexing crawls | ALLOW | Meta's general AI crawler, used for model training and content indexing. User-triggered fetches come from Meta-ExternalFetcher. docs |
| Meta-ExternalFetcher | Meta | Meta AI realtime fetches | ALLOW | Real-time, user-initiated content fetches for Meta AI. docs |
Tier 3 — Emerging or regional
Lower volume but growing. Allow unless server load is an issue.
| User-Agent | Owner | Purpose | Action | Notes |
|---|---|---|---|---|
| xAI-Bot | xAI | Grok training + retrieval | ALLOW | Grok citation rates are growing as xAI distribution expands. docs |
| Grok | xAI | User-triggered fetch | ALLOW | User-prompt-driven URL fetches by Grok. docs |
| MistralAI-User | Mistral AI | Le Chat user fetches | ALLOW | Le Chat (Mistral's consumer AI) user-triggered URL fetches. docs |
| DeepSeekBot | DeepSeek | DeepSeek model training | ALLOW | DeepSeek's research and consumer AI products. docs |
| YouBot | You.com | You.com AI search | ALLOW | AI-first search engine; smaller share but high citation rate per visit. docs |
| PhindBot | Phind | Phind developer AI search | ALLOW | Niche but high authority in developer queries. docs |
| Kagibot | Kagi | Kagi paid search + AI | ALLOW | Paid search audience tends to be high-intent. docs |
Tier 4 — Deprecated / remove from robots.txt
These user agents no longer apply. Remove to keep your config current.
| User-Agent | Owner | Purpose | Action | Notes |
|---|---|---|---|---|
| anthropic-ai | Anthropic | DEPRECATED: replaced by the ClaudeBot family | REMOVE | Old user agent; safe to remove from robots.txt. Use ClaudeBot, Claude-SearchBot, Claude-User instead. docs |
[ Copy-paste robots.txt block ]
Drop the block below into your robots.txt as-is. Replace the sitemap URL with your own. Keep the trailing default User-agent: * rule so you do not inadvertently block legitimate non-AI crawlers.
    # Allow major AI crawlers
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: GPTBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: Claude-SearchBot
    Allow: /

    User-agent: Claude-User
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Perplexity-User
    Allow: /

    User-agent: GoogleOther
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: Bingbot
    Allow: /

    User-agent: Applebot
    Allow: /

    User-agent: Applebot-Extended
    Allow: /

    # Default
    User-agent: *
    Allow: /

    Sitemap: https://example.com/sitemap.xml
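After editing a robots.txt, it can help to confirm that no Tier 1 bot ends up blocked. The sketch below is one way to do that with Python's standard-library `urllib.robotparser`. The function name `blocked_agents` and the sample config are hypothetical, and Python's parser matches agent names by substring, which may not be exactly how each vendor's crawler interprets the file, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Tier 1 user agents as listed on this page.
TIER_1 = [
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "GoogleOther", "Google-Extended",
    "Bingbot", "Applebot", "Applebot-Extended",
]


def blocked_agents(robots_txt, agents=TIER_1, path="/"):
    """Return the agents that this robots.txt text would stop from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Hypothetical config: the default group allows everything, but GPTBot is blocked.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(sample))  # → ['GPTBot']
```

An empty list means every Tier 1 agent may fetch the site root under this parser's reading of the file.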
[ How this list is maintained ]
Every bot on this page links to its primary-source vendor documentation. We do not include user agents that lack vendor confirmation.
The list is checked biweekly by an automated watcher (Formula Won Labs ai_guidance_watcher cron) that diffs OpenAI, Anthropic, Perplexity, Google, Microsoft, and Apple bot documentation pages and surfaces additions, deprecations, and behavior changes within 14 days.
When a primary-source document changes (a new bot, a renamed bot, a deprecated bot), this page is updated and the Last updated date at the top reflects the change.
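The core of such a check is a comparison between two snapshots of documented user agents. The following is a minimal sketch of that comparison only, not the actual ai_guidance_watcher: the function name `diff_agents` and the before/after sets are hypothetical, and it assumes the user agent tokens have already been extracted from the vendor pages. It surfaces additions and removals but would not detect behavior changes.

```python
def diff_agents(previous, current):
    """Compare two snapshots of documented user agents, ignoring case."""
    prev = {agent.lower(): agent for agent in previous}
    curr = {agent.lower(): agent for agent in current}
    return {
        "added": sorted(curr[key] for key in curr.keys() - prev.keys()),
        "removed": sorted(prev[key] for key in prev.keys() - curr.keys()),
    }


# Hypothetical snapshots of one vendor's bot documentation.
before = {"anthropic-ai", "ClaudeBot"}
after = {"ClaudeBot", "Claude-SearchBot", "Claude-User"}

print(diff_agents(before, after))
# → {'added': ['Claude-SearchBot', 'Claude-User'], 'removed': ['anthropic-ai']}
```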
Want to suggest an addition? The page is also a reference for what we recommend on every Formula Won Labs site build, so the bar is vendor-confirmed only. Reach out.
Audit your robots.txt against this list
We run a free AI bot allowlist check against your domain. We flag missing allowances, deprecated user agents still in your config, and any conflicts with your default rules.
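For readers who want a rough self-check first, the three flags described above can be approximated in a few lines. This is a sketch under stated assumptions, not the audit we run: the names `declared_agents` and `audit` and the sample config are hypothetical, and the "conflict" flag here simply means a required bot has no group of its own and cannot fetch the site root according to Python's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def declared_agents(robots_txt):
    """Collect every User-agent token named in the file, lowercased."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def audit(robots_txt, required, deprecated):
    """Flag missing groups, deprecated agents, and missing bots that are blocked."""
    declared = declared_agents(robots_txt)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    missing = [a for a in required if a.lower() not in declared]
    return {
        "missing": missing,
        "deprecated": [a for a in deprecated if a.lower() in declared],
        # No group of its own, and the rules it falls back to block the root.
        "missing_and_blocked": [a for a in missing if not parser.can_fetch(a, "/")],
    }


# Hypothetical config: stale anthropic-ai group and a restrictive default.
sample = """\
User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /
"""

report = audit(sample, required=["ClaudeBot", "Claude-User"], deprecated=["anthropic-ai"])
print(report)
```

In this sample, Claude-User has no group of its own, so it falls under the restrictive default and is reported as both missing and blocked, while anthropic-ai is reported as deprecated.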
Get a free robots.txt audit
[ License ]
CC-BY-4.0. Free to cite, embed, copy. Suggested citation:
Formula Won Labs (2026). AI Bot Allowlist for robots.txt. Retrieved from https://www.formulawonlabs.com/research/ai-bot-allowlist