Formula Won Labs


AI Bot Allowlist for robots.txt (2026, maintained)

The definitive list of AI crawlers to allow in robots.txt for local business websites. Each entry gives the bot, its owner, and a recommended action, with a link to the primary-source documentation that confirms the user agent string.

Last updated 2026-06-21 · Maintained by Formula Won Labs · Updates surfaced within 14 days via our ai_guidance_watcher cron against primary vendor docs.

[ TL;DR ]

For local business websites in 2026, allow all 13 Tier 1 AI bots (OpenAI x3, Anthropic x3, Perplexity x2, Google x2, Microsoft x1, Apple x2). Blocking AI crawlers does not protect your content from being used. It only removes your business from AI search results, where an increasing share of local-intent queries now resolves.

Remove the deprecated anthropic-ai user agent from any existing robots.txt. It does nothing now and signals you have not updated your config since 2024.

[ The List ]

Tier 1 — Must allow for AI search visibility

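13 bots from OpenAI, Anthropic, Perplexity, Google, Microsoft, and Apple. Blocking a retrieval or user-fetch bot removes your site from that vendor's AI search surface; blocking a training crawler or opt-out token removes your content from that vendor's model training.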

User-Agent | Owner | Purpose | Action | Notes
OAI-SearchBot | OpenAI | Retrieval / search citation | ALLOW | Powers ChatGPT search results. The single most important AI bot for citation visibility. (docs)
GPTBot | OpenAI | Training data crawler | ALLOW | Allow unless your content is the product. Training inclusion strengthens AI familiarity with your brand entity. (docs)
ChatGPT-User | OpenAI | User-triggered URL fetch | ALLOW | Fires when a ChatGPT user pastes or asks about your specific URL. (docs)
ClaudeBot | Anthropic | Training data crawler | ALLOW | Replaces the deprecated anthropic-ai user agent. (docs)
Claude-SearchBot | Anthropic | Retrieval / search citation | ALLOW | Powers Claude search results. (docs)
Claude-User | Anthropic | User-triggered URL fetch | ALLOW | Fires when a Claude user references your specific URL. (docs)
PerplexityBot | Perplexity | Retrieval / search citation | ALLOW | Perplexity's citation rate has grown in 2026; allow without exception. (docs)
Perplexity-User | Perplexity | User-triggered URL fetch | ALLOW | Per-query user-triggered fetch. (docs)
GoogleOther | Google | AI-related product crawling, separate from Search | ALLOW | Google's bucket for AI/research-product crawling distinct from Googlebot. (docs)
Google-Extended | Google | Gemini training opt-out token | ALLOW | Blocking removes you from Gemini training. Allow unless content is the product. (docs)
Bingbot | Microsoft | Bing index + Copilot | ALLOW | Bing index gates ChatGPT search citation (~87% top-20 overlap, per Seer Interactive). Critical for AI visibility, not just Bing rank. (docs)
Applebot | Apple | Apple Intelligence + Siri | ALLOW | Apple Intelligence on iOS 18 and macOS Sequoia pulls from Apple's index and partner engines. (docs)
Applebot-Extended | Apple | Apple AI model training opt-out token | ALLOW | Blocking removes content from Apple Foundation Models training. (docs)

Tier 2 — Secondary AI crawlers worth allowing

Useful inputs to broader AI ecosystems and emerging consumer AI products.

User-Agent | Owner | Purpose | Action | Notes
CCBot | Common Crawl | Open web archive feeding many AI models | ALLOW | Common Crawl is upstream input for many smaller LLMs and academic systems. (docs)
Amazonbot | Amazon | Alexa + Amazon AI products | ALLOW | Powers Alexa answers and Amazon AI product surfaces. (docs)
Bytespider | ByteDance | TikTok AI search + Doubao | ALLOW | TikTok's AI search products + Doubao (China-region). (docs)
cohere-ai | Cohere | Cohere model training + retrieval | ALLOW | Cohere's enterprise AI models. (docs)
Diffbot | Diffbot | Knowledge graph + entity extraction | ALLOW | Builds Knowledge Graph used by enterprise customers and AI systems. (docs)
FacebookBot | Meta | Meta AI products + Instagram entity resolution | ALLOW | Distinct from FacebookExternalHit (link-preview only). (docs)
meta-externalagent | Meta | Meta AI training and indexing crawls | ALLOW | Crawls for AI model training and direct content indexing. User-triggered fetches come from Meta-ExternalFetcher. (docs)
Meta-ExternalFetcher | Meta | Meta AI realtime fetches | ALLOW | User-initiated, real-time fetches of individual links for Meta AI. (docs)

Tier 3 — Emerging or regional

Lower volume but growing. Allow unless server load is an issue.

User-Agent | Owner | Purpose | Action | Notes
xAI-Bot | xAI | Grok training + retrieval | ALLOW | Grok citation rates are growing as xAI distribution expands. (docs)
Grok | xAI | User-triggered fetch | ALLOW | User-prompt-driven URL fetches by Grok. (docs)
MistralAI-User | Mistral AI | Le Chat user fetches | ALLOW | Le Chat (Mistral's consumer AI) user-triggered URL fetches. (docs)
DeepSeekBot | DeepSeek | DeepSeek model training | ALLOW | DeepSeek's research and consumer AI products. (docs)
YouBot | You.com | You.com AI search | ALLOW | AI-first search engine; smaller share but high citation rate per visit. (docs)
PhindBot | Phind | Phind developer AI search | ALLOW | Niche but high authority in developer queries. (docs)
Kagibot | Kagi | Kagi paid search + AI | ALLOW | Paid search audience tends to be high-intent. (docs)

Tier 4 — Deprecated / remove from robots.txt

These user agents no longer apply. Remove to keep your config current.

User-Agent | Owner | Purpose | Action | Notes
anthropic-ai | Anthropic | DEPRECATED (replaced by ClaudeBot family) | REMOVE | Old user agent; safe to remove from robots.txt. Use ClaudeBot, Claude-SearchBot, Claude-User instead. (docs)
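
Removing a deprecated entry can be scripted. The sketch below is one possible approach, not a tool we ship: it assumes groups in your robots.txt are separated by blank lines (as in the copy-paste block on this page), and the names strip_deprecated_groups and DEPRECATED_AGENTS are illustrative.

```python
import re
from typing import Optional

# Assumption: the only deprecated agent is the one listed in the Tier 4 table.
DEPRECATED_AGENTS = {"anthropic-ai"}


def _agent_of(line: str) -> Optional[str]:
    """Return the lowercased agent named on a 'User-agent:' line, else None."""
    field, sep, value = line.partition(":")
    if sep and field.strip().lower() == "user-agent":
        return value.split("#", 1)[0].strip().lower()
    return None


def strip_deprecated_groups(robots_txt: str) -> str:
    """Drop groups that target only deprecated agents.

    Assumes groups are separated by blank lines. In a group that mixes
    deprecated and current agents, only the deprecated User-agent lines go.
    """
    kept_blocks = []
    for block in re.split(r"\n\s*\n", robots_txt.strip()):
        lines = block.splitlines()
        agents = [a for a in map(_agent_of, lines) if a is not None]
        if agents and all(a in DEPRECATED_AGENTS for a in agents):
            continue  # the whole group exists only for deprecated agents
        lines = [ln for ln in lines if _agent_of(ln) not in DEPRECATED_AGENTS]
        kept_blocks.append("\n".join(lines))
    return "\n\n".join(kept_blocks) + "\n"


if __name__ == "__main__":
    old = (
        "User-agent: anthropic-ai\nAllow: /\n\n"
        "User-agent: ClaudeBot\nAllow: /\n\n"
        "User-agent: *\nAllow: /\n"
    )
    print(strip_deprecated_groups(old))
```

Review the output before deploying it; files that do not separate groups with blank lines need a stricter parser than this.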

[ Copy-paste robots.txt block ]

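Add the block below to your robots.txt, replacing the sitemap URL with your own. Keep the trailing default User-agent: * rule so the file states explicitly that crawlers not named above are also allowed. A crawler follows only the most specific group that names it, so a bot listed here ignores any Disallow lines under an existing User-agent: * group.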

# Allow major AI crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
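
To sanity-check a robots.txt before deploying it, one option is Python's standard-library urllib.robotparser. The sketch below parses the file text and lists any Tier 1 agent that could not fetch a given path. The helper name blocked_agents is illustrative, and Python's parser is an approximation: it matches agent names by substring and applies the first matching rule in a group, so vendor crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The 13 Tier 1 user agents from the table on this page.
TIER1_AGENTS = [
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "GoogleOther", "Google-Extended",
    "Bingbot",
    "Applebot", "Applebot-Extended",
]


def blocked_agents(robots_txt: str, agents=TIER1_AGENTS, path: str = "/") -> list[str]:
    """Return the agents that this robots.txt text does not allow to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    allow_all = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_agents(allow_all))  # prints []

    blocks_gptbot = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_agents(blocks_gptbot))  # prints ['GPTBot']
```

An empty list means every Tier 1 agent can reach the path under this parser's reading of the file.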

[ How this list is maintained ]

Every bot on this page links to its primary-source vendor documentation. We do not include user agents that lack vendor confirmation.

The list is checked every two weeks by an automated watcher (the Formula Won Labs ai_guidance_watcher cron) that diffs the OpenAI, Anthropic, Perplexity, Google, Microsoft, and Apple bot documentation pages and surfaces additions, deprecations, and behavior changes within 14 days.
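
The diffing step can be pictured as a comparison between two snapshots of the user agents each vendor documents. The sketch below is a hypothetical illustration, not the watcher's actual code: the Snapshot shape, the diff_snapshots name, and the sample data are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical snapshot shape: vendor name -> user agent tokens found in its docs.
Snapshot = Dict[str, Set[str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[str]:
    """List user agents added or removed per vendor between two snapshots."""
    changes: List[str] = []
    for vendor in sorted(set(old) | set(new)):
        before = old.get(vendor, set())
        after = new.get(vendor, set())
        changes += [f"{vendor}: added {ua}" for ua in sorted(after - before)]
        changes += [f"{vendor}: removed {ua}" for ua in sorted(before - after)]
    return changes


if __name__ == "__main__":
    # Illustrative data only, modeled on the anthropic-ai deprecation above.
    previous = {"Anthropic": {"anthropic-ai", "ClaudeBot"}}
    current = {"Anthropic": {"ClaudeBot", "Claude-SearchBot", "Claude-User"}}
    for change in diff_snapshots(previous, current):
        print(change)
```

A set difference catches additions and removals; behavior changes in the prose of a vendor page would need a text-level diff on top of this.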

When a primary-source document changes (a new bot, a renamed bot, a deprecated bot), this page is updated and the Last updated date at the top reflects the change.

Want to suggest an addition? The page is also a reference for what we recommend on every Formula Won Labs site build, so the bar is vendor-confirmed only. Reach out.

Audit your robots.txt against this list

We run a free AI bot allowlist check against your domain. We flag missing allowances, deprecated user agents still in your config, and any conflicts with your default rules.
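
For a rough do-it-yourself version of the three checks, the sketch below separates Tier 1 bots blocked by their own group from those blocked only because they fall through to the default group, and lists deprecated agents still named in the file. It is an approximation built on Python's urllib.robotparser, not our audit tooling; the function names and report keys are illustrative.

```python
from urllib.robotparser import RobotFileParser

TIER1_AGENTS = [
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "GoogleOther", "Google-Extended",
    "Bingbot", "Applebot", "Applebot-Extended",
]
DEPRECATED_AGENTS = {"anthropic-ai"}  # from the Tier 4 table


def named_agents(robots_txt: str) -> set[str]:
    """Collect the lowercased names on every User-agent line."""
    names = set()
    for line in robots_txt.splitlines():
        field, sep, value = line.split("#", 1)[0].partition(":")
        if sep and field.strip().lower() == "user-agent":
            names.add(value.strip().lower())
    return names


def audit(robots_txt: str) -> dict[str, list[str]]:
    """Report blocked Tier 1 agents and deprecated agents still in the file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = named_agents(robots_txt)
    report = {
        "blocked_explicitly": [],   # the bot's own group disallows "/"
        "blocked_by_default": [],   # not named; a fallback rule disallows "/"
        "deprecated": sorted(named & DEPRECATED_AGENTS),
    }
    for agent in TIER1_AGENTS:
        if parser.can_fetch(agent, "/"):
            continue
        key = "blocked_explicitly" if agent.lower() in named else "blocked_by_default"
        report[key].append(agent)
    return report


if __name__ == "__main__":
    existing = (
        "User-agent: *\nDisallow: /\n\n"
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: anthropic-ai\nAllow: /\n\n"
        "User-agent: ClaudeBot\nAllow: /\n"
    )
    print(audit(existing))
```

The check only tests the site root; a fuller audit would also test the specific paths you want cited.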


[ License ]

CC-BY-4.0. Free to cite, embed, copy. Suggested citation:

Formula Won Labs (2026). AI Bot Allowlist for robots.txt. Retrieved from https://www.formulawonlabs.com/research/ai-bot-allowlist