Which AI bots should I allow in robots.txt for my local business website?

At minimum, allow the 13 Tier 1 bots: OAI-SearchBot, GPTBot, ChatGPT-User (OpenAI); ClaudeBot, Claude-SearchBot, Claude-User (Anthropic); PerplexityBot, Perplexity-User (Perplexity); GoogleOther and Google-Extended (Google); Applebot and Applebot-Extended (Apple); and Bingbot (Microsoft). Together these cover search retrieval, user-triggered fetches, and training access, which is how your site gets indexed and cited in AI-generated answers. Blocking them does not protect your content from being used; it only removes you from AI search results.

Should I block AI training bots like CCBot or GPTBot to protect my content?

For most local businesses the answer is no. AI engines that cannot crawl your site cannot cite your business in responses to user questions. The content protection benefit (preventing inclusion in training data) is outweighed by the visibility cost (zero citation opportunity in the conversational layer where your customers are increasingly searching). If your business depends on proprietary written content as the product itself (publishers, course creators), the tradeoff is different.

What is the difference between GPTBot and OAI-SearchBot?

GPTBot is OpenAI's training-data crawler — it collects content that may be used to train future models. OAI-SearchBot is the retrieval-only crawler that indexes pages for real-time citation in ChatGPT search results. Allowing both is recommended for AI search visibility. If you only want citation but not training-data inclusion, allow OAI-SearchBot and block GPTBot. Note that ChatGPT-User is a separate identifier used when a user's prompt triggers a direct fetch of a specific URL.
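
The citation-only setup described above can be sanity-checked before deploying it. The sketch below is a minimal illustration using Python's standard-library `urllib.robotparser`; the `ROBOTS_TXT` content, the `allowed` helper, and the example.com URL are assumptions made for this example, and Python's matching rules may differ in detail from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: stay citable in ChatGPT search, opt out of training crawls.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("OAI-SearchBot", "ChatGPT-User", "GPTBot"):
        print(agent, "allowed" if allowed(ROBOTS_TXT, agent) else "blocked")
```

Only the GPTBot group carries a Disallow rule, so the two retrieval identifiers stay reachable while the training crawler is turned away.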

Does anthropic-ai still work as a user-agent for Anthropic crawlers?

No. The anthropic-ai user agent was deprecated. Anthropic now uses three distinct user agents: ClaudeBot for training-data crawling, Claude-SearchBot for search-result retrieval, and Claude-User for in-conversation fetches triggered by a user prompt. Update older robots.txt rules that reference anthropic-ai to use the current three identifiers per Anthropic's published documentation.
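
One way to update an older file is to expand every `User-agent: anthropic-ai` line into the three current identifiers, leaving the rules underneath untouched. The sketch below is a hypothetical helper (`migrate`, `REPLACEMENTS`, and `UA_LINE` are names invented here), not a tool published by Anthropic. It assumes one identifier per line with no trailing comment, and it preserves whatever Allow or Disallow rules the old group had, so review those rules afterwards.

```python
import re

# Deprecated identifier -> current identifiers, as listed on this page.
REPLACEMENTS = {"anthropic-ai": ["ClaudeBot", "Claude-SearchBot", "Claude-User"]}

# Matches a plain "User-agent: name" line (no trailing comment).
UA_LINE = re.compile(r"^(\s*user-agent\s*:\s*)(\S+)\s*$", re.IGNORECASE)


def migrate(robots_txt: str) -> str:
    """Expand deprecated User-agent lines into their current replacements."""
    out = []
    for line in robots_txt.splitlines():
        match = UA_LINE.match(line)
        if match and match.group(2).lower() in REPLACEMENTS:
            prefix = match.group(1)
            # Several User-agent lines in a row share the rules that follow them.
            out.extend(f"{prefix}{name}" for name in REPLACEMENTS[match.group(2).lower()])
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(migrate("User-agent: anthropic-ai\nDisallow: /private/\n"))
```

If the file already names ClaudeBot elsewhere, this simple pass will produce a duplicate group, so a manual check of the result is still worthwhile.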

How often does the AI bot list change?

We update this page whenever a vendor publishes a change to their crawler user agent or adds a new one. The most recent changes in 2026 include Anthropic splitting Claude into three user agents (replacing anthropic-ai), Google's separation of GoogleOther from Googlebot for AI-related crawling, and the rise of new entrants like xAI-Bot, MistralAI-User, and DeepSeekBot. The Formula Won Labs ai_guidance_watcher runs biweekly against primary-source vendor docs and surfaces changes within 14 days.

[ Research · Reference ]

AI Bot Allowlist for robots.txt (2026, maintained)

The definitive list of AI crawlers to allow in robots.txt for local business websites. Every bot, every owner, every recommended action, with links to the primary-source documentation that confirms the user agent string.

Last updated 2026-06-21 · Maintained by Formula Won Labs · Updates surfaced within 14 days via our ai_guidance_watcher cron against primary vendor docs.

[ TL;DR ]

For local business websites in 2026, allow all 13 Tier 1 AI bots (OpenAI x3, Anthropic x3, Perplexity x2, Google x2, Microsoft x1, Apple x2). Blocking AI crawlers does not protect your content from being used. It only removes your business from AI search results, where an increasing share of local-intent queries now resolves.

Remove the deprecated anthropic-ai user agent from any existing robots.txt. It does nothing now and signals you have not updated your config since 2024.

[ The List ]

Tier 1 — Must allow for AI search visibility

13 bots from OpenAI, Anthropic, Perplexity, Google, Microsoft, Apple. Blocking any of these removes your site from a major AI search surface.

User-Agent	Owner	Purpose	Action	Notes
`OAI-SearchBot`	OpenAI	Retrieval / search citation	ALLOW	Powers ChatGPT search results. The single most important AI bot for citation visibility. docs
`GPTBot`	OpenAI	Training data crawler	ALLOW	Allow unless your content is the product. Training inclusion strengthens AI familiarity with your brand entity. docs
`ChatGPT-User`	OpenAI	User-triggered URL fetch	ALLOW	Fires when a ChatGPT user pastes or asks about your specific URL. docs
`ClaudeBot`	Anthropic	Training data crawler	ALLOW	Replaces the deprecated anthropic-ai user agent. docs
`Claude-SearchBot`	Anthropic	Retrieval / search citation	ALLOW	Powers Claude search results. docs
`Claude-User`	Anthropic	User-triggered URL fetch	ALLOW	Fires when a Claude user references your specific URL. docs
`PerplexityBot`	Perplexity	Retrieval / search citation	ALLOW	Perplexity's citation rate has grown in 2026; allow without exception. docs
`Perplexity-User`	Perplexity	User-triggered URL fetch	ALLOW	Per-query user-triggered fetch. docs
`GoogleOther`	Google	AI-related product crawling, separate from Search	ALLOW	Google's bucket for AI/research-product crawling distinct from Googlebot. docs
`Google-Extended`	Google	Gemini training opt-out token	ALLOW	Blocking removes you from Gemini training. Allow unless content is the product. docs
`Bingbot`	Microsoft	Bing index + Copilot	ALLOW	Bing index gates ChatGPT search citation (~87% top-20 overlap, per Seer Interactive). Critical for AI visibility, not just Bing rank. docs
`Applebot`	Apple	Apple Intelligence + Siri	ALLOW	Apple Intelligence on iOS 18 and macOS Sequoia pulls from Apple's index and partner engines. docs
`Applebot-Extended`	Apple	Apple AI model training opt-out token	ALLOW	Blocking removes content from Apple Foundation Models training. docs

Tier 2 — Secondary AI crawlers worth allowing

Useful inputs to broader AI ecosystems and emerging consumer AI products.

User-Agent	Owner	Purpose	Action	Notes
`CCBot`	Common Crawl	Open web archive feeding many AI models	ALLOW	Common Crawl is upstream input for many smaller LLMs and academic systems. docs
`Amazonbot`	Amazon	Alexa + Amazon AI products	ALLOW	Powers Alexa answers and Amazon AI product surfaces. docs
`Bytespider`	ByteDance	TikTok AI search + Doubao	ALLOW	TikTok's AI search products + Doubao (China-region). docs
`cohere-ai`	Cohere	Cohere model training + retrieval	ALLOW	Cohere's enterprise AI models. docs
`Diffbot`	Diffbot	Knowledge graph + entity extraction	ALLOW	Builds Knowledge Graph used by enterprise customers and AI systems. docs
`FacebookBot`	Meta	Meta AI products + Instagram entity resolution	ALLOW	Distinct from FacebookExternalHit (link-preview only). docs
`meta-externalagent`	Meta	Meta AI training + indexing crawls	ALLOW	Crawls content for AI model training and direct indexing. User-initiated fetches use Meta-ExternalFetcher. docs
`Meta-ExternalFetcher`	Meta	Meta AI realtime fetches	ALLOW	Real-time content fetches for Meta AI. docs

Tier 3 — Emerging or regional

Lower volume but growing. Allow unless server load is an issue.

User-Agent	Owner	Purpose	Action	Notes
`xAI-Bot`	xAI	Grok training + retrieval	ALLOW	Grok citation rates are growing as xAI distribution expands. docs
`Grok`	xAI	User-triggered fetch	ALLOW	User-prompt-driven URL fetches by Grok. docs
`MistralAI-User`	Mistral AI	Le Chat user fetches	ALLOW	Le Chat (Mistral's consumer AI) user-triggered URL fetches. docs
`DeepSeekBot`	DeepSeek	DeepSeek model training	ALLOW	DeepSeek's research and consumer AI products. docs
`YouBot`	You.com	You.com AI search	ALLOW	AI-first search engine; smaller share but high citation rate per visit. docs
`PhindBot`	Phind	Phind developer AI search	ALLOW	Niche but high authority in developer queries. docs
`Kagibot`	Kagi	Kagi paid search + AI	ALLOW	Paid search audience tends to be high-intent. docs

Tier 4 — Deprecated / remove from robots.txt

These user agents no longer apply. Remove to keep your config current.

User-Agent	Owner	Purpose	Action	Notes
`anthropic-ai`	Anthropic	DEPRECATED — replaced by ClaudeBot family	REMOVE	Old user agent; safe to remove from robots.txt. Use ClaudeBot, Claude-SearchBot, Claude-User instead. docs

[ Copy-paste robots.txt block ]

Drop the block below into your robots.txt as-is. Replace the sitemap URL with your own. Keep the trailing default User-agent: * rule so you do not inadvertently block legitimate non-AI crawlers.

# Allow major AI crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
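
If you would rather generate the block above than paste it, the following sketch builds the same structure from the Tier 1 list and confirms that every bot can reach the site root. It is an illustrative helper using only Python's standard library; `build_allow_block` and `blocked_agents` are names made up for this example, and the sitemap URL is the same placeholder used above.

```python
from urllib.robotparser import RobotFileParser

# Tier 1 user agents from the table on this page.
TIER_1 = (
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "GoogleOther", "Google-Extended",
    "Bingbot", "Applebot", "Applebot-Extended",
)


def build_allow_block(agents, sitemap_url: str) -> str:
    """Render one Allow group per agent, then the default group and sitemap."""
    lines = ["# Allow major AI crawlers"]
    for agent in agents:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines += ["# Default", "User-agent: *", "Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def blocked_agents(robots_txt: str, agents) -> list:
    """Return the agents that may NOT fetch the site root under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]


if __name__ == "__main__":
    block = build_allow_block(TIER_1, "https://example.com/sitemap.xml")
    print(block)
    print("Blocked:", blocked_agents(block, TIER_1))
```

An empty "Blocked" list means the generated file does what the block above intends, at least under Python's reading of the rules.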

[ How this list is maintained ]

Every bot on this page links to its primary-source vendor documentation. We do not include user agents that lack vendor confirmation.

The list is checked biweekly by an automated watcher (Formula Won Labs ai_guidance_watcher cron) that diffs OpenAI, Anthropic, Perplexity, Google, Microsoft, and Apple bot documentation pages and surfaces additions, deprecations, and behavior changes within 14 days.
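
The general idea of diffing documentation snapshots can be sketched in a few lines. This is not the ai_guidance_watcher itself, whose implementation is not described on this page; it is a minimal illustration that assumes you already have the old and new page text as strings, with `fingerprint` and `changed_lines` as hypothetical helper names.

```python
import difflib
import hashlib
from itertools import islice


def fingerprint(text: str) -> str:
    """Stable hash of a page snapshot, useful as a cheap 'did anything change' check."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_lines(old: str, new: str) -> list:
    """Return added ('+...') and removed ('-...') lines between two snapshots."""
    if fingerprint(old) == fingerprint(new):
        return []
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    return [line for line in islice(diff, 2, None) if not line.startswith("@@")]


if __name__ == "__main__":
    before = "ClaudeBot\nanthropic-ai\n"
    after = "ClaudeBot\nClaude-SearchBot\nClaude-User\n"
    for line in changed_lines(before, after):
        print(line)
```

A scheduled job could store the previous snapshot per vendor and alert a human whenever the returned list is non-empty.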

When a primary-source document changes (a new bot, a renamed bot, a deprecated bot), this page is updated and the Last updated date at the top reflects the change.

Want to suggest an addition? The page is also a reference for what we recommend on every Formula Won Labs site build, so the bar is vendor-confirmed only. Reach out.

Audit your robots.txt against this list

We run a free AI bot allowlist check against your domain. We flag missing allowances, deprecated user agents still in your config, and any conflicts with your default rules.
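
For a rough self-check along the same three lines (missing allowances, deprecated user agents, conflicting rules), a sketch like the one below may help. It is not the audit tool referred to above; the `audit` function and its report keys are assumptions made for this example, and it relies on Python's `urllib.robotparser`, which matches user agents by substring and may not mirror every vendor's parser.

```python
import re
from urllib.robotparser import RobotFileParser

TIER_1 = (
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "GoogleOther", "Google-Extended",
    "Bingbot", "Applebot", "Applebot-Extended",
)
DEPRECATED = ("anthropic-ai",)

# Captures the name on each "User-agent:" line, ignoring trailing comments.
UA_LINE = re.compile(r"^\s*user-agent\s*:\s*([^#\s]+)", re.IGNORECASE | re.MULTILINE)


def audit(robots_txt: str) -> dict:
    """Report Tier 1 bots with no explicit group, deprecated agents, and blocked bots."""
    declared = {name.lower() for name in UA_LINE.findall(robots_txt)}
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "missing": [bot for bot in TIER_1 if bot.lower() not in declared],
        "deprecated": [old for old in DEPRECATED if old.lower() in declared],
        # Blocked from the site root, whether by its own group or the default rule.
        "blocked": [bot for bot in TIER_1 if not parser.can_fetch(bot, "/")],
    }


if __name__ == "__main__":
    sample = "User-agent: anthropic-ai\nDisallow: /\n\nUser-agent: GPTBot\nDisallow: /\n"
    for key, value in audit(sample).items():
        print(f"{key}: {value}")
```

A bot listed under "missing" is not necessarily blocked; it simply falls through to the default `User-agent: *` group, which is why "blocked" is reported separately.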


[ License ]

CC-BY-4.0. Free to cite, embed, copy. Suggested citation:

Formula Won Labs (2026). AI Bot Allowlist for robots.txt. Retrieved from https://www.formulawonlabs.com/research/ai-bot-allowlist