# robots.txt for AI crawlers — the complete guide
Which AI crawlers exist, which ones actually matter, and the exact robots.txt allowlist to make sure ChatGPT, Claude, Perplexity, and Google AI Overviews can read your site.
If your robots.txt blocks the wrong AI crawler, you can disappear from ChatGPT search results overnight — even if your SEO is otherwise pristine.
This guide is the working list of AI crawlers we audit against, organized by who matters and which file each operator publishes their bot list in.
## The 12 AI crawlers we check
The AEO Site Checker probes robots.txt against every one of these. Seven are “critical” — blocking them removes you from the corresponding AI surface entirely.
| Bot | Operator | Purpose | Critical? |
|---|---|---|---|
| GPTBot | OpenAI | Training data crawl | |
| ChatGPT-User | OpenAI | Live fetch when ChatGPT browses for an answer | ✓ |
| OAI-SearchBot | OpenAI | ChatGPT Search index | ✓ |
| ClaudeBot | Anthropic | Training data crawl | |
| Claude-User | Anthropic | Live fetch when Claude browses | ✓ |
| Claude-SearchBot | Anthropic | Claude search index | ✓ |
| PerplexityBot | Perplexity | Search index | ✓ |
| Perplexity-User | Perplexity | Live fetch on user query | ✓ |
| Google-Extended | Google | Gemini training + AI Overviews | ✓ |
| Applebot-Extended | Apple | Apple Intelligence training | |
| Meta-ExternalAgent | Meta | Meta AI training | |
| CCBot | Common Crawl | Public web archive used by many models | |
The rule of thumb: the `*-User` and `*-SearchBot` bots are the ones that decide whether you get cited live. Training bots (GPTBot, ClaudeBot) only affect future model versions; users won’t notice if you block them today.
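A quick way to test any robots.txt against the critical bots from the table is Python’s standard-library `urllib.robotparser`. The `check_critical_bots` helper below is a hypothetical sketch (not the actual AEO Site Checker), and the policy strings are toy examples:

```python
from urllib.robotparser import RobotFileParser

# The seven "critical" crawlers from the table above.
CRITICAL_BOTS = [
    "ChatGPT-User", "OAI-SearchBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Perplexity-User",
    "Google-Extended",
]

def check_critical_bots(robots_txt: str, url: str = "https://example.com/"):
    """Return {bot: allowed} for the seven critical crawlers."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in CRITICAL_BOTS}

# A wildcard-only allow policy: every bot falls through to `*` and is allowed.
print(check_critical_bots("User-agent: *\nAllow: /\n"))
```

Point the same helper at your own file’s contents to see which AI surfaces can currently read you.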
## The single allowlist that works for almost everyone
Drop this into `public/robots.txt` (or wherever your static files are served from):

```
# Default policy
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/

# OpenAI
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Anthropic
User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Google
User-agent: Google-Extended
Allow: /

User-agent: Googlebot
Allow: /

# Apple
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Meta
User-agent: Meta-ExternalAgent
Allow: /

# Common Crawl
User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```
Three things to notice:

- Each AI bot gets its own explicit `User-agent` block. The `User-agent: *` default is the lowest-priority match, so giving each bot its own stanza makes your intent unambiguous and keeps that bot’s policy stable even if someone later tightens the wildcard rules.
- `Disallow: /api/` is fine and recommended. APIs aren’t useful to crawlers, and it keeps them out of expensive endpoints.
- The `Sitemap:` directive tells crawlers where to find your XML sitemap. AI crawlers honor it the same way Googlebot does.
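One subtlety worth knowing: robots.txt groups don’t merge. A bot that matches its own stanza ignores the wildcard rules entirely, so in the allowlist above `GPTBot` is not bound by the wildcard’s `Disallow: /api/`. A sketch with Python’s `urllib.robotparser` (a hypothetical mini-policy, and note that Python’s parser applies rules first-match within a group):

```python
from urllib.robotparser import RobotFileParser

policy = """
User-agent: *
Disallow: /api/

User-agent: GPTBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

# An unlisted bot falls through to the wildcard and is kept out of /api/:
print(rp.can_fetch("SomeOtherBot", "https://example.com/api/users"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/blog"))       # True

# GPTBot matches its own stanza, which has no Disallow, so /api/ is open to it:
print(rp.can_fetch("GPTBot", "https://example.com/api/users"))        # True
```

If you want every AI bot kept out of `/api/`, repeat the `Disallow: /api/` line inside each bot’s stanza.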
## Common mistakes that cost you AEO points
### Mistake 1: Blocking GPTBot and assuming you’ve blocked ChatGPT
You haven’t. GPTBot is the training crawler. The bot that actually fetches your URL when a ChatGPT user asks a question is ChatGPT-User, and the search index that powers “search within ChatGPT” is OAI-SearchBot. If you only block GPTBot, OpenAI can still read and cite your site live.
If you actually want to opt out of all OpenAI surfaces, you need three blocks:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```
To be clear, though: most sites should not block these. You’d be opting out of one of the largest and fastest-growing sources of qualified referral traffic on the web.
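If you do opt out, you can confirm the three blocks behave as intended with the standard-library parser (a sketch; the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

optout = """
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(optout.splitlines())

for bot in ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "Googlebot"]:
    print(bot, rp.can_fetch(bot, "https://example.com/"))
# The three OpenAI bots are blocked. Googlebot is unlisted, and with no
# wildcard stanza present it defaults to allowed.
```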
### Mistake 2: Forgetting that AI bots fall through to the wildcard
Some robots.txt files do this:
```
User-agent: *
Allow: /

User-agent: SemrushBot
Disallow: /
```
That’s fine — but only because SemrushBot is explicitly listed and gets its own policy. If you don’t list any AI bots, they fall through to `User-agent: *` and inherit `Allow: /`. That’s the good outcome and what we recommend.
The bug appears when someone writes:
```
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
```
Now ChatGPT-User, Claude-User, and PerplexityBot are all blocked because they fall through to the wildcard. This is the single most common AEO failure we see. The fix is to add explicit Allow: blocks for every bot you want to permit.
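The fall-through is easy to reproduce with `urllib.robotparser` (the policy is the buggy file above; `example.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

buggy = """
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = RobotFileParser()
rp.parse(buggy.splitlines())

for bot in ["Googlebot", "ChatGPT-User", "Claude-User", "PerplexityBot"]:
    print(bot, rp.can_fetch(bot, "https://example.com/"))
# Only Googlebot is allowed; the three AI bots have no stanza of their
# own, fall through to the wildcard's Disallow, and are blocked.
```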
### Mistake 3: Cloudflare’s “Block AI Bots” toggle
Cloudflare’s dashboard has a one-click “Block AI Bots” feature. It blocks via WAF rules at the edge, not via robots.txt. So your robots.txt can be perfect and your site can still be invisible to AI search.
If you’ve ever clicked that toggle, go to Cloudflare → Bots → Configure AI Bots and turn it off (or selectively unblock the ones you want).
### Mistake 4: Returning a 200 with an empty robots.txt body
Some old WordPress plugins do this — they return a 200 with no Disallow lines because the file template was misconfigured. That’s an implicit allow, but several crawlers treat an empty file as a parse error and skip your site. Make sure your robots.txt has at least one User-agent block.
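A minimal sanity check for this failure mode might look like the following (`robots_sanity_check` is a hypothetical helper, not a library API):

```python
def robots_sanity_check(text: str) -> list[str]:
    """Return a list of problems with a robots.txt body; empty list = OK."""
    problems = []
    if not text.strip():
        problems.append("robots.txt is empty")
    if "user-agent:" not in text.lower():
        problems.append("no User-agent stanza found")
    return problems

print(robots_sanity_check(""))                          # both problems reported
print(robots_sanity_check("User-agent: *\nAllow: /\n")) # []
```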
## How to verify
Three checks, each takes 30 seconds:

- Visit `https://yoursite.com/robots.txt` in a browser. Confirm it loads, returns plain text, and contains explicit blocks for the bots you care about.
- Run our AEO Site Checker. The `robots_ai_bots` check tests all 12 bots and tells you which are blocked, which are allowed, and which fell through to the wildcard.
- Check your server logs. If `ChatGPT-User`, `Claude-User`, or `PerplexityBot` haven’t hit your domain in the last 30 days, something is filtering them — usually a WAF rule, not `robots.txt`.
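The log check can be scripted. The two log lines below are illustrative samples in common-log format, and `AI_BOTS` covers the three live-fetch bots named above:

```python
AI_BOTS = ["ChatGPT-User", "Claude-User", "PerplexityBot"]

# Fabricated sample entries; in practice, read your real access log.
sample_log = """\
203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /blog HTTP/1.1" 200 "-" "Mozilla/5.0; ChatGPT-User/1.0"
198.51.100.7 - - [01/Jan/2025:10:01:00 +0000] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
"""

seen = {bot for bot in AI_BOTS for line in sample_log.splitlines() if bot in line}
missing = [bot for bot in AI_BOTS if bot not in seen]

print("seen:", sorted(seen))
print("never seen:", missing)  # these are the bots being filtered somewhere
```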
## A note on robots.txt as a contract
robots.txt is not a security boundary. It’s a request to well-behaved crawlers. Every major AI operator (OpenAI, Anthropic, Perplexity, Google) honors it. If you need to actually block a bot — say it’s overloading your server — robots.txt is the right first step, but you should also rate-limit at the edge.
The opposite is also true: a permissive robots.txt won’t help you if your WAF blocks the request before it arrives.
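Edge rate limiting is normally a CDN or WAF setting rather than application code, but the underlying idea is usually a token bucket. A minimal illustrative sketch, with made-up rate numbers:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative only; a real
    deployment would configure this in the CDN/WAF, not app code)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # e.g. ~2 requests/sec per crawler
print([bucket.allow() for _ in range(7)])  # a burst of 5 passes, then throttled
```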
## Further reading
- What is /llms.txt and how to write one
- Why AI crawlers don’t run JavaScript
- OpenAI’s bot documentation
- Anthropic’s bot documentation
- Perplexity’s bot documentation
Ready to score your site? Run an audit →