# robots.txt for AI crawlers — the complete guide
Which AI crawlers exist, which ones actually matter, and the exact robots.txt allowlist to make sure ChatGPT, Claude, Perplexity, and Google AI Overviews can read your site.
If your robots.txt blocks the wrong AI crawler, you can disappear from ChatGPT search results overnight — even if your SEO is otherwise pristine.
This guide is the working list of AI crawlers we audit against, organized by who matters and which file each operator publishes their bot list in.
## The 12 AI crawlers we check
The AEO Site Checker probes robots.txt against every one of these. Seven are “critical” — blocking them removes you from the corresponding AI surface entirely.
| Bot | Operator | Purpose | Critical? |
|---|---|---|---|
| GPTBot | OpenAI | Training data crawl | |
| ChatGPT-User | OpenAI | Live fetch when ChatGPT browses for an answer | ✓ |
| OAI-SearchBot | OpenAI | ChatGPT Search index | ✓ |
| ClaudeBot | Anthropic | Training data crawl | |
| Claude-User | Anthropic | Live fetch when Claude browses | ✓ |
| Claude-SearchBot | Anthropic | Claude search index | ✓ |
| PerplexityBot | Perplexity | Search index | ✓ |
| Perplexity-User | Perplexity | Live fetch on user query | ✓ |
| Google-Extended | Google | Gemini training + AI Overviews | ✓ |
| Applebot-Extended | Apple | Apple Intelligence training | |
| Meta-ExternalAgent | Meta | Meta AI training | |
| CCBot | Common Crawl | Public web archive used by many models | |
The rule of thumb: the `*-User` and `*-SearchBot` bots are the ones that decide whether you get cited live. Training bots (GPTBot, ClaudeBot) only affect future model versions; users won’t notice if you block them today.
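A quick way to test any robots.txt against the critical bots from the table is Python’s standard-library `urllib.robotparser`. The `check_critical_bots` helper below is a hypothetical sketch (not the actual AEO Site Checker), and the policy strings are toy examples:

```python
from urllib.robotparser import RobotFileParser

# The seven "critical" crawlers from the table above.
CRITICAL_BOTS = [
    "ChatGPT-User", "OAI-SearchBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Perplexity-User",
    "Google-Extended",
]

def check_critical_bots(robots_txt: str, url: str = "https://example.com/"):
    """Return {bot: allowed} for the seven critical crawlers."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in CRITICAL_BOTS}

# A wildcard-only allow policy: every bot falls through to `*` and is allowed.
print(check_critical_bots("User-agent: *\nAllow: /\n"))
```

Point the same helper at your own file’s contents to see which AI surfaces can currently read you.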
## The single allowlist that works for almost everyone
Drop this into `public/robots.txt` (or wherever your static files are served from):

```
# Default policy
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/

# OpenAI
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Anthropic
User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Google
User-agent: Google-Extended
Allow: /

User-agent: Googlebot
Allow: /

# Apple
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Meta
User-agent: Meta-ExternalAgent
Allow: /

# Common Crawl
User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```
Three things to notice:

- Each AI bot gets its own explicit `User-agent` block. The `User-agent: *` default is the lowest-priority match, so giving each bot its own stanza makes your intent unambiguous and keeps that bot’s policy stable even if someone later tightens the wildcard rules.
- `Disallow: /api/` is fine and recommended. APIs aren’t useful to crawlers, and it keeps them out of expensive endpoints.
- The `Sitemap:` directive tells crawlers where to find your XML sitemap. AI crawlers honor it the same way Googlebot does.
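One subtlety worth knowing: robots.txt groups don’t merge. A bot that matches its own stanza ignores the wildcard rules entirely, so in the allowlist above `GPTBot` is not bound by the wildcard’s `Disallow: /api/`. A sketch with Python’s `urllib.robotparser` (a hypothetical mini-policy, and note that Python’s parser applies rules first-match within a group):

```python
from urllib.robotparser import RobotFileParser

policy = """
User-agent: *
Disallow: /api/

User-agent: GPTBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

# An unlisted bot falls through to the wildcard and is kept out of /api/:
print(rp.can_fetch("SomeOtherBot", "https://example.com/api/users"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/blog"))       # True

# GPTBot matches its own stanza, which has no Disallow, so /api/ is open to it:
print(rp.can_fetch("GPTBot", "https://example.com/api/users"))        # True
```

If you want every AI bot kept out of `/api/`, repeat the `Disallow: /api/` line inside each bot’s stanza.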
## Common mistakes that cost you AEO points
### Mistake 1: Blocking GPTBot and assuming you’ve blocked ChatGPT
You haven’t. GPTBot is the training crawler. The bot that actually fetches your URL when a ChatGPT user asks a question is ChatGPT-User, and the search index that powers “search within ChatGPT” is OAI-SearchBot. If you only block GPTBot, OpenAI can still read and cite your site live.
If you actually want to opt out of all OpenAI surfaces, you need three blocks:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```
To be clear, though: most sites should not block these. You’d be opting out of one of the largest and fastest-growing sources of qualified referral traffic on the web.
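If you do opt out, you can confirm the three blocks behave as intended with the standard-library parser (a sketch; the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

optout = """
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(optout.splitlines())

for bot in ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "Googlebot"]:
    print(bot, rp.can_fetch(bot, "https://example.com/"))
# The three OpenAI bots are blocked. Googlebot is unlisted, and with no
# wildcard stanza present it defaults to allowed.
```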
### Mistake 2: Forgetting that AI bots fall through to the wildcard
Some robots.txt files do this:
```
User-agent: *
Allow: /

User-agent: SemrushBot
Disallow: /
```
That’s fine — but only because SemrushBot is explicitly listed and gets its own policy. If you don’t list any AI bots, they fall through to `User-agent: *` and inherit `Allow: /`. That’s the good outcome and what we recommend.
The bug appears when someone writes:
```
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
```
Now ChatGPT-User, Claude-User, and PerplexityBot are all blocked because they fall through to the wildcard. This is the single most common AEO failure we see. The fix is to add explicit Allow: blocks for every bot you want to permit.
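The fall-through is easy to reproduce with `urllib.robotparser` (the policy is the buggy file above; `example.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

buggy = """
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = RobotFileParser()
rp.parse(buggy.splitlines())

for bot in ["Googlebot", "ChatGPT-User", "Claude-User", "PerplexityBot"]:
    print(bot, rp.can_fetch(bot, "https://example.com/"))
# Only Googlebot is allowed; the three AI bots have no stanza of their
# own, fall through to the wildcard's Disallow, and are blocked.
```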
### Mistake 3: Cloudflare’s “Block AI Bots” toggle
Cloudflare’s dashboard has a one-click “Block AI Bots” feature. It blocks via WAF rules at the edge, not via robots.txt. So your robots.txt can be perfect and your site can still be invisible to AI search.
If you’ve ever clicked that toggle, go to Cloudflare → Bots → Configure AI Bots and turn it off (or selectively unblock the ones you want).
### Mistake 4: Returning a 200 with an empty robots.txt body
Some old WordPress plugins do this — they return a 200 with no Disallow lines because the file template was misconfigured. That’s an implicit allow, but several crawlers treat an empty file as a parse error and skip your site. Make sure your robots.txt has at least one User-agent block.
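A minimal sanity check for this failure mode might look like the following (`robots_sanity_check` is a hypothetical helper, not a library API):

```python
def robots_sanity_check(text: str) -> list[str]:
    """Return a list of problems with a robots.txt body; empty list = OK."""
    problems = []
    if not text.strip():
        problems.append("robots.txt is empty")
    if "user-agent:" not in text.lower():
        problems.append("no User-agent stanza found")
    return problems

print(robots_sanity_check(""))                          # both problems reported
print(robots_sanity_check("User-agent: *\nAllow: /\n")) # []
```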
## How to verify
Three checks, each takes 30 seconds:

- Visit `https://yoursite.com/robots.txt` in a browser. Confirm it loads, returns plain text, and contains explicit blocks for the bots you care about.
- Run our AEO Site Checker. The `robots_ai_bots` check tests all 12 bots and tells you which are blocked, which are allowed, and which fell through to the wildcard.
- Check your server logs. If `ChatGPT-User`, `Claude-User`, or `PerplexityBot` haven’t hit your domain in the last 30 days, something is filtering them — usually a WAF rule, not `robots.txt`.
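The log check can be scripted. The two log lines below are illustrative samples in common-log format, and `AI_BOTS` covers the three live-fetch bots named above:

```python
AI_BOTS = ["ChatGPT-User", "Claude-User", "PerplexityBot"]

# Fabricated sample entries; in practice, read your real access log.
sample_log = """\
203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /blog HTTP/1.1" 200 "-" "Mozilla/5.0; ChatGPT-User/1.0"
198.51.100.7 - - [01/Jan/2025:10:01:00 +0000] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
"""

seen = {bot for bot in AI_BOTS for line in sample_log.splitlines() if bot in line}
missing = [bot for bot in AI_BOTS if bot not in seen]

print("seen:", sorted(seen))
print("never seen:", missing)  # these are the bots being filtered somewhere
```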
## A note on robots.txt as a contract
robots.txt is not a security boundary. It’s a request to well-behaved crawlers. Every major AI operator (OpenAI, Anthropic, Perplexity, Google) honors it. If you need to actually block a bot — say it’s overloading your server — robots.txt is the right first step, but you should also rate-limit at the edge.
The opposite is also true: a permissive robots.txt won’t help you if your WAF blocks the request before it arrives.
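Edge rate limiting is normally a CDN or WAF setting rather than application code, but the underlying idea is usually a token bucket. A minimal illustrative sketch, with made-up rate numbers:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative only; a real
    deployment would configure this in the CDN/WAF, not app code)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # e.g. ~2 requests/sec per crawler
print([bucket.allow() for _ in range(7)])  # a burst of 5 passes, then throttled
```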
## Further reading
- What is /llms.txt and how to write one
- Why AI crawlers don’t run JavaScript
- OpenAI’s bot documentation
- Anthropic’s bot documentation
- Perplexity’s bot documentation
Ready to score your site? Run an audit →