A free auditor that scores any URL 0–100 on whether ChatGPT, Claude, Perplexity, and Google AI Overviews can crawl, parse, and cite it. 27 weighted checks, run in seconds.
ChatGPT alone reaches more than 800 million weekly users. When an AI picks 1–3 sources to cite, it picks from sites it can actually crawl and parse — not just whatever ranks #1.
Generative engines reshape how visibility is earned online — well-cited, statistic-rich, quotation-rich content sees a substantial boost in citation rate, even from sources that don't rank in the top 5 of traditional search.
Aggarwal et al., GEO: Generative Engine Optimization · Princeton, 2024
Every audit runs the same weighted checks across five signal categories that determine whether AI engines can find, parse, and cite your site.
Cloudflare/WAF challenges, 403 walls, JS-only SPAs, robots.txt that locks out OAI-SearchBot or Perplexity-User. If LLM crawlers can't load your page, nothing else matters.
Title length, meta description, canonical URL, OpenGraph/Twitter, sitemap.xml, html lang. The traditional SEO foundation that AI search still expects.
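As a rough illustration, a title-length check of the kind this category runs can be sketched in a few lines. The 30–60 character window is an illustrative assumption, not the auditor's actual threshold:

```typescript
// Hypothetical sketch of a title-length check over raw HTML.
// A real implementation would use an HTML parser, not a regex.
function checkTitleLength(html: string): boolean {
  const m = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  if (!m) return false; // no <title> at all fails the check
  const len = m[1].trim().length;
  return len >= 30 && len <= 60; // illustrative thresholds
}
```

The other checks in this category (meta description, canonical, OpenGraph) follow the same pattern: extract the tag from the raw HTML, then score its presence and shape.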
One h1, no skipped heading levels, header / nav / main / article / footer landmarks, image alt text. The shape AI uses to parse your page.
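A heading-hierarchy check along these lines might look like the following sketch (regex-based for brevity; a production version would walk a parsed DOM):

```typescript
// Sketch: exactly one <h1>, and no skipped heading levels (e.g. h2 -> h4).
function checkHeadings(html: string): boolean {
  const levels = Array.from(
    html.matchAll(/<h([1-6])[^>]*>/gi),
    (m) => Number(m[1]),
  );
  if (levels.filter((l) => l === 1).length !== 1) return false; // one h1 only
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) return false; // skipped a level
  }
  return true;
}
```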
llms.txt manifest, JSON-LD (Article, Organization, FAQ, Person, LocalBusiness), Mozilla Readability extraction, author byline + sameAs links, dateModified freshness.
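For illustration, a minimal JSON-LD Article block of the kind this category rewards might look like the following (all names, URLs, and dates are placeholders, not a prescribed template):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example headline",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "sameAs": ["https://example.com/about/jane"]
  },
  "dateModified": "2025-01-15"
}
</script>
```

Note how the author byline, sameAs links, and dateModified freshness signals listed above all live in the same block.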
Princeton's GEO study found that front-loaded answers, statistics density (+41%), and quotations (+28%) are what actually move the needle in generative answers. We check for them.
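A crude version of a statistics-and-quotations signal could be sketched like this; the metrics are illustrative guesses, not the study's methodology or the auditor's actual scoring:

```typescript
// Hypothetical sketch: numeric-token density and presence of a quotation.
function geoSignals(text: string): { statsPer100Words: number; hasQuote: boolean } {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const numbers = words.filter((w) => /\d/.test(w)).length; // tokens with digits
  return {
    statsPer100Words: words.length ? (numbers / words.length) * 100 : 0,
    hasQuote: /"[^"]+"/.test(text), // any double-quoted span
  };
}
```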
GPTBot, ChatGPT-User, OAI-SearchBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended — we check robots.txt against every one.
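The robots.txt side of that check can be sketched as a small matcher. This simplified version only handles blanket `Disallow: /` blocks; real parsing also covers path-level rules and wildcard matching:

```typescript
// Sketch: is a given AI user-agent blanket-blocked by this robots.txt?
// Handles stacked "User-agent:" lines that share one rule group.
function isBlocked(robotsTxt: string, userAgent: string): boolean {
  let applies = false; // current group applies to our user-agent
  let inUaRun = false; // are we still reading User-agent lines for this group?
  for (const raw of robotsTxt.split("\n")) {
    const line = raw.trim();
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      if (!inUaRun) applies = false; // a new group starts
      inUaRun = true;
      if (value === "*" || value.toLowerCase() === userAgent.toLowerCase()) {
        applies = true;
      }
    } else {
      inUaRun = false;
      if (key === "disallow" && applies && value === "/") return true;
    }
  }
  return false;
}
```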
We always try a vanilla direct fetch first. If your site responds with a Cloudflare challenge, 403, or other WAF block, we retry through BrightData's Web Unlocker so we can still grade the page — but you lose the major fetch_direct credit, because that's exactly what an AI crawler would also fail on.
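That fallback flow can be sketched with the two fetch strategies injected as functions, which keeps the scoring logic testable. `fetchViaUnlocker` here is a stand-in for a proxy-based retry, not the tool's actual API:

```typescript
// A fetch strategy: returns status code and response body for a URL.
type FetchFn = (url: string) => Promise<{ status: number; body: string }>;

// Try the direct fetch first; on a non-2xx response, retry via the unlocker
// but record that the fetch_direct check failed.
async function fetchForAudit(
  url: string,
  fetchDirect: FetchFn,
  fetchViaUnlocker: FetchFn,
): Promise<{ body: string; fetchDirectPassed: boolean }> {
  const direct = await fetchDirect(url);
  if (direct.status >= 200 && direct.status < 300) {
    return { body: direct.body, fetchDirectPassed: true };
  }
  // Blocked (403, Cloudflare challenge, etc.): grade the page anyway,
  // but the fetch_direct credit is lost.
  const unlocked = await fetchViaUnlocker(url);
  return { body: unlocked.body, fetchDirectPassed: false };
}
```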
Users no longer scroll through 10 blue links — they get a synthesized answer with 1–3 citations. Whether yours is one of them depends on signals you can actually fix.
We check for a /llms.txt manifest, robots.txt allowlists for AI crawlers (OAI-SearchBot, Claude-User, PerplexityBot, Google-Extended), JSON-LD structured data, and content patterns from Princeton's GEO research (front-loaded answers, statistics, quotations).
Read the full AEO primer →

We fetch your page with undici, the same way a real LLM crawler would. If your site needs JavaScript to render content, it fails the ssr_content check — that's intentional, because AI crawlers also don't run JS reliably.
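The idea behind ssr_content can be approximated by stripping tags from the raw HTML and measuring what text survives without any JavaScript execution. The 200-character cutoff is an illustrative assumption, not the auditor's actual threshold:

```typescript
// Sketch: does the server-rendered HTML contain meaningful text,
// or is it just an empty SPA shell waiting for JS to run?
function hasServerRenderedContent(html: string): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "") // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, "")   // drop styles
    .replace(/<[^>]+>/g, " ")                   // strip remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return text.length >= 200; // illustrative cutoff
}
```

An empty `<div id="root"></div>` shell fails this immediately, which is exactly what an AI crawler sees when it fetches a JS-only SPA.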
Why AI crawlers don't run JS →

We check robots.txt rules for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, Meta-ExternalAgent, and CCBot.
The complete robots.txt for AI crawlers guide →

/llms.txt is a proposed convention for a markdown file at your site root that gives AI agents a curated index of your most important pages — like a sitemap, but optimized for LLMs to read. It's worth 5 of the 100 points in this audit.
/llms-full.txt is an optional full-content dump worth 1 bonus point.
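For reference, a minimal /llms.txt might look like the sketch below; the site name, URLs, and descriptions are placeholders, not a required structure:

```markdown
# Example Site

> One-line description of what the site covers and who it is for.

## Key pages

- [AEO primer](https://example.com/aeo): what answer engine optimization is
- [Audit methodology](https://example.com/methodology): how the checks are weighted
- [Pricing](https://example.com/pricing): plans and limits
```

The convention is plain markdown served at the site root, so an LLM can read it without any parsing beyond what it already does for regular pages.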
How to write an llms.txt →

Free. No signup. Results in seconds.