
PerplexityBot WAF Checklist

A practical PerplexityBot checklist for robots.txt, WAF rules, CDN challenges, public content exposure and AI answer-source readiness.

When to use this

Use this when robots.txt allows AI crawlers but security tooling or edge policy may still be blocking them.

Robots.txt is only one layer

Perplexity says PerplexityBot respects robots.txt directives, but in real deployments whether it can fetch a page also depends on CDN bot controls, WAF challenges, rate limits and cache behavior.

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
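
One way to confirm that directives like the ones above parse the way you expect is Python's standard-library robots.txt parser. This is a minimal sketch: it parses the rules from an inline string rather than fetching them, the example.com URLs are placeholders, and the helper name is_allowed is made up for illustration. It only checks the robots.txt layer and says nothing about what the CDN or WAF does.

```python
from urllib.robotparser import RobotFileParser

# The same rules shown above, inlined so the check runs offline.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; swap in a real public page on your own site.
allowed = is_allowed(ROBOTS_TXT, "PerplexityBot", "https://example.com/pricing")
print(allowed)
```

If this check passes but the crawler still cannot reach the page, the cause is likely at the edge rather than in robots.txt.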

What the WAF should prove

For public pages, the edge should serve crawlable HTML without a CAPTCHA or JavaScript challenge. For private pages, the edge should enforce real authentication rather than relying on robots.txt.

  • No managed challenge on public marketing pages
  • No 403 for known crawler user agents that you intentionally allow
  • Private, staging and customer pages require authentication
  • Logs distinguish allowed AI search crawlers from blocked scraping patterns
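
The expectations above can be turned into a rough pass/fail check on a single response. The sketch below is only a heuristic under stated assumptions: the challenge markers are hypothetical phrases, not any vendor's actual challenge page, and the function names and labels are invented for this example.

```python
from typing import Mapping

# Hypothetical markers; real challenge pages differ by vendor and change over time.
CHALLENGE_MARKERS = ("captcha", "verify you are human", "checking your browser")


def classify_edge_response(status: int, headers: Mapping[str, str], body: str) -> str:
    """Label one response as ok, challenge, blocked, rate_limited, auth_required or other."""
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    text = body.lower()
    if status == 401:
        return "auth_required"
    if 300 <= status < 400 and "login" in lowered.get("location", ""):
        return "auth_required"
    if status == 429:
        return "rate_limited"
    # Crude: a normal page that happens to mention these phrases will also match.
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return "challenge"
    if status == 403:
        return "blocked"
    if status == 200 and "text/html" in lowered.get("content-type", ""):
        return "ok"
    return "other"


def meets_policy(label: str, is_public: bool) -> bool:
    """Public pages should be served as HTML; private pages should demand authentication."""
    return label == "ok" if is_public else label == "auth_required"
```

A public marketing page that comes back as "challenge" or "blocked" fails the check, and so does a private page that comes back as "ok".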

Checklist

Fetch the page twice, once with normal browser headers and once with crawler-style headers, and compare the responses.
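
A minimal sketch of that comparison, using only the Python standard library. Both user-agent strings are assumptions: the browser string is a generic example, and the PerplexityBot string follows the format Perplexity has published but should be checked against their current documentation. The function names are invented for this example.

```python
import urllib.error
import urllib.request

# Assumed header profiles; verify the crawler string against Perplexity's docs.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
}
CRAWLER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
    ),
    "Accept": "text/html",
}


def fetch_status(url: str, headers: dict[str, str], timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a GET sent with the given headers."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses (403, 429, ...) are raised as exceptions.
        return err.code


def compare_profiles(url: str, fetch=fetch_status) -> dict[str, int]:
    """Fetch url once per header profile and return both status codes."""
    return {
        "browser": fetch(url, BROWSER_HEADERS),
        "crawler": fetch(url, CRAWLER_HEADERS),
    }
```

Calling compare_profiles("https://example.com/") against your own site returns a status code for each profile; a difference between the two suggests a user-agent rule at the edge. A spoofed user agent sent from your own IP is not the same as the real crawler, since some edge products also verify crawler IP ranges, so treat the result as a hint and confirm it in the logs.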

Inspect CDN/WAF logs for 403, 429, challenge and bot-score events.
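
As a rough illustration of that log review, the sketch below counts outcomes for one crawler in JSON-lines logs. The log schema is hypothetical: the field names status, action and user_agent, the action values, and the sample lines are all invented for this example, so map them to whatever your CDN or WAF actually exports. Bot-score fields are left out because their names and scales vary by vendor.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical action values that indicate a challenge was served.
CHALLENGE_ACTIONS = ("challenge", "managed_challenge", "js_challenge")

# Invented sample lines in the assumed schema, for illustration only.
SAMPLE_LOG = [
    '{"status": 200, "action": "allow", "user_agent": "PerplexityBot/1.0", "path": "/"}',
    '{"status": 403, "action": "block", "user_agent": "PerplexityBot/1.0", "path": "/pricing"}',
    '{"status": 403, "action": "managed_challenge", "user_agent": "PerplexityBot/1.0", "path": "/blog"}',
    '{"status": 429, "action": "rate_limit", "user_agent": "PerplexityBot/1.0", "path": "/docs"}',
    '{"status": 403, "action": "block", "user_agent": "curl/8.0", "path": "/"}',
]


def summarize_bot_events(lines: Iterable[str], bot_token: str = "PerplexityBot") -> Counter:
    """Count served, 403, 429 and challenge outcomes for one crawler's requests."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if bot_token.lower() not in event.get("user_agent", "").lower():
            continue  # a different client; not part of this summary
        if event.get("action", "") in CHALLENGE_ACTIONS:
            counts["challenge"] += 1
        elif event.get("status") in (403, 429):
            counts[str(event["status"])] += 1
        else:
            counts["served"] += 1
    return counts


summary = summarize_bot_events(SAMPLE_LOG)
print(dict(summary))
```

Any non-zero count other than "served" for a crawler you intend to allow on public pages is worth tracing back to the rule that produced it.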

Keep robots.txt readable even if other paths are protected.

Do not expose private URLs in sitemap.xml, public links or JavaScript bundles.
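
For the sitemap part of that check, a short sketch can flag entries whose paths look private. The prefix list is a hypothetical example, not a standard, and a plain prefix match is crude: it would also flag a public path such as /administration-guide. Public links and JavaScript bundles need a separate review.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical path prefixes; replace with the private areas of your own site.
PRIVATE_PREFIXES = ("/admin", "/staging", "/account", "/internal")

# Invented sample sitemap with placeholder URLs.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/admin/users</loc></url>
</urlset>"""


def find_private_urls(sitemap_xml: str, prefixes: tuple[str, ...] = PRIVATE_PREFIXES) -> list[str]:
    """Return sitemap URLs whose path starts with one of the given prefixes."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if urlparse(url).path.startswith(prefixes):
            flagged.append(url)
    return flagged


flagged_urls = find_private_urls(SAMPLE_SITEMAP)
print(flagged_urls)
```

Removing a URL from the sitemap does not protect it; the page itself still needs authentication at the edge.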

Document which AI bots are allowed, blocked or monitored.
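
One lightweight way to keep that record is a small policy table stored next to the edge configuration. The sketch below is an assumption about how such a table might look: ExampleScraper and ExampleResearchBot are made-up names, and because robots.txt has no "monitor" directive, monitored bots are rendered as allowed here and tracked through logs instead.

```python
from dataclasses import dataclass

VALID_DECISIONS = ("allow", "block", "monitor")


@dataclass(frozen=True)
class BotPolicy:
    user_agent: str
    decision: str  # one of VALID_DECISIONS
    note: str = ""


# Illustrative entries; only PerplexityBot comes from this checklist.
POLICIES = [
    BotPolicy("PerplexityBot", "allow", "AI search crawler, public pages only"),
    BotPolicy("ExampleScraper", "block", "hypothetical scraper name"),
    BotPolicy("ExampleResearchBot", "monitor", "hypothetical bot, reviewed in logs"),
]


def render_robots_groups(policies: list[BotPolicy]) -> str:
    """Render one robots.txt group per bot from the policy table."""
    groups = []
    for policy in policies:
        if policy.decision not in VALID_DECISIONS:
            raise ValueError(f"unknown decision: {policy.decision!r}")
        # "monitor" has no robots.txt equivalent, so it is emitted as Allow.
        rule = "Disallow: /" if policy.decision == "block" else "Allow: /"
        groups.append(f"User-agent: {policy.user_agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_groups = render_robots_groups(POLICIES)
print(robots_groups)
```

The rendered groups only express crawler policy. Blocking still has to be enforced in WAF rules, because a scraper can simply ignore robots.txt.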

FAQ

What does the PerplexityBot WAF Checklist check first?

It starts by fetching the page with normal browser headers and with crawler-style headers, so the two responses can be compared.

Does this guarantee ranking or inclusion in AI answers?

No. It checks public technical signals that can make a page easier to crawl, parse and cite, but no tool can guarantee ranking, indexing or citation in ChatGPT, Claude, Perplexity or Google.

Should I fix robots.txt, llms.txt or page rendering first?

Fix public reachability, indexability and readable initial HTML first. robots.txt should express crawler policy, and llms.txt is optional supporting documentation rather than a replacement for normal search fundamentals.

Next step

Use AIO Checker as the public-facing audit, then confirm WAF behavior in your CDN logs.
