AI Crawler Exposure Guide
Exposure control is about reducing accidental crawler access to public content. It is not a replacement for authentication, authorization or private storage.
Robots.txt is voluntary
Cooperative crawlers honor robots.txt, but nothing enforces it: a crawler that ignores the file can still fetch any publicly reachable URL. Robots.txt is not a security boundary. Sensitive content should require authentication.
Training bots and search bots are different
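Blocking GPTBot is not the same as blocking OAI-SearchBot. Crawlers for training, search and user-triggered fetches can identify themselves with different user agents, so a rule for one does not apply to the others, and blocking each one affects a different product surface.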
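As a minimal sketch of how per-agent rules diverge, the snippet below feeds an illustrative robots.txt to Python's standard urllib.robotparser and asks whether each user agent may fetch a page. The robots.txt content, the example.com URLs and the allowed() helper are assumptions made for illustration, not a recommended policy. The result only describes what a cooperative crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow the search crawler,
# and keep every other cooperative crawler out of /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


PAGE = "https://example.com/docs/guide"  # placeholder URL

if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "SomeOtherBot"):
        print(agent, allowed(agent, PAGE))
```

The same check can run in a pre-launch test so that a change to robots.txt does not silently block or unblock an agent you meant to treat differently.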
Snippet controls matter
The noindex, nosnippet and max-snippet directives, delivered in a robots meta tag or an X-Robots-Tag response header, can reduce how content appears in search surfaces. Use them deliberately on pages where visibility is not the goal.
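One way to keep these directives deliberate is to build the header value in a single place. The sketch below assembles an X-Robots-Tag value from explicit flags; the build_x_robots_tag() function name and its parameters are hypothetical, and how the header is attached to a response depends on your server or framework.

```python
from typing import List, Optional


def build_x_robots_tag(
    noindex: bool = False,
    nosnippet: bool = False,
    max_snippet: Optional[int] = None,
) -> str:
    """Build an X-Robots-Tag header value from the requested directives.

    Returns an empty string when no directive is requested, in which case
    the header should simply be omitted from the response.
    """
    directives: List[str] = []
    if noindex:
        directives.append("noindex")
    if nosnippet:
        directives.append("nosnippet")
    if max_snippet is not None:
        # -1 is commonly used to mean "no limit"; lower values make no sense.
        if max_snippet < -1:
            raise ValueError("max_snippet must be -1 or greater")
        directives.append(f"max-snippet:{max_snippet}")
    return ", ".join(directives)


if __name__ == "__main__":
    # Example: keep a page out of the index and cap any snippet at 50 characters.
    print("X-Robots-Tag:", build_x_robots_tag(noindex=True, max_snippet=50))
```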
Review public leaks
Check sitemap entries, public links and indexed pages for staging, admin, private docs or customer-specific URLs before launch.
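The sitemap part of this review can be partly automated. The sketch below parses a sitemap with the standard library and flags URLs that match a few suspicious patterns. The pattern list, the sample sitemap and the flag_suspect_urls() helper are assumptions for illustration; a real review would use patterns that match your own staging, admin and customer URL conventions, and would still need a manual pass over public links and indexed pages.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical patterns; adjust to your own URL conventions.
SUSPECT = re.compile(r"staging|admin|internal|/customers?/", re.IGNORECASE)


def flag_suspect_urls(sitemap_xml: str) -> List[str]:
    """Return sitemap URLs that match any suspicious pattern."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if SUSPECT.search(url)]


# Illustrative sitemap with placeholder URLs.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://staging.example.com/pricing</loc></url>
  <url><loc>https://example.com/admin/login</loc></url>
  <url><loc>https://example.com/docs/guide</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in flag_suspect_urls(SAMPLE):
        print("review before launch:", url)
```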
Use edge controls for policy
Cloudflare and other CDNs can add bot controls and WAF rules, but fetching the public site from outside cannot reliably confirm that those settings are in place. Verify them in the CDN configuration itself and treat that check as a deployment checklist item.