What This Template Gives You
A ready-to-deploy robots.txt configuration that explicitly allows every major AI crawler by name as of 2026, plus separate templates for the other two common postures: selective (allow retrieval-time fetching, block training crawls) and full opt-out. Pick the template that matches your policy, customize it, and deploy.
Explicit is better than implicit. Even if your policy is to allow every crawler, writing an explicit Allow group for each crawler by name protects you against CMS updates, plugin changes, or template inheritance accidentally introducing blocks: under RFC 9309, a crawler obeys only the most specific user-agent group that matches it, so a named group shields that crawler from broad wildcard rules.
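For example, a minimal sketch of that protection. If a plugin later appends a wildcard block, a crawler with its own named group is unaffected, at least under the RFC 9309 most-specific-group rule (individual crawlers may differ):

# Explicit group: GPTBot follows these rules and ignores all other groups
User-agent: GPTBot
Allow: /

# Plugin-injected wildcard block: does not apply to GPTBot,
# because GPTBot already matched a more specific group
User-agent: *
Disallow: /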
Template 1: Permissive (Recommended for Most Brands)
# robots.txt for AI crawlers
# Permissive configuration: allow all major AI crawlers
# OpenAI
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
# Anthropic
User-agent: ClaudeBot
Allow: /
User-agent: Claude-Web
Allow: /
User-agent: anthropic-ai
Allow: /
# Perplexity
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
# Google
User-agent: Google-Extended
Allow: /
# Apple
User-agent: Applebot-Extended
Allow: /
# Mistral
User-agent: MistralAI-User
Allow: /
# DeepSeek
User-agent: DeepSeekBot
Allow: /
# Alibaba (Qwen)
User-agent: QwenBot
Allow: /
# Meta
User-agent: meta-externalagent
Allow: /
User-agent: FacebookBot
Allow: /
# Common Crawl (feeds many LLM training sets)
User-agent: CCBot
Allow: /
# Sitemap
Sitemap: https://yoursite.com/sitemap.xml
Template 2: Selective (Allow Retrieval, Block Training)
Use this template if your policy is to allow AI assistants to retrieve your pages at inference time (for citations and links) but block large-scale training crawls.
# robots.txt for AI crawlers
# Selective: allow retrieval, block training
# Allow retrieval bots (used when AI answers live queries)
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: MistralAI-User
Allow: /
# Block training bots
User-agent: GPTBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: FacebookBot
Disallow: /
# Sitemap
Sitemap: https://yoursite.com/sitemap.xml
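To confirm the selective template produces the intended split before deploying, here is a minimal sketch using Python's standard-library urllib.robotparser against a representative excerpt. Note that robotparser resolves conflicts by first match rather than the RFC 9309 longest-match rule, so treat this as a smoke test, not a reference implementation:

import urllib.robotparser

# Representative excerpt of Template 2: retrieval bot allowed, training bot blocked
TEMPLATE = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(TEMPLATE.splitlines())

print(rp.can_fetch("ChatGPT-User", "https://yoursite.com/page"))  # True: retrieval allowed
print(rp.can_fetch("GPTBot", "https://yoursite.com/page"))        # False: training blocked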
Template 3: Full Opt-Out (Block All AI)
Use this template only if you have a specific policy reason to opt out entirely. Blocking every AI crawler removes your brand from AI-generated answer pools, which is usually a net negative for visibility. Keep in mind that robots.txt is advisory: compliant crawlers honor it, but it is not an enforcement mechanism.
# robots.txt for AI crawlers
# Full opt-out: block all AI
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: MistralAI-User
Disallow: /
User-agent: DeepSeekBot
Disallow: /
User-agent: QwenBot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: CCBot
Disallow: /
# Sitemap
Sitemap: https://yoursite.com/sitemap.xml
Deployment Checklist
- Choose your template: start with Template 1 unless you have a specific reason to restrict.
- Customize the Sitemap line: point to your actual sitemap URL.
- Audit for conflicts: if your existing robots.txt has broad Disallow rules, they may conflict with these specific Allow rules. Under RFC 9309 the longest matching rule wins (with Allow winning ties), but not every crawler implements precedence identically, so remove contradictory broad rules rather than relying on override behavior.
- Deploy to your domain root: serve the file at https://yoursite.com/robots.txt with Content-Type: text/plain.
- Verify deployment: fetch robots.txt in a fresh browser tab and confirm the expected content, or script the check; see the sketch after this list.
- Monitor crawler traffic: check server logs over the following weeks for increased or decreased activity from the crawlers you configured.
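For the verify step, a minimal scripted check in Python using only the standard library (yoursite.com is a placeholder for your domain):

import urllib.request
import urllib.robotparser

URL = "https://yoursite.com/robots.txt"  # placeholder: replace with your domain

# 1. Confirm status and content type at the domain root.
with urllib.request.urlopen(URL) as resp:
    assert resp.status == 200, f"unexpected status {resp.status}"
    ctype = resp.headers.get("Content-Type", "")
    assert ctype.startswith("text/plain"), f"unexpected Content-Type {ctype!r}"
    body = resp.read().decode("utf-8")

# 2. Parse the deployed rules and spot-check the crawlers you configured.
rp = urllib.robotparser.RobotFileParser()
rp.parse(body.splitlines())
for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(agent, "allowed:", rp.can_fetch(agent, "https://yoursite.com/"))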
Common Pitfalls
WordPress plugin conflicts: some WordPress SEO plugins generate their own robots.txt and override static files. If your deployed robots.txt does not match what you uploaded, a plugin is likely intercepting the request. Disable plugin-based robots.txt generation or configure the plugin to match your template.
CDN and edge rules: Cloudflare, Vercel, and other edge providers can inject their own bot-management rules. Ensure your CDN is not blocking AI crawlers after your robots.txt allows them.
User-agent case sensitivity: RFC 9309 specifies case-insensitive matching for user-agent names, but not every crawler follows it. Use the canonical casing shown in each template.
Conflicting Allow and Disallow rules: if a crawler appears in both an Allow and a Disallow block, behavior is inconsistent. Keep rules for each user-agent in a single block.
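For example, a split like the following is a common accident after merging plugin output with a hand-written file. RFC 9309 says groups naming the same user-agent should be merged, but many real-world parsers honor only the first group they match, so the effective rule varies by crawler:

# Ambiguous: two separate groups for the same crawler
User-agent: GPTBot
Allow: /

# ...later in the file...
User-agent: GPTBot
Disallow: /

The fix is to consolidate: one group per user-agent, with all of its rules together.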
How Presenc AI Helps
Presenc AI audits your robots.txt on every monitored domain, flagging accidental blocks, outdated user-agent names, conflicts, and missing declarations. The platform correlates your robots.txt posture with measured AI crawler activity and citation outcomes, showing the direct impact of access configuration on visibility. For teams migrating to a new CMS or CDN, Presenc AI watches for unintended robots.txt changes during the transition.