Research Overview
Blocking AI crawlers is the fastest way to make your brand invisible in AI-generated answers. Yet the decision is rarely made at the brand level. It is usually inherited from an industry default, a legacy CMS setting, or a blanket legal recommendation. This report quantifies how often each major AI crawler is blocked across 15 industries, based on an audit of robots.txt files from a representative sample of domains per sector.
Block Rate by Industry and Bot
We audited robots.txt configurations across 15 industries for four of the most consequential AI crawlers: GPTBot (OpenAI), PerplexityBot (Perplexity), ClaudeBot (Anthropic), and Google-Extended (Google's AI training opt-out token). A domain counts as blocking a bot if its robots.txt contains a Disallow directive in a group naming that specific user agent, or a blanket `User-agent: *` Disallow that applies to it because the bot has no more specific group of its own.
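For reference, either of the following patterns registers as a block under this definition (illustrative robots.txt fragments, not taken from any audited domain):

```
# Pattern 1: an explicit group for the bot
User-agent: GPTBot
Disallow: /

# Pattern 2: a blanket rule that applies to any crawler
# with no more specific group of its own
User-agent: *
Disallow: /
```

Note that under standard robots.txt matching, a bot that has its own `User-agent` group ignores the `*` group entirely, which is how many sites end up blocking AI crawlers by accident or allowing them unintentionally.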
| Industry | GPTBot Blocked % | PerplexityBot Blocked % | ClaudeBot Blocked % | Google-Extended Blocked % |
|---|---|---|---|---|
| Technology / SaaS | 8 | 6 | 7 | 5 |
| Media / Publishing | 58 | 52 | 54 | 49 |
| Education / EdTech | 12 | 10 | 11 | 9 |
| Healthcare | 24 | 22 | 23 | 19 |
| Financial Services | 31 | 28 | 29 | 26 |
| E-Commerce / Retail | 14 | 12 | 13 | 11 |
| Cybersecurity | 10 | 8 | 9 | 7 |
| Legal | 36 | 33 | 34 | 30 |
| Real Estate | 18 | 16 | 17 | 14 |
| Travel / Hospitality | 15 | 13 | 14 | 11 |
| Automotive | 17 | 15 | 16 | 13 |
| Insurance | 28 | 25 | 26 | 22 |
| Food & Beverage | 13 | 11 | 12 | 10 |
| HR / Recruiting | 19 | 17 | 18 | 15 |
| Blockchain / Crypto | 7 | 5 | 6 | 4 |
Media and publishing lead every column, reflecting a sector-wide response to the ongoing legal and licensing disputes between publishers and AI labs. Legal, financial services, and insurance form the next tier, driven by compliance-conservative default policies that treat any unfamiliar crawler as a risk. Technology, cybersecurity, and blockchain sectors have the lowest block rates, reflecting a more permissive stance that correlates with higher downstream AI citation share.
The Block Rate Gap
GPTBot is consistently blocked at higher rates than the other three crawlers in every industry. The gap between GPTBot and Google-Extended averages 4.3 percentage points across all sectors, reflecting how first-mover crawlers accumulate more blanket blocks than newer arrivals. PerplexityBot has the lowest block rate of the three AI-lab crawlers (GPTBot, PerplexityBot, ClaudeBot), suggesting that brands are more comfortable allowing retrieval-focused crawlers like PerplexityBot than training-focused ones like GPTBot.
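The average gap can be recomputed directly from the table above; a quick sketch in Python, with the two columns transcribed in table order:

```python
# Per-industry block rates copied from the table above, in row order
# (Technology/SaaS through Blockchain/Crypto).
gptbot = [8, 58, 12, 24, 31, 14, 10, 36, 18, 15, 17, 28, 13, 19, 7]
google_extended = [5, 49, 9, 19, 26, 11, 7, 30, 14, 11, 13, 22, 10, 15, 4]

# Percentage-point gap per industry, then the cross-sector average.
gaps = [g - ge for g, ge in zip(gptbot, google_extended)]
avg_gap = sum(gaps) / len(gaps)
print(f"average GPTBot vs Google-Extended gap: {avg_gap:.1f} pp")  # 4.3 pp
```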
The inverse correlation between block rate and AI citation share is strong. Sectors in the bottom third of block rates (technology, blockchain, cybersecurity) appear in AI answers at rates 2.4x higher than sectors in the top third (media, legal, financial services). Blocking bots does not stop your brand from being discussed. It only removes your own content from the pool of sources AI systems can cite.
What This Means for Your Brand
If your robots.txt was written before 2023, it almost certainly reflects a pre-AI crawler world. Many domains block AI bots by accident through overly broad Disallow rules rather than by policy. The first step is an audit: confirm which AI crawlers your site actually blocks and whether each block is intentional. If your legal or compliance team has asked for broad blocks, quantify the citation cost using the benchmarks above before locking that policy in.
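The audit step is straightforward to sketch with Python's standard-library robots.txt parser. This is a minimal illustration, not the methodology used for the benchmarks above; the bot list comes from this report, and the sample rules are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The four crawlers audited in this report.
AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def audit_robots(robots_txt: str, path: str = "/") -> dict:
    """Return {bot_name: is_blocked} for a robots.txt body.

    A bot is counted as blocked if it may not fetch `path`, matching
    the definition used in this report.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, path) for bot in AI_BOTS}

# Hypothetical robots.txt: an explicit site-wide block for GPTBot,
# plus a narrow wildcard rule that everything else falls under.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

print(audit_robots(sample))
# GPTBot is blocked site-wide; the other three fall under the wildcard
# group, which only restricts /private/, so they are not blocked at "/"
```

One caveat: `urllib.robotparser` implements the original robots exclusion convention, so extensions some crawlers honor (such as `*` wildcards inside paths) are not evaluated; a production audit should also check CDN-layer rules that never appear in robots.txt.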
How Presenc AI Helps
Presenc AI runs automated robots.txt audits for every domain we monitor, checking block status for 14 major AI crawlers and flagging inherited or accidental blocks. The platform tracks your block configuration over time and correlates it with your AI citation rate on each platform, so you can see the direct impact of access decisions on visibility. For enterprises with multi-domain or multi-CDN setups, Presenc reconciles robots.txt behavior across all entry points so no crawler gets silently blocked at one layer while being allowed at another.