Not all AI crawlers are equal. A handful of named bots account for the large majority of automated AI traffic hitting publisher sites in 2026, and the mix matters because each bot feeds a different downstream surface. This page breaks down AI crawler market share by bot, using request-level data observed on the Presenc AI network and customer properties alongside public figures compiled from Cloudflare and analyst reports. The headline finding is concentration. The top five identified AI crawlers generate roughly 84 percent of all attributable AI bot requests we see.
Share of AI Crawl Volume by Bot
The table below ranks the most active AI crawlers by their share of identified AI bot requests observed on the Presenc AI network during Q2 2026. Shares exclude unattributed and generic crawlers.
| Crawler | Operator | Share of AI crawl volume | YoY change |
|---|---|---|---|
| GPTBot | OpenAI | 34.1% | +6.2 pts |
| Google-Extended | Google | 19.7% | +2.4 pts |
| ClaudeBot | Anthropic | 14.3% | +5.1 pts |
| Bytespider | ByteDance | 9.8% | -3.6 pts |
| PerplexityBot | Perplexity | 6.4% | +2.9 pts |
| Other identified AI bots | Various | 15.7% | -13.0 pts |
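The concentration figure can be reproduced from the table with simple arithmetic. The following is a minimal Python sketch, with the shares copied from the rows above; the names `SHARES`, `OTHER_IDENTIFIED`, and `top_n_share` are illustrative and not part of any Presenc AI tooling.

```python
# Shares of identified AI crawl volume (percent), copied from the table above.
SHARES = {
    "GPTBot": 34.1,
    "Google-Extended": 19.7,
    "ClaudeBot": 14.3,
    "Bytespider": 9.8,
    "PerplexityBot": 6.4,
}
OTHER_IDENTIFIED = 15.7  # the "Other identified AI bots" row


def top_n_share(shares: dict[str, float], n: int) -> float:
    """Sum the n largest shares, rounded to one decimal place."""
    largest = sorted(shares.values(), reverse=True)[:n]
    return round(sum(largest), 1)


if __name__ == "__main__":
    # Combined share of the five named crawlers.
    print(top_n_share(SHARES, 5))
    # Sanity check: named crawlers plus "other" should cover all identified volume.
    print(round(sum(SHARES.values()) + OTHER_IDENTIFIED, 1))
```

The five named crawlers sum to 84.3 percent, which is the "roughly 84 percent" quoted in the introduction, and adding the "other" row brings the total to 100 percent.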
Training Crawl vs Live Retrieval
The share numbers hide an important split. Some bots fetch pages mainly to build training corpora, while others fetch in real time to answer a live user query. The retrieval bots are the ones most likely to send a citation and a click back.
| Crawler | Primary purpose | Retrieval share of its requests | Respects robots.txt |
|---|---|---|---|
| GPTBot | Training plus retrieval | 28% | Yes |
| Google-Extended | Training opt-out control | 11% | Yes |
| ClaudeBot | Training plus retrieval | 22% | Yes |
| Bytespider | Training | 4% | Inconsistent |
| PerplexityBot | Live retrieval | 91% | Mostly |
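One way to read the two tables together is to multiply each crawler's share of volume by the retrieval share of its requests. The sketch below does that in Python. It assumes both tables are measured on the same request base, and the "retrieval-weighted share" it computes is an illustrative derived number, not a figure reported by Presenc AI.

```python
# (share of AI crawl volume %, retrieval share of its requests %), from the two tables.
CRAWLERS = {
    "GPTBot": (34.1, 28),
    "Google-Extended": (19.7, 11),
    "ClaudeBot": (14.3, 22),
    "Bytespider": (9.8, 4),
    "PerplexityBot": (6.4, 91),
}


def retrieval_weighted_share(volume_share: float, retrieval_pct: float) -> float:
    """Estimate retrieval requests as a percentage of all identified AI crawl volume."""
    return volume_share * retrieval_pct / 100


if __name__ == "__main__":
    ranked = sorted(
        ((name, retrieval_weighted_share(*values)) for name, values in CRAWLERS.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    for name, value in ranked:
        print(f"{name}: {value:.1f}")
```

Under that assumption, PerplexityBot moves from fifth place by raw volume to second place by estimated retrieval volume (about 5.8 percent of identified requests, behind GPTBot at about 9.5 percent).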
Key Findings
- OpenAI leads by a wide margin. GPTBot alone accounts for more than a third of identified AI crawl volume, narrowly more than the next two crawlers combined (34.1 percent versus 34.0 percent).
- Anthropic is rising fast. ClaudeBot gained 5.1 points of share year over year, the second-largest gain of any named crawler in our sample after GPTBot's 6.2 points.
- Bytespider is in retreat. Its share fell 3.6 points as more publishers blocked it and its robots.txt compliance stayed inconsistent.
- PerplexityBot punches above its weight. At 6.4 percent of volume it is small, but 91 percent of its fetches are live retrieval, which makes it the crawler most likely to return a citation per fetch.
What the Mix Means for Publishers
A site that only sees training crawlers is feeding models without any near-term referral upside. A site that attracts retrieval bots like PerplexityBot is more likely to earn visible citations. Knowing your own bot mix is the first step to deciding which crawlers to allow, block, or meter through pay-per-crawl. Most publishers in our sample had never measured this split before instrumenting their logs.
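An allow-or-block decision is commonly expressed in robots.txt. The sketch below checks a policy with Python's standard `urllib.robotparser` before it is deployed. The policy itself (block Bytespider, allow everything else), the `is_allowed` helper, and the `example.com` URL are hypothetical, and robots.txt only binds crawlers that choose to honor it. Pay-per-crawl metering is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block one training crawler, allow everything else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot"):
        print(bot, is_allowed(ROBOTS_TXT, bot, "https://example.com/article"))
```

Running a check like this against a draft policy helps confirm that a rule written for one crawler does not accidentally shut out the retrieval bots a site wants to keep.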
Methodology
Crawl metrics in this report were observed on the Presenc AI network and on customer properties, where every bot hit is logged with user agent, IP, and request path. Market figures were compiled from public sources including Cloudflare and analyst reports, supplemented with Presenc AI estimates where public data was unavailable. Projections use compound growth modeling. Figures are reviewed quarterly. Last updated June 2026.
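To illustrate the attribution step, here is a minimal Python sketch that assigns logged user-agent strings to named crawlers and computes a bot mix. It is not the Presenc AI pipeline. The token list, the substring-matching rule, and the function names `classify` and `bot_mix` are assumptions made for the example. User-agent strings can be spoofed, so real attribution would normally also check the logged IP, which this sketch does not do.

```python
from collections import Counter
from typing import Iterable, Optional

# Illustrative tokens taken from the crawler names in this report.
# Google-Extended is omitted: Google documents it as a robots.txt token
# with no user-agent string of its own, so it cannot be matched this way.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot")


def classify(user_agent: str) -> Optional[str]:
    """Return the first crawler token found in the user agent, or None."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def bot_mix(user_agents: Iterable[str]) -> dict[str, float]:
    """Percentage share of attributed AI bot hits per crawler, one decimal place."""
    names = (classify(ua) for ua in user_agents)
    counts = Counter(name for name in names if name is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Made-up user-agent strings for demonstration only.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
    ]
    print(bot_mix(sample))
```

Hits that match no token are left out of the denominator, mirroring the report's choice to exclude unattributed and generic crawlers from the share figures.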
How Presenc AI Helps
Presenc AI runs a live crawl-analytics system that logs every AI bot request to your site, so you can see exactly which bots crawl it and how often. We attribute each hit to a named crawler, separate training fetches from live retrieval, and connect crawls to the citations they produce. Start crawl-to-citation tracking with Presenc AI.