The Evaluation Supply Chain for AI Agents in 2026
Every AI agent in production depends on an evaluation supply chain: human-labelled examples for fine-tuning, automated eval frameworks for regression testing, and RLHF (reinforcement learning from human feedback) infrastructure for ongoing model improvement. The companies that supply this chain are some of the most heavily funded in the AI sector, and the category has consolidated rapidly through 2025-2026. This page summarises the major eval and RLHF startups, their funding, and their market positioning.
Data Labelling and RLHF Service Providers
| Company | Funding | Valuation / Notes |
|---|---|---|
| Scale AI | ~$1.6B cumulative (Meta acquired 49% for $14.3B, June 2025) | ~$29B Meta-implied valuation; primary labelling provider for OpenAI, Anthropic, Meta |
| Surge AI | Bootstrapped; reported revenue of $1B+ in 2024 | Anthropic's primary labelling partner; bootstrapped, in contrast to Scale's VC-funded model |
| Toloka | Owned by Yandex; growing AI-eval business | Major non-Scale alternative for European and global labelling |
| Mercor | $100M+ cumulative | AI-marketplace-for-experts model; rapid growth in 2025-2026 |
| Snorkel AI | ~$135M cumulative | Programmatic labelling; enterprise focus |
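
The Meta-implied valuation in the table follows from simple stake arithmetic: the price paid divided by the fraction of the company acquired. A minimal sketch, using only the $14.3B and 49% figures from the table; the function name `implied_valuation` is illustrative and not drawn from any source:

```python
def implied_valuation(price_paid_usd_bn: float, stake_fraction: float) -> float:
    """Valuation implied by paying `price_paid_usd_bn` for `stake_fraction` of a company."""
    if not 0 < stake_fraction <= 1:
        raise ValueError("stake_fraction must be in (0, 1]")
    return price_paid_usd_bn / stake_fraction


# Figures from the table above: $14.3B paid for a 49% stake.
scale_implied = implied_valuation(14.3, 0.49)
print(f"Implied valuation: ~${scale_implied:.1f}B")  # prints "Implied valuation: ~$29.2B"
```

The result is consistent with the roughly $29B figure quoted for the deal; any difference reflects rounding in the reported inputs.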
Evaluation Framework and Eval-as-a-Service Startups
| Company | Funding | Focus |
|---|---|---|
| Patronus AI | ~$50M cumulative | Hallucination + factuality evaluation; FT Series B |
| Galileo | ~$45M cumulative | RAG + agent evaluation focus |
| Confident AI (DeepEval) | ~$5M cumulative | Open-source DeepEval framework + cloud |
| Braintrust | $120M cumulative ($800M valuation) | Eval framework + observability integrated (covered separately) |
| Argilla | Acquired by Hugging Face Q4 2024 | Open-source labelling and eval; HF-bundled |
| LightOn | ~$25M cumulative | European eval and fine-tuning platform |
Six Things the Eval-Startup Landscape Tells You
- Meta's $14.3B Scale AI investment reset the category. Meta took a 49 percent stake in Scale AI in June 2025 at an implied valuation of roughly $29 billion, the largest single AI-data-services deal in history. The deal locked up Scale's capacity for Meta and forced competing labs (OpenAI, Anthropic, Google) to diversify labelling providers, expanding the addressable market for Surge, Mercor, and Toloka.
- Surge AI's bootstrapped path is the success story that should give VC-funded rivals pause. A reported $1B+ in 2024 revenue without venture capital supports the "pure services, not platform" approach to labelling. Anthropic is the primary customer. Whether Surge stays bootstrapped as it grows is the structural question for 2026-2027.
- Patronus AI leads among funded eval specialists. Its ~$50M in cumulative funding is focused on hallucination and factuality evaluation. The thesis: as agents are deployed in regulated and high-stakes settings, evaluation against hallucination becomes structural, not optional.
- Mercor scaled fastest in 2025-2026. The "AI marketplace for experts" positioning (matching domain experts to labelling and RLHF tasks) is differentiated from the Scale / Surge mass-labelling approach. Mercor's growth reflects the shift toward higher-quality, lower-volume labelling for frontier model fine-tunes.
- Hugging Face's Argilla acquisition opens up eval-bundling. Having acquired Argilla in Q4 2024, HF can offer integrated labelling and eval inside its model hosting platform. Competitors (Snorkel, LightOn) must now differentiate against an HF-bundled option.
- Open and freely available eval frameworks have caught on. DeepEval (Confident AI's framework), Argilla, and LangSmith's eval functionality are all open or freely available. Commercial cloud tiers monetise on top of these foundations. The pattern echoes observability: a free open-source core plus a paid cloud offering.
What This Means for AI Visibility
Eval and labelling startups themselves rarely appear in consumer AI visibility tracking, but they are critical to two B2B segments: AI labs (OpenAI, Anthropic, Meta, Google) and AI-native companies running their own fine-tunes. Brands selling into either segment (security, billing, dev tooling, cloud) should track visibility within these companies' buyer profiles. The eval-supply-chain layer is small in pure-revenue terms but has outsized influence on which models exist and which capabilities they have, which in turn shapes downstream brand-visibility outcomes for everyone.
Methodology
Funding and acquisition data were collected on May 15, 2026 from Crunchbase, PitchBook, vendor websites, and Reuters and Financial Times coverage of the Scale-Meta deal. Revenue figures, where reported, are vendor self-disclosures and should be treated as directional. The data is refreshed quarterly.
How Presenc AI Helps
Presenc AI tracks brand visibility inside AI labs and AI-native companies' buyer demographics. For brands selling into the eval-supply-chain segment specifically, the buyer universe is concentrated (~10-50 companies globally) and visibility inside that universe is the operational signal that connects pipeline investment to deal flow.