Twelve months ago, the combined global usage share of DeepSeek and Qwen was about 1%. By January 2026, it was roughly 15%. That is one of the fastest adoption curves AI has produced. And almost none of it shows up in the dashboards marketing teams use to monitor AI visibility.
When most brands talk about getting recommended by AI, they mean ChatGPT, Gemini, Claude, and Perplexity. Those four are the visible ocean. Below the waterline, a second AI internet has been quietly forming, built on Llama 4, Qwen 3.6, DeepSeek V4, GLM-5.1, Gemma 4, and Mistral Small 4. It powers thousands of consumer apps, internal enterprise copilots, RAG systems behind support portals, and on-device assistants that never call a public API.
Whatever you optimized for the closed-model leaderboards is invisible there. And that blind spot is getting bigger, not smaller.
The new map of open-weight AI
As of April 2026, six labs ship open-weight models that match or beat the closed alternatives on practical workloads. A year ago, exactly one open model (the original Llama 3) was in that conversation.
Meta's Llama 4 Scout has a 10 million token context window, the longest of any production model. Alibaba's Qwen 3.6-35B-A3B hits 73.4% on SWE-bench Verified with only 3 billion active parameters, which means it runs on a single consumer GPU. DeepSeek V4 leads raw coding benchmarks at 83.7% SWE-bench Verified and offers cache-hit pricing as low as $0.07 per million input tokens, roughly ten times cheaper than the closed labs. Google's Gemma 4, Zhipu's GLM-5.1, and Mistral Small 4 fill out the rest.
Meanwhile Hugging Face now hosts more than 2 million models, 500,000 datasets, and 1 million demo apps. Fully 92.5% of model downloads are for sub-1B parameter models, which suggests they are being run locally and embedded into products rather than called over an API. The mean downloaded model size jumped from 827M parameters in 2023 to 20.8B in 2025, driven by quantization and mixture-of-experts architectures. Open weights are not a research curiosity anymore. They are the substrate of an entire production layer of AI.
Why open-source LLMs see your brand differently
The closed and open ecosystems both start from Common Crawl, whose August 2025 crawl alone added 2.42 billion pages. Of the 47 LLMs Mozilla analyzed, 64% used at least one filtered version of it, and for GPT-3 over 80% of training tokens came from Common Crawl. So far, so similar.
What diverges is what each model does with that raw data. Closed labs run private quality classifiers and layer in curated proprietary data on top. Open models like Llama 4, Qwen, and DeepSeek lean on public filtered datasets such as RedPajama-V2 (30 trillion tokens across 84 Common Crawl dumps with 40+ pre-computed quality annotations), FineWeb, and Dolma. These public pipelines are aggressive about deduplication and quality scoring, which is good for model performance but punishing for brands whose web presence is thin or lives mostly on their own domain.
Common Crawl itself is not a representative sample of the web. Its crawler prioritizes domains that are heavily linked to, so Facebook, Google, YouTube, and Wikipedia dominate the graph and long-tail industry sources get sparse coverage. Mozilla's report notes that domains tied to digitally marginalized communities are the least likely to be included at all. When public filtering pipelines layer on top of that bias, brands that are well known to ChatGPT can vanish from open-weight model recall entirely.
Then there is quantization. The 4-bit and 8-bit quantized versions of Llama 4 and Qwen that ship into production apps trade fidelity for speed. The first capability to degrade is long-tail entity recall, which is exactly where most B2B brands live. Your brand might be in the full-precision weights and silently absent from the version a developer actually deployed.
The self-hosted enterprise blind spot
Gartner reported a 340% jump in enterprise private LLM development through 2025. 65% of Fortune 500 companies have deployed LLM-based engagement tools. 44% of organizations cite data privacy as the top barrier to using public AI, which is the exact wedge that pushes them toward self-hosted Llama, Qwen, and Mistral deployments. The enterprise LLM market is projected to grow from $6.85B in 2025 to $55.6B by 2032 (30% CAGR), and a large share of that spend is going into models that never touch an external API.
From a GEO perspective, this is a category change, not an incremental one. When a Fortune 500 buyer asks their internal procurement copilot which vendors to shortlist, that copilot is often a fine-tuned Llama or Qwen running inside the corporate firewall. There is no API to monitor. No prompt logs to scrape. No public ranking to track. If your brand was not in the base model and is not in the RAG corpus the company assembled, you are not on the shortlist. And the buyer has no idea you exist.
This is the visibility surface where mid-market and enterprise B2B deals get won and lost in 2026. Every AI visibility platform on the market today, including ours, is structurally blind to it.
The Chinese model layer is its own story
Qwen alone holds roughly 12% of global LLM usage. DeepSeek's growth chart goes near-vertical from January 2025. GLM-5.1 from Zhipu is increasingly the default for Chinese SaaS startups. These models are open-weight and downloaded everywhere, but their training corpora over-index on Chinese-language web content: Baidu Baike instead of Wikipedia, Zhihu instead of Reddit, and a different press ecosystem entirely.
For Western brands, the implication is uncomfortable. A US SaaS company with strong Wikipedia and TechCrunch presence will get reliably recommended by ChatGPT and Claude. The same brand asked about in Mandarin via a Qwen-powered app may not surface at all, even if it has Chinese paying customers. For brands with real Asia-Pacific revenue, this is not a hypothetical. It is a missed pipeline you cannot see in your dashboards.
RAG flips the visibility math
Most production deployments of open-weight models are not bare. They sit behind a retrieval layer. Documents get embedded, queries hit a vector database, and the LLM synthesizes an answer from retrieved chunks plus its parametric knowledge. The closed-model conversation about "training data presence" matters less here. What matters is whether your content was ingested into the RAG index.
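The retrieve-then-synthesize loop can be sketched end to end. Everything here is a stand-in (a hashing trick instead of a learned embedding model, a Python list instead of a vector database, and made-up document text), but it makes the visibility point concrete: a chunk that was never ingested into the corpus can never reach the prompt, no matter what the model knows.

```python
import math
import re
from collections import Counter

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words vector (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus chunks most similar to the query (cosine similarity)."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: -sum(a * b for a, b in zip(q, embed(doc))))[:k]

# If your docs are not in this list, no query can ever surface them.
corpus = [
    "AcmeCorp internal runbook: rotating database credentials.",
    "IncumbentSoft docs: configuring single sign-on (SSO) for your team.",
]
chunks = retrieve("How do I set up single sign-on?", corpus, k=1)
prompt = "Answer using only these sources:\n" + "\n".join(chunks)
```

Swap in a real embedding model and a vector store and the shape is identical, which is why getting into the ingested corpus matters as much as getting into the weights.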
That sounds like an opportunity, and it is, partially. If a developer wires your docs into their internal copilot, you get cited every time. But the practical pattern is that companies index their own internal docs, the public docs of incumbents they already trust, and a curated knowledge base. New entrants are absent from both the model weights and the RAG corpus. The asymmetry compounds.
The brands that win this layer ship their content in formats that get pulled into RAG indexes by default. That means clean Markdown documentation, public API references with structured schemas, llms.txt files, and content that is easy to chunk and embed. If your top-of-funnel page is a JavaScript-rendered marketing site with no direct factual claims, you are invisible to retrieval too.
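Concretely, "easy to chunk and embed" looks like the llms.txt convention: a plain-Markdown index at your domain root pointing at Markdown versions of your key pages. Every name and URL below is a made-up placeholder; the shape is what matters.

```markdown
# ExampleCo

> ExampleCo is a log-analytics platform for Kubernetes clusters.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and run a first query
- [API reference](https://example.com/docs/api.md): REST endpoints and response schemas

## Optional

- [Pricing](https://example.com/pricing.md): plan comparison
```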
What to actually do about it
1. Test your visibility on at least one open model directly. Run Qwen 3.6 or Llama 4 against your category prompts via Hugging Face Inference, Together AI, Groq, or Fireworks. The cost is trivial and the signal is real. If your brand surfaces on ChatGPT but not on Llama 4, you have a training-data gap, not a hallucination problem.
2. Audit your presence in the sources that public filtering pipelines preserve. Wikipedia is not optional. Neither are Crunchbase, GitHub READMEs that describe what you do (not what you sell), structured Schema.org markup, and at least one canonical entry on each of the major review aggregators in your category. These are the sources RedPajama, FineWeb, and Dolma are biased toward keeping.
3. Publish for retrieval. Maintain a comprehensive plain-Markdown documentation site, expose llms.txt and llms-full.txt, and structure your highest-value pages so they survive chunking. If your only public surface is a set of JS-rendered marketing pages, you are invisible twice over: to the training pipelines and to retrieval.
4. Earn citations in non-Western sources if you sell internationally. A single Zhihu post or Baidu Baike entry does for Qwen what a TechCrunch piece does for ChatGPT. The same logic applies to NAVER for Korean models and to Yandex-indexed Russian-language sources for the few deployments that still ingest them.
5. Get into open-source code itself. Models like DeepSeek and Qwen are heavily trained on permissively licensed GitHub code. A widely-used open-source SDK or integration with a popular framework is one of the most durable forms of presence in coding-focused open models. It also signals to the model that you are an entity that matters to developers, who increasingly drive B2B buying decisions.
6. Accept that the self-hosted enterprise layer will stay opaque, and use first-party signals. Track inbound traffic for AI user agents you do see (PerplexityBot, GPTBot, ClaudeBot), monitor support tickets and sales calls for "I asked our internal AI" mentions, and instrument your demo flow for unusual referral patterns. Self-hosted copilots cannot be queried, but their downstream behavior leaves traces.
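Step 1 on the list above takes only a few lines against any OpenAI-compatible endpoint (Together AI, Groq, and Fireworks all expose one). This is a minimal sketch: the URL shown is Together's chat-completions endpoint, but the model id is a placeholder you would swap for whatever the provider currently lists, and the scoring is deliberately crude, just a case-insensitive mention check across your category prompts.

```python
import json
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"  # any OpenAI-compatible endpoint works
MODEL = "Qwen/Qwen3.6-35B-A3B"  # placeholder id: check the provider's model catalog

def ask(prompt: str, api_key: str) -> str:
    """Send one category prompt to an OpenAI-compatible chat endpoint."""
    body = json.dumps({"model": MODEL, "messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(
        API_URL, data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def mentioned(answer: str, brand: str) -> bool:
    """Crude recall check: does the answer name the brand at all?"""
    return brand.lower() in answer.lower()

def recall_rate(answers: list[str], brand: str) -> float:
    """Share of answers that mention the brand."""
    return sum(mentioned(a, brand) for a in answers) / len(answers)

CATEGORY_PROMPTS = [
    "What are the best tools for {category}?",
    "Recommend a vendor for {category}.",
]

# answers = [ask(p.format(category="log analytics"), api_key="...") for p in CATEGORY_PROMPTS]
# print(recall_rate(answers, "ExampleCo"))
```

Run the same prompt set against ChatGPT and against the open model; a large gap between the two recall rates is the training-data gap described above.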
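And for step 6, the crawler user agents worth tallying leave plain substrings in a standard access log. A minimal sketch with made-up log lines; the three bot names are the real user-agent tokens those crawlers send.

```python
from collections import Counter

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def count_ai_hits(log_lines: list[str]) -> Counter:
    """Tally requests per AI crawler, matched by user-agent substring."""
    hits = Counter()
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent in line:
                hits[agent] += 1
    return hits

# Fabricated sample lines in common access-log format.
sample = [
    '1.2.3.4 - - [10/Apr/2026] "GET /docs HTTP/1.1" 200 "-" "Mozilla/5.0; GPTBot/1.2"',
    '5.6.7.8 - - [10/Apr/2026] "GET /pricing HTTP/1.1" 200 "-" "PerplexityBot/1.0"',
    '9.9.9.9 - - [10/Apr/2026] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
hits = count_ai_hits(sample)
```

Trend these counts weekly: which pages the bots fetch is a rough proxy for which of your content is being pulled into answer engines.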
The closed-model leaderboard is the easy half
ChatGPT and Gemini are the surface where AI visibility gets discussed because they are the surface where it can be measured. The harder half is below. It is six open-weight model families, hundreds of forks and fine-tunes, thousands of internal enterprise deployments, and consumer apps in markets you do not natively monitor. As of April 2026, that half is at least a third of all AI inference and growing faster than the closed half.
The brands that recognize this early treat training-data presence as a portfolio play across closed and open ecosystems. The brands that do not will keep optimizing for the leaderboards and wonder why their pipeline numbers do not move.
Curious how your brand looks to the open layer?
Presenc AI tracks brand visibility across both the closed models everyone monitors and the open-weight models almost no one does. See where you show up, where you do not, and which sources are carrying you in each.