Multilingual open-weight LLMs became production-ready for most major non-English languages in 2026. Aya Expanse, SEA-LION 3, AfroLLM, Sabia, Indic-LLM and other Indian-language LLMs, Latxa (Basque), FinGPT (Finnish), and dozens of language-specific finetunes now cover most commercially significant languages outside English. This page consolidates the landscape.
Key Findings
- Aya Expanse from Cohere remains the leading general-purpose multilingual open-weight family, with strong coverage across 23 languages including Arabic, Hindi, Chinese, Japanese, Korean, and the major European languages.
- SEA-LION 3 from AI Singapore is the leading model for Southeast Asian languages (Bahasa Indonesia, Thai, Vietnamese, Tagalog, Tamil-MY, Burmese, Khmer, Lao, Mandarin-SG).
- Indic-LLM, OpenHathi, BharatGPT, and various Indian-language finetunes cover the major Indic languages (Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi).
- AfroLLM and Lelapa AI's InkubaLM cover African languages, a historically under-served segment; quality is improving but still lags models for major languages.
- Sovereign AI initiatives are driving multilingual investment: the UAE (Falcon-Arabic), Saudi Arabia (ALLaM), India (Hindi-focused models), Vietnam (VinaLLaMA), and France (Lucie, the CroissantLLM successor) all have government-supported or sovereign-aligned multilingual programmes.
Major Open-Weight Multilingual LLMs (May 2026)
| Model | Languages | Parameters | License |
|---|---|---|---|
| Aya Expanse 32B | 23 languages | ~32B | CC-BY-NC + Commercial |
| Aya Expanse 8B | 23 languages | ~8B | CC-BY-NC + Commercial |
| Aya-101 13B | 101 languages (research) | ~13B | Apache 2.0 |
| Qwen2.5 / Qwen3 (multilingual) | ~119 languages | 0.5B-235B | Apache 2.0 / Tongyi |
| SEA-LION 3 | 11 SEA languages | 3B, 7B, 27B | MIT |
| SeaLLM (Alibaba DAMO Academy) | 9 SEA languages | varies | Apache 2.0 |
| InkubaLM (Lelapa AI) | 5 African languages | ~0.4B | Apache 2.0 |
| AfroLLM | 22 African languages | varies | Apache 2.0 |
| OpenHathi (Sarvam) | Hindi | ~7B | Apache 2.0 |
| BharatGPT | 14 Indian languages | varies | Various |
| Sarvam-1 / Sarvam-2B | Indic languages | ~2B | Apache 2.0 |
| Sabia 3 (Portuguese) | Brazilian Portuguese | varies | Maritaca-licence |
| Latxa (Basque) | Basque | ~7B, 70B | Llama 3 Community |
| FinGPT 7B (Finnish) | Finnish | ~7B | Apache 2.0 |
| Falcon-Arabic 11B | Arabic | ~11B | Apache 2.0 |
| Jais 70B (Arabic) | Arabic and English | ~70B | Apache 2.0 |
| ALLaM (Saudi) | Arabic | varies | SDAIA |
| VinaLLaMA (Vietnamese) | Vietnamese | ~7B | Apache 2.0 |
| Lucie (France) | French and European | ~7B | Apache 2.0 |
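When shortlisting for commercial deployment, the License column is usually the first filter. The sketch below encodes a subset of the table's rows and keeps only models whose listed licence is Apache 2.0 or MIT. It is illustrative only: the `LICENCES` dict, the `PERMISSIVE` set, and the `permissive_models` helper are hypothetical. The licence strings are copied from the table rather than from model cards, and treating only Apache 2.0 and MIT as unrestricted for commercial use is an assumption.

```python
# Hypothetical subset of the table above: model name -> licence as listed there.
# Licence strings are transcribed, not verified; check each model card before use.
LICENCES = {
    "Aya Expanse 32B": "CC-BY-NC + Commercial",
    "Aya-101 13B": "Apache 2.0",
    "SEA-LION 3": "MIT",
    "InkubaLM (Lelapa AI)": "Apache 2.0",
    "Sabia 3 (Portuguese)": "Maritaca-licence",
    "Latxa (Basque)": "Llama 3 Community",
    "Jais 70B (Arabic)": "Apache 2.0",
    "ALLaM (Saudi)": "SDAIA",
}

# Assumption: only these licences are treated as permissive for commercial use.
PERMISSIVE = {"Apache 2.0", "MIT"}


def permissive_models(licences: dict[str, str]) -> list[str]:
    """Return model names whose listed licence is in PERMISSIVE, in table order."""
    return [name for name, lic in licences.items() if lic in PERMISSIVE]


print(permissive_models(LICENCES))
```

Mixed entries such as Qwen's "Apache 2.0 / Tongyi" are left out of the sketch because the applicable licence depends on the specific checkpoint.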
Language Coverage by Region
| Region | Leading Open-Weight Options |
|---|---|
| Arabic and MENA | Aya Expanse, Falcon-Arabic, Jais, ALLaM, Qwen3 |
| Indic languages | OpenHathi, Sarvam-1, BharatGPT, Indic-LLM, Qwen3 |
| Southeast Asia | SEA-LION 3, SeaLLM, VinaLLaMA, Qwen3 |
| African languages | AfroLLM, InkubaLM (limited coverage) |
| European languages | Aya Expanse, Mistral family, Lucie, Latxa, Qwen3 |
| Brazilian Portuguese | Sabia 3, Aya Expanse, Llama 3.x |
| Chinese | Qwen3, ChatGLM, Yi, InternLM, Baichuan, Hunyuan |
| Japanese | Qwen3, Llama 3.x with JP finetunes, Karakuri, ELYZA |
| Korean | SOLAR, Qwen3, Llama 3.x with Korean finetunes, KULLM |
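The region table can also be read as a shortlist lookup. The sketch below encodes four of its rows and can optionally drop the general-purpose families to leave only regional specialists. The `REGION_OPTIONS` dict, the `GENERALISTS` set, and the `shortlist` helper are hypothetical, and the choice of which families count as general-purpose is an assumption.

```python
# Hypothetical encoding of four rows from the region table above.
REGION_OPTIONS = {
    "Arabic and MENA": ["Aya Expanse", "Falcon-Arabic", "Jais", "ALLaM", "Qwen3"],
    "Southeast Asia": ["SEA-LION 3", "SeaLLM", "VinaLLaMA", "Qwen3"],
    "African languages": ["AfroLLM", "InkubaLM"],
    "Brazilian Portuguese": ["Sabia 3", "Aya Expanse", "Llama 3.x"],
}

# Assumption: these families are treated as general-purpose rather than regional.
GENERALISTS = {"Aya Expanse", "Qwen3", "Llama 3.x"}


def shortlist(region: str, specialists_only: bool = False) -> list[str]:
    """Return the listed options for a region; unknown regions yield an empty list."""
    options = REGION_OPTIONS.get(region, [])
    if specialists_only:
        # Keep only models not in the general-purpose set.
        return [m for m in options if m not in GENERALISTS]
    return list(options)


print(shortlist("Arabic and MENA", specialists_only=True))
```

Returning a copy of the list keeps callers from mutating the shared table.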
Strategic Context
Three patterns shape the 2026 multilingual landscape. First, large general models (Qwen3, Aya Expanse, Llama 4) cover the major commercial languages well, while specialised regional models retain an advantage for under-resourced languages and for workloads that depend heavily on cultural context. Second, sovereign AI investment is driving regional model development: the UAE, Saudi Arabia, India, Vietnam, France, and others have government-supported multilingual LLM programmes. Third, the language gap remains real for under-resourced African and South Asian languages, where scarce pretraining data limits model quality.
Brand Visibility Implications
Multilingual AI is a fast-growing procurement category for global brands. AI assistant queries about "Arabic LLM", "Indic language AI", "Southeast Asia AI", and similar terms drive procurement-research traffic from regional brands and global product teams. Brands selling localised AI products, multilingual customer service AI, and translation services therefore have a large AI-mediated discovery surface in this category.
Methodology
Model data is compiled from Hugging Face model cards, AI Singapore and Cohere For AI disclosures, and regional sovereign AI programme announcements through 23 May 2026. This page is updated quarterly.
How Presenc AI Helps
Presenc AI monitors brand visibility on multilingual AI queries across ChatGPT, Claude, Gemini, and Perplexity. For localised AI product vendors, multilingual customer service AI brands, and translation services, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.