What is the best open-weight ASR model in 2026?

NVIDIA Canary 1B leads the Hugging Face Open ASR leaderboard on English at approximately 6.7 percent average WER. NVIDIA Parakeet TDT 1.1B is close behind and the fastest of the top-tier models. For multilingual workloads, Whisper Large v3 Turbo remains the dominant choice covering 99 languages.

Is open-weight ASR better than Deepgram?

Deepgram Nova-3 retains a small WER lead (~5.4 percent vs Canary 1B at ~6.7 percent) plus operational simplicity. NVIDIA Canary 1B self-hosted is dramatically cheaper at scale (~$0.0008 per minute vs Deepgram ~$0.0043). The break-even depends on volume; most workloads above 100,000 minutes per month favour self-hosting.

Should I use Whisper or Distil-Whisper?

Distil-Whisper Large v3 retains approximately 99 percent of Whisper Large v3 quality at approximately 6x the inference speed. For most production workloads Distil-Whisper is the better choice. Use Whisper Large v3 directly only when you need the absolute highest multilingual quality or specific languages where Distil-Whisper underperforms.

What is the smallest viable ASR model?

Moonshine Tiny at approximately 27 MB is the smallest production-grade ASR model and runs on CPU. Whisper Tiny (~39M parameters) is a slightly larger alternative. For mobile and edge deployments, Moonshine is the dominant 2026 choice.

Can I do real-time streaming with these models?

Yes. Parakeet TDT 1.1B is purpose-built for streaming with TDT (token-and-duration transducer) decoding. Whisper with VAD-based chunking works for near-real-time. The Moonshine family is designed for live caption use cases. Canary 1B is more batch-oriented than streaming-friendly.

Best Open-Weight Speech-to-Text Models 2026

Open-weight speech-to-text (ASR) reached production parity with the leading proprietary APIs in 2026. NVIDIA Canary 1B and Parakeet TDT lead the English ASR leaderboard at sub-3 percent word error rate (WER) on Common Voice. Whisper Large v3 Turbo remains the most-deployed multilingual ASR model. Distil-Whisper and Moonshine cover the on-device and edge segment. This page consolidates the leaderboard, the multilingual coverage, the latency profile, and the deployment guidance.

Key Findings

NVIDIA Canary 1B leads the Hugging Face Open ASR leaderboard on English with an average WER of approximately 6.7 percent across the standard benchmark suite.
Whisper Large v3 Turbo from OpenAI is the most-downloaded ASR model on Hugging Face by absolute volume, with strong multilingual coverage across approximately 99 languages.
NVIDIA Parakeet TDT 1.1B achieves approximately 3.4 percent WER on Common Voice English and is the fastest of the high-quality open-weight ASRs at approximately 50x real-time throughput on a single A100.
Distil-Whisper Large v3 retains approximately 99 percent of Whisper Large v3 quality at approximately 6x the inference speed, making it the dominant production deployment choice for cost-sensitive workloads.
Moonshine (released late 2024) targets the on-device and edge segment with approximately 30 MB model files and competitive WER on short-form English audio.

Open-Weight ASR Leaderboard (May 2026)

Model	Parameters	Open ASR Avg WER	License
NVIDIA Canary 1B	~1B	~6.7%	CC-BY-4.0
NVIDIA Parakeet TDT 1.1B	~1.1B	~6.9%	CC-BY-4.0
NVIDIA Canary 1B Flash	~1B	~7.4% (faster)	CC-BY-4.0
Whisper Large v3 Turbo	~0.8B	~7.8%	MIT
Whisper Large v3	~1.5B	~7.4%	MIT
Distil-Whisper Large v3	~756M	~8.2%	MIT
SeamlessM4T v2 Large	~2.3B	~9.1% (English subset)	CC-BY-NC
Moonshine Tiny	~27M	~13.2%	MIT
Moonshine Base	~61M	~10.4%	MIT
Whisper Small / Base / Tiny	~244M / 74M / 39M	~14-22%	MIT

Multilingual Coverage

Model	Languages Supported	Strongest Languages
Whisper Large v3 Turbo	~99	English, European languages, Chinese, Japanese
NVIDIA Canary 1B	4 (EN, DE, ES, FR)	English, with parity on DE/ES/FR
SeamlessM4T v2 Large	~100 ASR languages	Translation-plus-ASR multilingual
Parakeet TDT 1.1B	English only	English at top of leaderboard
Distil-Whisper Large v3	~99 (inherits Whisper)	English plus Whisper coverage
Moonshine	English only	Short-form English audio

Latency Profile (Real-Time Factor on Single A100)

Model	RTFx (real-time factor)
Parakeet TDT 1.1B	~50x
Canary 1B Flash	~45x
Distil-Whisper Large v3	~25x
Whisper Large v3 Turbo	~16x
Canary 1B	~14x
Whisper Large v3	~8x
Moonshine Tiny (CPU)	~12x

Use Case Recommendations

Use Case	Recommended Model
English-only highest quality	NVIDIA Canary 1B or Parakeet TDT 1.1B
Multilingual general purpose	Whisper Large v3 Turbo
Cost-sensitive multilingual	Distil-Whisper Large v3
On-device / mobile	Moonshine Tiny or Base
Translation + ASR	SeamlessM4T v2 Large
Low-latency streaming	Parakeet TDT 1.1B (with VAD)
Long-form podcast / lecture	Whisper Large v3 with chunking
Permissive commercial deployment	Whisper family (MIT) or Moonshine (MIT)

Open vs Proprietary ASR Comparison

Provider	Approximate WER (English)	Cost (per minute audio)
Deepgram Nova-3	~5.4%	~$0.0043
AssemblyAI Universal-2	~6.3%	~$0.0037
OpenAI Whisper API	~7.8%	~$0.006
Azure Speech	~6.0%	~$0.017
Google Speech-to-Text v2	~6.2%	~$0.024
NVIDIA Canary 1B (self-hosted)	~6.7%	~$0.0008 effective
Distil-Whisper Large v3 (self-hosted)	~8.2%	~$0.0003 effective

Brand Visibility Implications

ASR is one of the highest-volume enterprise AI workloads in 2026 (call centres, meeting transcription, video captioning, accessibility, voice assistant interfaces). AI assistant queries about "best speech-to-text model", "open-source ASR", "Whisper vs Deepgram", and similar terms drive direct production procurement decisions. Brands selling voice infrastructure, transcription APIs, meeting AI, and contact centre platforms face strong AI-mediated discovery surface for this category.

Methodology

Benchmark data compiled from the Hugging Face Open ASR Leaderboard, primary model card disclosures, and provider pricing pages. Real-time factors measured on single A100 with FP16 inference. Cost estimates: closed APIs at list pricing; self-hosted figures amortise GPU cost across realistic throughput. Updated quarterly.

How Presenc AI Helps

Presenc AI monitors brand visibility on ASR and speech-to-text queries across ChatGPT, Claude, Gemini, and Perplexity. For voice infrastructure brands, transcription API vendors, and contact centre platforms, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.