Open-weight medical LLMs reached production utility in 2025-2026. Med42 v2 (M42 / Cerebras), MedGemma (Google), BioMistral (community), OpenBioLLM (Saama), MedAlpaca, and Apollo cover most clinical and biomedical text use cases. Their scores on USMLE, MedQA, and PubMedQA approach or exceed the GPT-4 baseline. Clinical deployment remains tightly regulated, and most production usage is non-diagnostic: clinical documentation, summarization, and RAG over medical literature. This page consolidates the landscape.
Key Findings
- Med42 v2 70B from M42 and Cerebras leads the open-weight medical LLM leaderboard with approximately 87 percent on USMLE and approximately 86 percent on MedQA.
- MedGemma (Google, released 2025) is a multimodal medical foundation model family covering text, radiology imaging, and pathology, available in 4B and 27B sizes.
- BioMistral 7B is the most-downloaded community medical LLM on Hugging Face, with strong performance for its size on biomedical literature understanding.
- Most production medical AI deployments use general LLMs (Claude 4.7, GPT-5.5, Gemini 3.1 Pro) plus RAG over medical literature; dedicated medical LLMs are used for safety filtering, clinical documentation, and as components in larger pipelines.
- FDA-cleared AI/ML medical devices using LLMs remain rare (under 20 as of May 2026, per the FDA AI/ML-enabled medical devices list), a scarcity attributed to the hallucination risk profile of LLMs.
Open-Weight Medical LLM Comparison (May 2026)
| Model | Parameters | USMLE | MedQA | License |
|---|---|---|---|---|
| Med42 v2 70B | ~70B | ~87% | ~86% | Llama 3 Community |
| Med42 v2 8B | ~8B | ~72% | ~70% | Llama 3 Community |
| MedGemma 27B | ~27B | ~83% | ~81% | Gemma Terms |
| MedGemma 4B | ~4B | ~68% | ~65% | Gemma Terms |
| BioMistral 7B | ~7B | ~62% | ~60% | Apache 2.0 |
| OpenBioLLM 70B | ~70B | ~84% | ~80% | Llama 3 Community |
| OpenBioLLM 8B | ~8B | ~71% | ~67% | Llama 3 Community |
| Apollo 2 7B | ~7B | ~66% | ~63% | Apache 2.0 |
| MedAlpaca 13B | ~13B | ~57% | ~55% | CC-BY-NC |
| PMC-LLaMA-13B | ~13B | ~55% | ~53% | Apache 2.0 |
| Claude 4.7 Opus (reference) | n/a | ~93% | ~89% | Closed |
| GPT-5.5 (reference) | n/a | ~95% | ~91% | Closed |
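
The comparison above can be turned into a simple shortlist filter. The following is a minimal sketch, assuming the approximate table values are taken at face value; the `ModelRow` type and the `shortlist` helper are hypothetical names introduced for illustration, and the closed reference models are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ModelRow:
    name: str
    params_b: int    # approximate parameter count, in billions
    usmle_pct: int   # approximate USMLE score from the table
    medqa_pct: int   # approximate MedQA score from the table
    license: str


# Open-weight rows copied from the comparison table (closed reference rows omitted).
OPEN_WEIGHT_MODELS = [
    ModelRow("Med42 v2 70B", 70, 87, 86, "Llama 3 Community"),
    ModelRow("Med42 v2 8B", 8, 72, 70, "Llama 3 Community"),
    ModelRow("MedGemma 27B", 27, 83, 81, "Gemma Terms"),
    ModelRow("MedGemma 4B", 4, 68, 65, "Gemma Terms"),
    ModelRow("BioMistral 7B", 7, 62, 60, "Apache 2.0"),
    ModelRow("OpenBioLLM 70B", 70, 84, 80, "Llama 3 Community"),
    ModelRow("OpenBioLLM 8B", 8, 71, 67, "Llama 3 Community"),
    ModelRow("Apollo 2 7B", 7, 66, 63, "Apache 2.0"),
    ModelRow("MedAlpaca 13B", 13, 57, 55, "CC-BY-NC"),
    ModelRow("PMC-LLaMA-13B", 13, 55, 53, "Apache 2.0"),
]


def shortlist(
    rows: Iterable[ModelRow],
    licenses: Optional[Set[str]] = None,
    min_usmle_pct: int = 0,
    max_params_b: Optional[int] = None,
) -> List[ModelRow]:
    """Keep rows that satisfy every constraint, best USMLE score first."""
    picked = [
        r for r in rows
        if (licenses is None or r.license in licenses)
        and r.usmle_pct >= min_usmle_pct
        and (max_params_b is None or r.params_b <= max_params_b)
    ]
    return sorted(picked, key=lambda r: r.usmle_pct, reverse=True)


# Example: models small enough for a single modest GPU (hypothetical 8B cap)
# that still reach roughly 70 percent on USMLE.
small_and_capable = shortlist(OPEN_WEIGHT_MODELS, min_usmle_pct=70, max_params_b=8)
print([r.name for r in small_and_capable])
```

Filtering on license first is often the practical starting point, since license terms differ across the Llama 3 Community, Gemma Terms, Apache 2.0, and CC-BY-NC rows and may rule a model out before benchmark scores matter.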
Use Case Recommendations
| Use Case | Recommended Approach |
|---|---|
| Clinical documentation (ambient scribing) | Claude 4.7 or GPT-5.5 plus medical RAG; Med42 v2 for sensitive on-prem |
| Medical literature Q&A | Med42 v2 70B or OpenBioLLM 70B |
| Radiology image understanding | MedGemma 27B multimodal |
| Drug discovery literature mining | BioMistral 7B or PMC-LLaMA-13B |
| Patient-facing health information (regulated) | General LLM with medical RAG plus safety guardrails; do not rely on a medical fine-tuned model alone |
| Medical coding (ICD, CPT) | General LLM with medical coding RAG plus rule-based validation |
| HIPAA-strict on-prem deployment | Self-hosted Med42 v2 or OpenBioLLM |
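
The "rule-based validation" step in the medical coding row can be as simple as a shape check on the codes an LLM suggests. The sketch below is illustrative only: the patterns are a simplified assumption about ICD-10-CM and CPT code formats, the function names are hypothetical, and a format match does not confirm that a code exists in the current code set or fits the clinical note.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Simplified shape assumption for ICD-10-CM: letter, digit, alphanumeric,
# then an optional dot followed by one to four alphanumerics (e.g. "E11.9").
ICD10CM_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")

# Simplified shape assumption for CPT: five digits, or four digits plus
# "F" or "T" (e.g. "99213", "0001F"). Other code families are not covered.
CPT_SHAPE = re.compile(r"[0-9]{5}|[0-9]{4}[FT]")


def classify_code(code: str) -> Optional[str]:
    """Return "icd10cm" or "cpt" if the code has a plausible shape, else None."""
    normalized = code.strip().upper()
    if ICD10CM_SHAPE.fullmatch(normalized):
        return "icd10cm"
    if CPT_SHAPE.fullmatch(normalized):
        return "cpt"
    return None


def split_by_validity(codes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate LLM-suggested codes into shape-valid and rejected lists."""
    accepted: List[str] = []
    rejected: List[str] = []
    for code in codes:
        if classify_code(code) is None:
            rejected.append(code)
        else:
            accepted.append(code)
    return accepted, rejected


# Hypothetical model output mixing well-formed codes with free text.
accepted, rejected = split_by_validity(["E11.9", "99213", "type 2 diabetes"])
print(accepted, rejected)
```

In a real pipeline the accepted codes would still need a lookup against the licensed code set for the relevant year; the shape check only catches free text and malformed output early.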
Regulatory Considerations
FDA clearance as an AI/ML-enabled medical device is required for diagnostic uses of LLMs, and such clearances remain rare. Approximately 80 percent of cleared AI/ML devices are in radiology imaging; fewer than 20 of the approximately 1,250 devices cleared to date involve generative LLMs. The PCCP (Predetermined Change Control Plan) pathway permits some model updates without re-clearance but is still maturing for generative AI. The 2026 production pattern is to use LLMs for non-diagnostic workflow tasks (documentation, summarization, clinical search), where FDA clearance is not required, and to reserve diagnostic conclusions for FDA-cleared diagnostic devices.
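
To put the counts in this section in proportion, the arithmetic can be worked through directly. This is a small sketch that uses only the approximate figures quoted above; the variable names are illustrative.

```python
# Approximate figures quoted in this section.
total_cleared_devices = 1250   # cumulative cleared AI/ML devices
llm_devices_upper_bound = 20   # "under 20" involve generative LLMs
radiology_share = 0.80         # share of cleared devices in radiology imaging

# Upper bound on the fraction of cleared devices that involve generative LLMs.
llm_share_upper_bound = llm_devices_upper_bound / total_cleared_devices

# Approximate number of cleared devices in radiology imaging.
radiology_devices = round(radiology_share * total_cleared_devices)

print(f"LLM share: under {llm_share_upper_bound:.1%}")
print(f"Radiology devices: about {radiology_devices}")
```

On these figures, generative LLM devices make up under 1.6 percent of clearances, against roughly 1,000 radiology imaging devices.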
Brand Visibility Implications
Medical AI is a high-citation enterprise procurement category. AI assistant queries about "medical LLM open source", "HIPAA compliant AI", "clinical AI deployment", and similar terms drive procurement-research traffic from health systems and pharma. Brands selling medical AI platforms, clinical documentation tools, and pharma AI services face a large AI-mediated discovery surface in this category.
Methodology
Benchmark data were compiled from primary model card disclosures, USMLE and MedQA evaluation papers, and the Hugging Face medical model leaderboard through 23 May 2026. Figures are updated quarterly.
How Presenc AI Helps
Presenc AI monitors brand visibility on medical AI queries across ChatGPT, Claude, Gemini, and Perplexity. For medical AI platforms, clinical documentation brands, and pharma AI service vendors, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.