The State of AI GPU Supply in 2026
AI GPU supply moved from acute shortage in 2023 to functional balance in 2026, as NVIDIA's Blackwell generation ramped at scale and AMD's MI300X and Google's TPU v6 added competing capacity. Cloud rental prices fell sharply over 2024-2025, while on-prem economics remain attractive for sustained workloads. This page consolidates pricing and supply data through Q2 2026.
Key Findings
- NVIDIA H100 cloud rental rates fell from approximately $8/hr in early 2023 to $1.80-3.50/hr in Q2 2026, with spot pricing as low as $1.20/hr.
- NVIDIA B200 (Blackwell) rental rates in Q2 2026 are approximately $4.50-7.00/hr; supply is constrained but no longer allocation-only.
- GB200 NVL72 rack-scale systems remain in tight allocation; access is primarily through hyperscaler clouds and select neoclouds.
- NVIDIA H100 lead times for direct purchase are approximately 6-12 weeks in Q2 2026, down from 50+ weeks in 2023.
- AMD MI300X rental pricing is approximately 30-40 percent below H100 at comparable inference performance; the ROCm software ecosystem remains the main differentiator.
NVIDIA H100 Cloud Rental Pricing Trajectory
| Period | On-demand $/hr (single H100) | Spot $/hr (single H100) | Notable provider rates |
|---|---|---|---|
| Q1 2023 | ~$8.00 | ~$5.00 | Allocation-constrained; long waitlists |
| Q1 2024 | ~$5.00 | ~$3.20 | Lambda, Crusoe, CoreWeave |
| Q1 2025 | ~$3.20 | ~$2.00 | Spot supply ramp |
| Q2 2026 | ~$1.80-3.50 | ~$1.20-2.00 | B200 supply pressures H100 |
Pricing aggregated from public rate cards: Lambda Cloud, CoreWeave, Crusoe, AWS (p5 instances), and Google Cloud.
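For context, the table implies a steep compound decline. A minimal sketch of the implied annualized rate, using midpoint figures from the table above:

```python
# Implied annualized rate of decline for on-demand single-H100 rental,
# using midpoint figures from the table above.
start_rate = 8.00   # ~$/hr, Q1 2023
end_rate = 2.65     # ~$/hr, midpoint of the Q2 2026 range ($1.80-3.50)
years = 3.25        # Q1 2023 to Q2 2026

annual_decline = 1 - (end_rate / start_rate) ** (1 / years)
print(f"Implied annualized decline: {annual_decline:.1%}")  # ~28.8%
```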
NVIDIA B200 / GB200 Pricing
| SKU | On-demand $/hr | Notes |
|---|---|---|
| B200 (single) | ~$4.50-7.00 | Limited but growing availability |
| HGX B200 8-GPU | ~$36-55 | Premium for tightly coupled inference |
| GB200 NVL72 (per GPU equivalent) | ~$8-14 | Tight allocation; hyperscaler-mediated |
| H200 (single) | ~$3.00-4.50 | Bridge between H100 and B200 |
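The per-GPU-equivalent figure for GB200 NVL72 translates into a large rack-level commitment. A quick sketch using the table's range and the 72 Blackwell GPUs in an NVL72 rack:

```python
# Rack-hour and monthly cost implied by the GB200 NVL72 per-GPU-equivalent
# range above. An NVL72 rack pairs 36 Grace CPUs with 72 Blackwell GPUs.
GPUS_PER_RACK = 72
HOURS_PER_MONTH = 730

low, high = 8.0, 14.0  # ~$/GPU-equivalent-hr from the table

rack_low, rack_high = GPUS_PER_RACK * low, GPUS_PER_RACK * high
print(f"NVL72: ~${rack_low:,.0f}-{rack_high:,.0f}/hr, "
      f"~${rack_low * HOURS_PER_MONTH:,.0f}-"
      f"{rack_high * HOURS_PER_MONTH:,.0f}/month")
# ~$576-1,008/hr, roughly $420k-736k per rack-month
```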
AMD and Alternative Accelerator Pricing
| Accelerator | On-demand $/hr | Comparable to |
|---|---|---|
| AMD MI300X | ~$1.20-2.50 | ~30-40% below H100; competitive on inference |
| AMD MI325X | ~$1.80-3.20 | ~20-30% below H200 |
| Google TPU v5p (per chip) | ~$2.50-4.50 | GCP-only; competitive on training |
| AWS Trainium2 | ~$0.80-1.80 | AWS-only; cost leader on inference |
| Cerebras WSE-3 (cloud) | premium pricing | Niche use cases; very high single-chip throughput |
| Groq LPU (inference) | per-token pricing | Inference-only; very high tokens/sec on small models |
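Raw $/hr rates only go so far; buyers increasingly compare cost per token. A hedged sketch of the conversion, where the throughput figure is a hypothetical placeholder rather than a benchmark from this page:

```python
# Hedged cost-per-token sketch: converts a $/GPU-hr rental rate into
# $ per million output tokens. The throughput argument is a hypothetical
# placeholder, not a benchmark result from this page.
def cost_per_million_tokens(hourly_rate: float, tokens_per_sec: float) -> float:
    """Cost in dollars to generate one million tokens at a given rate."""
    return hourly_rate / (tokens_per_sec * 3600) * 1_000_000

# Example: a $2.00/hr accelerator sustaining 2,500 tokens/sec (assumed)
print(f"${cost_per_million_tokens(2.00, 2500):.3f} per 1M tokens")  # ~$0.222
```

Substitute measured throughput for the workload in question; batch size, context length, and model size swing the result by an order of magnitude.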
Cloud Provider Comparison (H100, on-demand 8-GPU box)
| Provider | Approximate on-demand $/hr | Notes |
|---|---|---|
| Lambda Cloud | ~$22-26 | Among lowest; AI-focused |
| CoreWeave | ~$24-32 | Reliable allocation; strong for training |
| Crusoe | ~$22-28 | Sustainable energy positioning |
| Together AI / Anyscale | ~$28-36 | Managed services premium |
| AWS p5.48xlarge | ~$98 (on-demand list) | Reservations and savings plans bring the effective rate down |
| GCP A3 (8x H100) | ~$88 (on-demand list) | Significant discount with commitments |
| Azure ND H100 v5 | ~$98 (on-demand list) | Significant discount with reservations |
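Hyperscaler list rates overstate what committed buyers actually pay. A minimal sketch of the effective rate under a committed-use discount, where the 40 percent figure is an illustrative assumption rather than a quoted hyperscaler discount:

```python
# Effective 8-GPU box rate under a committed-use discount. The 40 percent
# discount is an illustrative assumption, not a quoted hyperscaler figure.
list_rate = 98.0        # ~$/hr, on-demand list (AWS p5 / Azure ND H100 v5)
commit_discount = 0.40  # hypothetical 1-3 year commitment discount

effective_rate = list_rate * (1 - commit_discount)
print(f"Effective: ~${effective_rate:.2f}/hr vs ${list_rate:.2f}/hr list")
# ~$58.80/hr, still above neocloud on-demand rates in the table above
```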
Lead Times for Direct Purchase
| SKU | Lead time Q2 2026 | Lead time peak (2023) |
|---|---|---|
| H100 SXM | 6-12 weeks | 50+ weeks |
| H100 PCIe | 4-8 weeks | 30+ weeks |
| H200 | 8-14 weeks | n/a |
| B200 | 16-26 weeks | n/a (allocated) |
| GB200 NVL72 | allocation-only | n/a |
On-Prem vs Rental Economics
For sustained workloads at moderate-to-high utilisation, an on-prem H100 purchase amortises favourably against cloud rental within roughly 6-14 months, depending on the rental rate used for comparison (a worked break-even sketch follows the list below). Beyond utilisation rate, the decision depends on:
- Capital availability for upfront purchase
- Datacentre space, power, cooling availability
- Engineering team to operate the cluster
- Workload predictability (rental wins for spiky loads)
- Need for newest-generation hardware (rental upgrades automatically)
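A minimal break-even sketch, assuming an illustrative server price, power and colocation cost, and the rental rates from the comparison table above:

```python
# Break-even sketch for on-prem vs cloud rental. All inputs are
# illustrative assumptions, not vendor quotes. Rental spend is avoided
# only for utilised hours; on-prem opex accrues continuously.
HOURS_PER_MONTH = 730

def breakeven_months(server_cost: float, cloud_rate: float,
                     opex_per_hour: float, utilisation: float) -> float:
    """Months until avoided rental spend repays the purchase price."""
    hourly_saving = cloud_rate * utilisation - opex_per_hour
    return server_cost / hourly_saving / HOURS_PER_MONTH

# Assumed 8x H100 server at $250k, $6/hr power + colocation, 90% utilisation
print(breakeven_months(250_000, 98.0, 6.0, 0.9))  # vs hyperscaler list: ~4.2 months
print(breakeven_months(250_000, 26.0, 6.0, 0.9))  # vs neocloud rate: ~19.7 months
```

Against hyperscaler list rates the purchase pays back in months; against the cheapest neocloud rates it can take well over a year. The headline 6-14 month window sits between those extremes, so the comparison rate you benchmark against dominates the result.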
Brand Visibility Implications
GPU economics receive heavy journalist coverage, particularly cost-per-token math, shortage-versus-oversupply narratives, and cloud-pricing wars. Brands selling GPU cloud capacity, AI accelerators competing with NVIDIA, AI cost-optimisation services, or compute-marketplace platforms face a large AI-mediated discovery surface as buyers ask AI assistants for cost-efficient compute recommendations. Hyperscaler GPU services and neocloud providers compete heavily for AI-mediated visibility on queries like "cheapest H100 cloud".
Methodology
Pricing aggregated from public rate cards: Lambda Cloud, CoreWeave, Crusoe, AWS, GCP, Azure. Lead times triangulated from NVIDIA reseller channel reports and procurement-team interviews. Spot pricing reflects monitored neocloud spot markets. Updated monthly as the market remains fast-moving.
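As an illustration of the aggregation step, a minimal sketch assuming a simple min/median/max summary per SKU (the actual cross-provider weighting may differ, and the rates below are hypothetical):

```python
# Minimal sketch of the monthly aggregation step: summarise per-SKU
# on-demand rates across providers. Rates below are hypothetical.
from statistics import median

h100_rates = {  # $/hr, single H100, illustrative snapshot
    "Lambda Cloud": 2.99,
    "CoreWeave": 3.40,
    "Crusoe": 2.90,
}

rates = list(h100_rates.values())
print(f"H100 on-demand: min ${min(rates):.2f}, "
      f"median ${median(rates):.2f}, max ${max(rates):.2f}")
```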
How Presenc AI Helps
Presenc AI tracks brand-mention rates inside AI assistant queries about GPU cloud pricing, AI accelerator selection, and compute-marketplace comparison: the surface where compute purchasing decisions increasingly originate. For brands selling AI compute or AI cost-optimisation services, this provides operational visibility into a high-stakes commercial discovery surface.