
AI GPU Supply and Pricing 2026

AI GPU supply, lead times, and rental pricing in 2026: H100, H200, B200, GB200, RTX 5090. Cloud rental rates from Lambda Labs, CoreWeave, Crusoe, and on-prem economics.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

The State of AI GPU Supply in 2026

AI GPU supply moved from acute shortage in 2023 to functional balance in 2026 as NVIDIA Blackwell ramped at scale and AMD MI300X plus Google TPU v6 added competing capacity. Cloud rental prices fell sharply over 2024-2025; on-prem economics remain attractive for sustained workloads. This page consolidates pricing and supply data through Q2 2026.

Key Findings

  1. NVIDIA H100 cloud rental rates fell from approximately $8/hr in early 2023 to $1.80-3.50/hr in Q2 2026, with spot pricing as low as $1.20/hr.
  2. NVIDIA B200 (Blackwell) rental rates in Q2 2026 are approximately $4.50-7.00/hr; supply is constrained but not allocation-only.
  3. GB200 NVL72 rack-scale systems remain in tight allocation; access is primarily through hyperscaler clouds and select neoclouds.
  4. NVIDIA H100 lead times for direct purchase are approximately 6-12 weeks in Q2 2026, down from 50+ weeks in 2023.
  5. AMD MI300X rental pricing is approximately 30-40 percent below H100 at comparable performance for inference; software ecosystem (ROCm) remains the differentiator.

NVIDIA H100 Cloud Rental Pricing Trajectory

| Period | On-demand $/hr (single H100) | Spot $/hr (single H100) | Notable provider rates |
|---|---|---|---|
| Q1 2023 | ~$8.00 | ~$5.00 | Allocation-constrained; long waitlists |
| Q1 2024 | ~$5.00 | ~$3.20 | Lambda, Crusoe, CoreWeave |
| Q1 2025 | ~$3.20 | ~$2.00 | Spot supply ramp |
| Q2 2026 | ~$1.80-3.50 | ~$1.20-2.00 | B200 supply pressures H100 |

Pricing aggregated from Lambda Cloud, CoreWeave, Crusoe, AWS p5, and Google Cloud public rate cards.
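The decline in the table corresponds to a compound annual price drop of roughly 29 percent. A quick back-of-envelope sketch (the 3.25-year span and the Q2 2026 midpoint are approximations read off the table above):

```python
# Illustrative: annualized rate of decline of H100 on-demand rental.
# Inputs are approximations taken from the pricing table above.
start_price = 8.00                 # ~Q1 2023 on-demand $/hr
end_price = (1.80 + 3.50) / 2      # midpoint of the Q2 2026 range
years = 3.25                       # Q1 2023 -> Q2 2026

cagr = (end_price / start_price) ** (1 / years) - 1
print(f"Annualized price change: {cagr:.1%}")  # roughly -29% per year
```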

NVIDIA B200 / GB200 Pricing

| SKU | On-demand $/hr | Notes |
|---|---|---|
| B200 (single) | ~$4.50-7.00 | Limited but growing availability |
| HGX B200 8-GPU | ~$36-55 | Premium for tightly-coupled inference |
| GB200 NVL72 (per GPU equivalent) | ~$8-14 | Tight allocation; hyperscaler-mediated |
| H200 (single) | ~$3.00-4.50 | Bridge between H100 and B200 |
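At the quoted per-GPU-equivalent range, a full NVL72 rack (72 GPU equivalents) implies roughly $576-1,008 per rack-hour. A quick sketch of that arithmetic:

```python
# Implied rack-hour cost for a GB200 NVL72, derived from the
# per-GPU-equivalent range in the table above. Illustrative only.
gpus_per_rack = 72
low, high = 8.0, 14.0  # $/hr per GPU equivalent

print(f"~${gpus_per_rack * low:.0f}-{gpus_per_rack * high:.0f} per rack-hour")
# ~$576-1008 per rack-hour
```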

AMD and Alternative Accelerator Pricing

| Accelerator | On-demand $/hr | Positioning / notes |
|---|---|---|
| AMD MI300X | ~$1.20-2.50 | ~30-40% below H100; competitive on inference |
| AMD MI325X | ~$1.80-3.20 | ~20-30% below H200 |
| Google TPU v5p (per chip) | ~$2.50-4.50 | GCP-only; competitive on training |
| AWS Trainium 2 | ~$0.80-1.80 | AWS-only; cost-leader on inference |
| Cerebras WSE-3 (cloud) | premium pricing | Niche use cases; very high single-chip throughput |
| Groq LPU (inference) | per-token pricing | Inference-only; extremely high tokens/sec on small models |

Cloud Provider Comparison (H100, on-demand 8-GPU box)

| Provider | Approximate on-demand $/hr | Notes |
|---|---|---|
| Lambda Cloud | ~$22-26 | Among lowest; AI-focused |
| CoreWeave | ~$24-32 | Reliable allocation; strong for training |
| Crusoe | ~$22-28 | Sustainable energy positioning |
| Together AI / Anyscale | ~$28-36 | Managed services premium |
| AWS p5.48xlarge | ~$98 (on-demand list) | Reservations and savings plans bring effective rate down |
| GCP A3 (8x H100) | ~$88 (on-demand list) | Significant discount with commitments |
| Azure NDH100v5 | ~$98 (on-demand list) | Significant discount with reservations |
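The 8-GPU box rates above can be normalised to per-GPU hourly rates for comparison against the single-GPU tables. A small sketch; the 40 percent commitment discount applied to the AWS list rate is a hypothetical input for illustration, not a quoted figure:

```python
# Normalise 8-GPU box rates to per-GPU hourly rates. The 40%
# commitment discount below is a hypothetical assumption.
def per_gpu_rate(box_rate_hr: float, gpus: int = 8) -> float:
    return box_rate_hr / gpus

lambda_per_gpu = per_gpu_rate(24.0)                # $24/box-hr -> $3.00/GPU-hr
aws_effective = per_gpu_rate(98.0 * (1 - 0.40))    # list $98 with assumed 40% off
print(f"Lambda ~${lambda_per_gpu:.2f}/GPU-hr, AWS effective ~${aws_effective:.2f}/GPU-hr")
```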

Lead Times for Direct Purchase

| SKU | Lead time, Q2 2026 | Lead time at 2023 peak |
|---|---|---|
| H100 SXM | 6-12 weeks | 50+ weeks |
| H100 PCIe | 4-8 weeks | 30+ weeks |
| H200 | 8-14 weeks | n/a |
| B200 | 16-26 weeks | n/a (allocated) |
| GB200 NVL72 | allocation-only | n/a |

On-Prem vs Rental Economics

For sustained workloads at moderate-to-high utilisation, on-prem H100 amortises favourably against cloud rental within 6-14 months. Beyond utilisation rate, the decision depends on:

  • Capital availability for upfront purchase
  • Datacentre space, power, cooling availability
  • Engineering team to operate the cluster
  • Workload predictability (rental wins for spiky loads)
  • Need for newest-generation hardware (rental upgrades automatically)
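The 6-14-month breakeven claim can be sanity-checked with a simple amortisation sketch. All inputs below (purchase price, power/hosting overhead, rental rate, utilisation) are illustrative assumptions, not figures from this page:

```python
# Rough breakeven sketch: months until an on-prem H100's purchase cost
# is recovered versus cloud rental. All inputs are illustrative.
HOURS_PER_MONTH = 730

def breakeven_months(capex_per_gpu: float,
                     rental_rate_hr: float,
                     opex_per_gpu_hr: float,
                     utilisation: float) -> float:
    # Rental spend is avoided only for hours the GPU is actually busy;
    # on-prem power/hosting overhead reduces the hourly saving.
    saving_per_hr = (rental_rate_hr - opex_per_gpu_hr) * utilisation
    return capex_per_gpu / (saving_per_hr * HOURS_PER_MONTH)

# Hypothetical inputs: $22k per GPU installed, $3.50/hr rental alternative,
# $0.30/hr power + hosting, 80% utilisation.
m = breakeven_months(22_000, 3.50, 0.30, 0.80)
print(f"breakeven ~= {m:.1f} months")  # ~11.8 months, inside the 6-14 range
```

Lower utilisation or cheaper rental pushes breakeven out quickly, which is why the rent-vs-buy answer is so sensitive to workload predictability.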

Brand Visibility Implications

GPU economics are heavily covered by journalists, particularly cost-per-token math, GPU shortage/oversupply narratives, and cloud-pricing wars. Brands selling GPU cloud capacity, AI accelerators that compete with NVIDIA, AI cost-optimisation services, or compute-marketplace platforms face a large AI-mediated discovery surface as buyers ask AI assistants for cost-efficient compute recommendations. Hyperscaler GPU services and neocloud providers compete heavily for AI-mediated visibility on "cheapest H100 cloud"-type queries.

Methodology

Pricing aggregated from public rate cards: Lambda Cloud, CoreWeave, Crusoe, AWS, GCP, Azure. Lead times triangulated from NVIDIA reseller channel reports and procurement-team interviews. Spot pricing reflects monitored neocloud spot markets. Updated monthly as the market remains fast-moving.

How Presenc AI Helps

Presenc AI tracks brand-mention rates inside AI assistant queries about GPU cloud pricing, AI accelerator selection, and compute-marketplace comparison: the surface where compute purchasing decisions increasingly originate. For brands selling AI compute or AI cost-optimisation, this provides operational visibility into a high-stakes commercial discovery surface.

Frequently Asked Questions

How much does it cost to rent an H100 in 2026?

On-demand cloud rental is approximately $1.80-3.50/hr per H100 in Q2 2026, with spot pricing as low as $1.20/hr. Lambda Cloud, CoreWeave, and Crusoe lead the price-aggressive segment. Hyperscaler list pricing (AWS p5, GCP A3, Azure NDH100v5) is materially higher, but reservations and commitments bring effective rates down.

Is there still an AI GPU shortage?

Functional balance, not shortage, by 2026. NVIDIA H100 lead times are 6-12 weeks (down from 50+ weeks in 2023). B200 is supply-constrained but not allocation-only outside the largest hyperscalers. GB200 NVL72 remains in tight allocation. The acute 2023-era shortage has resolved.

Should we buy GPUs or rent cloud capacity?

For sustained moderate-to-high-utilisation workloads, buy and amortise: breakeven is 6-14 months on H100 at moderate utilisation. For spiky or experimental workloads, rent. Hybrid (a small on-prem fleet for baseline, cloud for spikes) is the practical default for most enterprises with sustained AI workloads.

Is AMD MI300X competitive with the H100?

On inference, yes: comparable performance at 30-40 percent lower price. On training, the software ecosystem (ROCm vs CUDA) remains the differentiator; ROCm has improved materially in 2025-2026, but CUDA retains the developer-ergonomics edge. AMD adoption is growing fastest on inference workloads.

When will B200 be widely available?

Mid-to-late 2026 for general cloud availability outside the largest hyperscalers. The NVIDIA Blackwell ramp continues through 2026, with GB200 NVL72 systems concentrated at the largest hyperscalers and select neoclouds. By H1 2027, B200 access should resemble H100 access today.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.