Nvidia's real monopoly isn't the chips. It's CUDA, the GPU programming platform the company has been building since 2006. Virtually all AI training code runs on CUDA. AMD's MI300X can match the H100 on paper, but porting and tuning models for AMD's ROCm stack can take months of engineering. That switching cost is structural.
After ChatGPT launched, H100 GPUs (at $25K–$40K each) sold out globally for 12+ months. Microsoft, Google, Amazon, and Meta collectively put over $200B into capital expenditure in 2024, much of it AI infrastructure built around Nvidia hardware.
Each B200 draws about 1,000 watts. Eight of them pull roughly 8 kW for the GPUs alone, several times what a typical house uses on average. As AI clusters scale to 100,000+ GPUs, power grid capacity becomes the bottleneck, not chip supply. Microsoft's Three Mile Island nuclear deal was directly driven by AI data center power demand.
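The scale behind these power figures can be sketched in a few lines of Python. This is a rough estimate, not a facility model: the 1,000 W per GPU comes from the text, while the `pue` overhead factor (power usage effectiveness, covering cooling and power delivery) and its example value of 1.3 are assumptions added for illustration.

```python
WATTS_PER_GPU = 1_000  # B200 figure quoted in the text


def cluster_power_mw(num_gpus: int, watts_per_gpu: float = WATTS_PER_GPU,
                     pue: float = 1.0) -> float:
    """Estimated power in megawatts.

    Counts GPUs only (no CPUs, NICs, or storage). `pue` is an assumed
    facility overhead multiplier; 1.0 means no overhead at all.
    """
    return num_gpus * watts_per_gpu * pue / 1_000_000


node_mw = cluster_power_mw(8)                          # 8 GPUs
cluster_mw = cluster_power_mw(100_000)                 # 100,000 GPUs
with_overhead_mw = cluster_power_mw(100_000, pue=1.3)  # assumed PUE of 1.3

print(f"8 GPUs: {node_mw * 1000:.0f} kW")
print(f"100,000 GPUs: {cluster_mw:.0f} MW (GPU draw only)")
print(f"100,000 GPUs with assumed overhead: {with_overhead_mw:.0f} MW")
```

At 100,000 GPUs the GPU draw alone is on the order of 100 MW before any cooling or networking is counted, which is the scale at which grid capacity starts to matter.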
MI300X (current) holds 192GB of HBM3, 2.4× the H100's 80GB. The MI455X (H2 2026) doubles down: 432GB HBM4 at 19.6 TB/s, 40 PFLOPS FP4, CDNA 5 on TSMC 2nm. That's 2.25× the MI300X's capacity and 1.5× the 288GB on even the Rubin R100. AMD's memory-capacity lead is structural: HBM stacking is where it's competing, not raw compute.
Approximate hardware cost per TFLOP: H100 ~$30, B200 ~$18, MI300X ~$11. As of mid-2026, you can run trillion-parameter inference workloads for under $0.50/hour on commodity cloud, a figure that was science fiction just 3 years ago.
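A cost-per-TFLOP figure is just purchase price divided by peak throughput, and the sketch below shows that calculation alongside the relative savings implied by the quoted numbers. The H100 inputs ($30,000, inside the $25K–$40K range mentioned earlier, and 1,000 TFLOPS) are illustrative assumptions, since the text does not say which price or numeric precision its figures are based on.

```python
def cost_per_tflop(price_usd: float, tflops: float) -> float:
    """Purchase price divided by peak throughput."""
    return price_usd / tflops


def saving_vs(baseline: float, other: float) -> float:
    """Fractional reduction in $/TFLOP relative to a baseline."""
    return 1 - other / baseline


# Hypothetical H100 inputs: both numbers are assumptions for illustration
h100 = cost_per_tflop(30_000, 1_000)
print(f"Assumed H100: ${h100:.0f}/TFLOP")

# $/TFLOP figures as quoted in the text
quoted = {"H100": 30, "B200": 18, "MI300X": 11}
for name, value in quoted.items():
    pct = saving_vs(quoted["H100"], value)
    print(f"{name}: ${value}/TFLOP, {pct:.0%} below H100")
```

On the quoted figures, B200 comes out 40% cheaper per TFLOP than H100 and MI300X about 63% cheaper. The comparison only holds if all three numbers use the same precision and pricing basis.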
Nvidia's Rubin R100 ships H2 2026: 50 PFLOPS FP4 (~12,500 TFLOPS FP16 est.), 288GB HBM4 at 22 TB/s, 336 billion transistors on TSMC 3nm. That's 5.6× the FP4 compute of B200 in a single generation, the largest jump since Volta. Rubin Ultra follows in 2027.
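The capacity and throughput ratios quoted for these chips can be recomputed from the numbers in the text, as in the sketch below. Two assumptions are built in: the H100's 80 GB (the capacity implied by the "2.4×" comparison) and the rule that peak throughput halves each time precision doubles, which is how the FP16 estimate appears to be derived.

```python
# Memory capacities in GB, as quoted in the text (H100 value is implied)
memory_gb = {"H100": 80, "MI300X": 192, "R100": 288, "MI455X": 432}


def memory_ratio(a: str, b: str) -> float:
    """How many times the memory of chip `b` chip `a` carries."""
    return memory_gb[a] / memory_gb[b]


print(f"MI300X vs H100:   {memory_ratio('MI300X', 'H100'):.2f}x")
print(f"MI455X vs MI300X: {memory_ratio('MI455X', 'MI300X'):.2f}x")
print(f"MI455X vs R100:   {memory_ratio('MI455X', 'R100'):.2f}x")

R100_FP4_PFLOPS = 50  # quoted in the text

# FP4 -> FP8 -> FP16 is two doublings of precision, so divide by 4 (assumed rule)
fp16_est_tflops = R100_FP4_PFLOPS * 1_000 / 4
print(f"R100 FP16 estimate: {fp16_est_tflops:,.0f} TFLOPS")

# B200 FP4 throughput implied by the quoted 5.6x generational jump
implied_b200_fp4_pflops = R100_FP4_PFLOPS / 5.6
print(f"Implied B200 FP4: {implied_b200_fp4_pflops:.1f} PFLOPS")
```

Keeping the raw figures in one table makes it easy to check any "N× more" claim against the same baseline.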
Announced at Computex 2026, RTX Spark is Nvidia's first PC SoC: a 20-core ARM Grace CPU (built with MediaTek), a Blackwell GPU with 6,144 CUDA cores, and up to 128GB of unified LPDDR5X memory, on TSMC 3nm. It delivers 1 PFLOPS of FP4 AI performance, runs 120B-parameter models locally, and launches in fall 2026 on ASUS, Dell, HP, and Microsoft hardware. Intel shares dropped 6% and AMD shares 5% on the news.
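Whether a 120B-parameter model fits in 128GB depends mostly on how many bits each weight takes. The sketch below estimates the weight footprint only; it deliberately ignores the KV cache, activations, and operating-system overhead, so real headroom is smaller than it suggests. The function name and the list of bit widths are illustrative choices, not anything specified by Nvidia.

```python
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    # 1e9 parameters * (bits / 8) bytes each = (bits / 8) GB per billion parameters
    return params_billions * bits_per_param / 8


UNIFIED_MEMORY_GB = 128  # top configuration quoted in the text

for bits in (16, 8, 4):
    need = weights_gb(120, bits)
    verdict = "fits" if need <= UNIFIED_MEMORY_GB else "does not fit"
    print(f"120B at {bits}-bit: {need:.0f} GB of weights, "
          f"{verdict} in {UNIFIED_MEMORY_GB} GB")
```

At 16-bit the weights alone need 240 GB, so running a 120B model locally in 128GB implies quantization: 4-bit weights take 60 GB, while 8-bit weights take 120 GB and leave almost no room for anything else.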