AI Infrastructure Pricing & Memory Digest

🇨🇳 Chinese version (中文版)

Bilingual structured index of 39 long-form videos on AI infrastructure pricing, memory (HBM/DRAM), bottlenecks, and inference economics, built from Vertex AI Gemini extraction. Every claim cites a specific [video_id @ timestamp]; no numbers are fabricated.

The Headline Thesis

Inference cost fell 1000× in 3 years. Memory just became the bottleneck that decides whether that drop continues.

  • AI inference query cost for GPT-4-class models: $400 → $0.40 per million tokens (early 2023 → March 2026)
  • Simultaneously, 30% of Big Tech’s 2026 capex is going to memory, DRAM prices have tripled, and HBM capacity is sold out through 2026

Two forces, opposite directions. The 1000× chart sets the demand curve. The memory crunch sets the supply ceiling.
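The 1000× headline compounds to roughly 10× per year. A minimal sketch of that arithmetic, assuming the "early 2023 → March 2026" window is treated as exactly 3 years (an approximation of the dates given above); `annual_factor` is a hypothetical helper:

```python
def annual_factor(start_price: float, end_price: float, years: float) -> float:
    """Average per-year reduction factor implied by a price drop over `years`."""
    total_drop = start_price / end_price      # e.g. 400 / 0.40 -> 1000x
    return total_drop ** (1.0 / years)        # geometric mean per year

# Prices in $ per million tokens, from the headline bullet.
factor = annual_factor(400.0, 0.40, years=3.0)
yearly_cut = 1.0 - 1.0 / factor               # fraction of cost removed each year

print(f"~{factor:.1f}x cheaper per year, i.e. ~{yearly_cut:.0%} cost cut per year")
```

A geometric mean is used because price drops compound: three successive 10× cuts multiply to 1000×, they do not add.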

The Corpus

Category | Videos | Total tokens
Pricing & Economics (long-form deep dives) | 5 | 1.56 M
Memory & HBM | 5 | 0.91 M
Expert Interviews (BG2 / Dwarkesh / 20VC / All-In) | 21 | 4.40 M
Acquired Deep Dives (NVIDIA I/II/III · TSMC · Morris Chang · Jensen) | 6 | 2.56 M
China Perspective (Bilibili) | 2 | 0.08 M
Total | 39 | 9.5 M

Top Findings

1. Memory Crunch — the headline numbers

Metric | Value | Source
Big Tech 2026 capex going to memory | 30% | Dylan Patel · mDG_Hx3BSUE @ 1:23:11
DRAM price (commodity) | $3-4/GB → $12/GB | mDG_Hx3BSUE @ 1:24:23
iPhone memory BOM (12 GB) | $50 → $150 | mDG_Hx3BSUE @ 1:24:12-26
Implied iPhone consumer price impact | +$250 | mDG_Hx3BSUE @ 1:24:51
Smartphone 2026 unit forecast | 1.1 B → 500-600 M (possible) | mDG_Hx3BSUE @ 1:25:31
SK Hynix HBM market share | ~70% | BV1SbeAzfE37
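The iPhone rows can be cross-checked against the commodity DRAM row. The sketch below derives the implied $/GB from the 12 GB BOM figures and the retail-to-BOM multiplier implied by the +$250 estimate. That multiplier is a derived ratio, not a figure stated in the source, and the constant names are illustrative:

```python
CAPACITY_GB = 12                      # iPhone memory capacity (table)
BOM_BEFORE, BOM_AFTER = 50.0, 150.0   # memory BOM in $ (table)
RETAIL_IMPACT = 250.0                 # implied consumer price increase in $ (table)

def per_gb(bom_usd: float, capacity_gb: float) -> float:
    """Memory cost per GB implied by a BOM line item."""
    return bom_usd / capacity_gb

before = per_gb(BOM_BEFORE, CAPACITY_GB)   # ~4.17 $/GB
after = per_gb(BOM_AFTER, CAPACITY_GB)     # 12.50 $/GB
bom_increase = BOM_AFTER - BOM_BEFORE      # extra BOM dollars
markup = RETAIL_IMPACT / bom_increase      # retail $ per extra BOM $ (derived)

print(f"${before:.2f}/GB -> ${after:.2f}/GB, BOM +${bom_increase:.0f}, "
      f"retail/BOM multiplier ~{markup:.1f}x")
```

The implied $/GB lands close to the commodity DRAM row, which suggests the BOM and DRAM figures are internally consistent.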

2. HBM technical reality

Spec | HBM4 stack | DDR5 channel | Ratio
Interface width | 2048 bits | 64-128 bits | 16-32×
Bandwidth | ~2.5 TB/s | ~128 GB/s | ~20×

Source: mDG_Hx3BSUE @ 1:21:07-50 (both rows)
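Bandwidth is interface width times per-pin data rate, so the table also implies how fast each HBM4 pin runs. A back-of-envelope sketch using only the table's numbers; `per_pin_gbps` is a hypothetical helper and the result is an implied average, not a quoted spec:

```python
def per_pin_gbps(bandwidth_bytes_per_s: float, width_bits: int) -> float:
    """Per-pin data rate (Gb/s) implied by total bandwidth and bus width."""
    return bandwidth_bytes_per_s * 8 / width_bits / 1e9

HBM4_BW = 2.5e12     # ~2.5 TB/s per stack (table)
HBM4_WIDTH = 2048    # bits (table)
DDR5_BW = 128e9      # ~128 GB/s per channel (table)

print(f"HBM4 implied per-pin rate: {per_pin_gbps(HBM4_BW, HBM4_WIDTH):.2f} Gb/s")
print(f"Bandwidth ratio HBM4/DDR5: {HBM4_BW / DDR5_BW:.1f}x")
for ddr5_width in (64, 128):
    print(f"Width ratio vs {ddr5_width}-bit DDR5: {HBM4_WIDTH / ddr5_width:.0f}x")
```

The width ratio spans 16× to 32× depending on which DDR5 width is used, while the bandwidth ratio comes out near 20×; most of HBM's advantage comes from the wide interface rather than faster pins.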

3. The 1000× inference cost timeline

Model | $/M tokens (input unless noted) | Notes
GPT-4 (early 2023) | $400 | Quoted as query cost
GPT-4-class (March 2026) | $0.40 | 1000× drop
Claude Opus 4.1 → 4.6 | $15 → $5 | 67% cut in one generation
Gemini 2.5 Pro | $1.25 | Google’s premium tier
DeepSeek V3 | $0.14 input / $0.28 output | 1/20 of GPT-4 at launch
Claude Haiku | $0.25 | Approaching database-query cost

Source: KvoD4nu6H08 @ 00:00-01
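The Notes column mixes fold reductions and percentage cuts. A small sketch that converts an old → new price pair into both forms and prices a hypothetical workload. The 500 M-token monthly volume is an illustrative assumption, and only input-token prices are used, so output-heavy workloads would cost more; `PriceChange` and `workload_cost` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PriceChange:
    old: float   # $ per million tokens
    new: float   # $ per million tokens

    @property
    def fold(self) -> float:
        return self.old / self.new          # "N× cheaper"

    @property
    def pct_cut(self) -> float:
        return 1.0 - self.new / self.old    # fraction of the old price removed

# Input-token prices from the table above ($/M tokens).
INPUT_PRICE = {
    "Claude Opus 4.6": 5.00,
    "Gemini 2.5 Pro": 1.25,
    "Claude Haiku": 0.25,
    "DeepSeek V3": 0.14,
}

def workload_cost(price_per_m: float, tokens: int) -> float:
    """Dollar cost of `tokens` input tokens at a $/M-token price."""
    return price_per_m * tokens / 1_000_000

opus = PriceChange(old=15.0, new=5.0)
print(f"Opus 4.1 -> 4.6: {opus.fold:.0f}x cheaper, {opus.pct_cut:.0%} cut")

TOKENS = 500_000_000   # hypothetical monthly input volume
for model, price in INPUT_PRICE.items():
    print(f"{model:16s} ${workload_cost(price, TOKENS):>10,.2f}")
```

Note that a 67% cut and a 3× drop describe the same change; fold reductions grow without bound while percentage cuts saturate near 100%, which is why the 1000× figure is usually quoted as a fold.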

4. Bottleneck Hierarchy (Dylan Patel’s framework)

Rank | Bottleneck | Why | Dylan’s verdict
1 | ASML EUV tools | Output ~70/yr → ~100/yr by 2030; ~700 tools ≈ ~200 GW chip-capacity ceiling | “Ultimate”
2 | HBM / memory bandwidth | Physical fab space in the short term, wafer area in the long term | “Supply constraint”
3 | CoWoS advanced packaging | TSMC throughput | Joint constraint with HBM
4 | Data center construction labor | Solved by modularization | Solvable
5 | Power | Grid has idle capacity, plus “behind-the-meter” gas turbines | Not the bottleneck
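A back-of-envelope reading of the EUV row: dividing the ~200 GW ceiling by ~700 tools gives the chip capacity each tool implicitly supports. This ratio is derived here, not stated by Dylan, and should be treated as order-of-magnitude only; `tools_needed` is a hypothetical helper:

```python
EUV_TOOLS = 700        # ~700 EUV tools by 2030 (bottleneck table)
CEILING_GW = 200.0     # ~200 GW chip-capacity ceiling (bottleneck table)

GW_PER_TOOL = CEILING_GW / EUV_TOOLS   # derived ratio, not a sourced figure

def tools_needed(target_gw: float) -> float:
    """EUV tools implied for `target_gw` of chip capacity at the table's ratio."""
    return target_gw / GW_PER_TOOL

print(f"~{GW_PER_TOOL * 1000:.0f} MW of chip capacity per EUV tool")
print(f"~{tools_needed(100):.0f} tools per additional 100 GW")
```

Framed this way, every extra 100 GW of AI capacity beyond the ceiling would require hundreds more EUV tools, which is why tool output ranks above memory, packaging, labor, and power.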

5. China differentiation (from Bilibili)

Chinese-language sources surface alternatives that English-language discourse rarely covers:

  • Huawei UCM (Unified Cache Manager): time to first token (TTFT) down 90%, throughput up 22×, via tiered storage instead of HBM-only
  • Saimemory (SoftBank + Intel + UTokyo): new stacked DRAM, 40-50% power reduction, prototype 2027
  • NEO Semiconductor X-HBM: 16× bandwidth, 10× density target, 512 Gbit per chip
  • 3D X-DRAM: vertical stacking breaks 2D scaling limits
  • Samsung Z-NAND revival: 15× perf, 80% power cut
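"Tiered storage vs HBM-only" means keeping inference state (such as the KV cache) across several memory tiers rather than only in HBM. The following is a generic multi-tier LRU sketch of that idea, not Huawei's UCM design; `TieredCache`, the tier names, and the capacities are hypothetical, and a real system would move tensors between devices rather than Python values:

```python
from collections import OrderedDict

class TieredCache:
    """Generic LRU cache spread over ordered tiers (fastest first).

    A hit in a slower tier promotes the entry back to the fastest tier.
    Overflow in a tier is demoted one tier down; entries pushed out of
    the last tier are dropped and would have to be recomputed.
    """

    def __init__(self, capacities: dict[str, int]):
        self.names = list(capacities)                  # e.g. ["HBM", "DRAM", "SSD"]
        self.caps = capacities
        self.tiers = {name: OrderedDict() for name in self.names}

    def _insert(self, level: int, key, value) -> None:
        if level >= len(self.names):
            return                                     # fell off the last tier
        tier = self.tiers[self.names[level]]
        tier[key] = value
        tier.move_to_end(key)                          # mark most recently used
        if len(tier) > self.caps[self.names[level]]:
            old_key, old_val = tier.popitem(last=False)  # least recently used
            self._insert(level + 1, old_key, old_val)    # demote one tier down

    def _remove(self, key) -> None:
        for tier in self.tiers.values():
            tier.pop(key, None)

    def put(self, key, value) -> None:
        self._remove(key)                              # avoid duplicates across tiers
        self._insert(0, key, value)

    def get(self, key):
        """Return (value, tier_where_found) or (None, None) on a full miss."""
        for name in self.names:
            tier = self.tiers[name]
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)            # promote to fastest tier
                return value, name
        return None, None

cache = TieredCache({"HBM": 2, "DRAM": 4})
for i in range(5):
    cache.put(f"prefix-{i}", f"kv-{i}")
print(cache.get("prefix-0"))   # served from DRAM instead of being recomputed
```

The design point is that a miss in HBM becomes a slower tier hit instead of a full recompute of the prefix, which is the mechanism behind TTFT gains from tiered caching.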

Sourcing & Verification

  • Items flagged for verification (do NOT publish without checking):
    • “NVIDIA paid Grok $20 billion” → likely Groq (LPU company), not Grok (xAI). Transcription artifact.
    • “Gemini Flashlight” → almost certainly Flash or Flash-Lite, mis-transcribed.
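Flags like the two above can be caught mechanically before publishing. A minimal sketch, assuming transcripts are plain strings; `FLAGS` and `flag_transcript` are hypothetical, and they only flag terms for human review rather than auto-correcting, because a given "Grok" may genuinely refer to xAI's model:

```python
import re

# Known mis-transcriptions flagged in this digest: regex -> reviewer note.
FLAGS = {
    r"\bGrok\b": "Likely Groq (LPU company), not Grok (xAI); verify.",
    r"\bFlashlight\b": "Likely Gemini Flash or Flash-Lite; verify model name.",
}

def flag_transcript(text: str) -> list[tuple[int, str, str]]:
    """Return (character offset, matched text, note) for each flagged term."""
    hits = []
    for pattern, note in FLAGS.items():
        for m in re.finditer(pattern, text):
            hits.append((m.start(), m.group(0), note))
    return sorted(hits)   # report in reading order

sample = "NVIDIA paid Grok $20 billion; Gemini Flashlight is cheap."
for offset, term, note in flag_transcript(sample):
    print(offset, term, note)
```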

Browse

Each video page includes: segments with YouTube-linked timestamps, a table of specific prices, memory facts, bottleneck claims, predictions, key technologies, companies mentioned, notable quotes, and takeaways.
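The per-page fields map naturally onto a record type. A sketch of one possible shape, assuming timestamps are stored as "H:MM:SS" or "MM:SS" strings; `VideoRecord`, `Segment`, and `youtube_link` are hypothetical names, since the pipeline's actual JSON schema is not shown in this digest. The `t=<seconds>s` query parameter is YouTube's standard deep-link form; Bilibili entries would need a different link format:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Segment:
    start: str      # timestamp, e.g. "1:23:11"
    end: str
    summary: str

@dataclass
class VideoRecord:
    video_id: str
    segments: list[Segment] = field(default_factory=list)
    prices: list[dict] = field(default_factory=list)   # rows of the price table
    memory_facts: list[str] = field(default_factory=list)
    bottleneck_claims: list[str] = field(default_factory=list)
    predictions: list[str] = field(default_factory=list)
    key_technologies: list[str] = field(default_factory=list)
    companies: list[str] = field(default_factory=list)
    quotes: list[str] = field(default_factory=list)
    takeaways: list[str] = field(default_factory=list)

def youtube_link(video_id: str, timestamp: str) -> str:
    """Turn 'H:MM:SS' or 'MM:SS' into a YouTube deep link (t= in seconds)."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"

rec = VideoRecord("mDG_Hx3BSUE",
                  memory_facts=["DRAM price (commodity) $3-4/GB → $12/GB"])
print(json.dumps(asdict(rec), ensure_ascii=False, indent=2))
print(youtube_link("mDG_Hx3BSUE", "1:23:11"))
```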

Pipeline

  • Extraction: Vertex AI Gemini 2.5 Pro / 3.1 Pro Preview, native YouTube file_uri ingestion
  • Bilibili: yt-dlp 480p → inline blob upload
  • Long videos (>2 h): chunked via start_offset / end_offset
  • Structured JSON schema enforced via response_mime_type=application/json
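The chunking step for videos over 2 h can be sketched as a window planner that emits offset strings in the "<seconds>s" form used by the Gemini API's video-clipping examples (`types.VideoMetadata(start_offset=..., end_offset=...)` in the google-genai SDK, passed alongside a YouTube `file_uri`). `chunk_offsets`, the 1-hour chunk length, and the 60-second overlap are assumptions, not settings taken from this pipeline:

```python
def chunk_offsets(duration_s: int, chunk_s: int = 3600, overlap_s: int = 60,
                  threshold_s: int = 2 * 3600) -> list[tuple[str, str]]:
    """Split a video into (start_offset, end_offset) windows like ("0s", "3600s").

    Videos at or under `threshold_s` come back as one window. Consecutive
    windows overlap by `overlap_s`, so a statement cut at a boundary is more
    likely to appear whole in one of the chunks.
    """
    if duration_s <= threshold_s:
        return [("0s", f"{duration_s}s")]
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk")
    windows, start = [], 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((f"{start}s", f"{end}s"))
        if end == duration_s:
            break
        start = end - overlap_s          # step back to create the overlap
    return windows

# A 3 h 10 min video -> several overlapping one-hour windows.
for start, end in chunk_offsets(3 * 3600 + 600):
    print(start, end)
```

Each window would then be sent as its own request, and timestamps in the returned JSON stay absolute because the offsets refer to the original video timeline.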

License

CC BY 4.0 for indexed notes content. Original videos remain the property of their creators.