AI Infrastructure Pricing & Memory Digest
Bilingual structured index of 39 long-form videos on AI infrastructure pricing, memory (HBM/DRAM), bottlenecks, and inference economics. Built from Vertex Gemini extraction; every claim cites a specific
[video_id @ timestamp]. No fabricated numbers.
The Headline Thesis
Inference cost fell 1000× in 3 years. Memory just became the bottleneck that decides whether that drop continues.
- AI inference query cost: $400 → $0.40 per million tokens for GPT-4-class output (early 2023 → March 2026)
- Simultaneously, 30% of Big Tech’s 2026 capex is going to memory, DRAM prices have tripled, and HBM capacity is sold out through 2026
Two forces, opposite directions. The 1000× chart sets the demand curve. The memory crunch sets the supply ceiling.
The Corpus
| Category | Videos | Total tokens |
|---|---|---|
| Pricing & Economics (long-form deep dives) | 5 | 1.56 M |
| Memory & HBM | 5 | 0.91 M |
| Expert Interviews (BG2 / Dwarkesh / 20VC / All-In) | 21 | 4.40 M |
| Acquired Deep Dives (NVIDIA I/II/III · TSMC · Morris Chang · Jensen) | 6 | 2.56 M |
| China Perspective (Bilibili) | 2 | 0.08 M |
| Total | 39 | 9.5 M |
Top Findings
1. Memory Crunch — the headline numbers
| Metric | Value | Source |
|---|---|---|
| Big Tech 2026 capex going to memory | 30% | Dylan Patel · mDG_Hx3BSUE @ 1:23:11 |
| DRAM price (commodity) | $3-4/GB → $12/GB | mDG_Hx3BSUE @ 1:24:23 |
| iPhone 12GB memory BOM | $50 → $150 | mDG_Hx3BSUE @ 1:24:12-26 |
| Implied iPhone consumer price impact | +$250 | mDG_Hx3BSUE @ 1:24:51 |
| Smartphone 2026 unit forecast | 1.1 B → 500-600 M (possible) | mDG_Hx3BSUE @ 1:25:31 |
| SK Hynix HBM market share | ~70% | BV1SbeAzfE37 |
2. HBM technical reality
| Spec | HBM4 stack | DDR5 channel | Ratio |
|---|---|---|---|
| Interface width | 2048 bits | 64-128 bits | ~20× |
| Bandwidth | ~2.5 TB/s | ~128 GB/s | ~20× |
| Source | mDG_Hx3BSUE @ 1:21:07-50 |
same |
3. The 1000× inference cost timeline
| Model | $/M input tokens | Notes |
|---|---|---|
| GPT-4 (early 2023) | $400 | as query |
| GPT-4-class (March 2026) | $0.40 | 1000× drop |
| Claude Opus 4.1 → 4.6 | $15 → $5 | 67% cut, one gen |
| Gemini 2.5 Pro | $1.25 | Google premium |
| DeepSeek V3 | $0.14 input / $0.28 output | 1/20 of GPT-4 launch |
| Claude Haiku | $0.25 | Approaching DB query cost |
Source: KvoD4nu6H08 @ 00:00-01
4. Bottleneck Hierarchy (Dylan Patel’s framework)
| Rank | Bottleneck | Why | Dylan’s Verdict |
|---|---|---|---|
| 1 | ASML EUV tools | ~70/yr → ~100/yr by 2030 = ~700 tools = ~200 GW chip capacity ceiling | “Ultimate” |
| 2 | HBM / memory bandwidth | Physical fab space short-term, wafer area long-term | “Supply constraint” |
| 3 | CoWoS advanced packaging | TSMC throughput | Joint with HBM |
| 4 | Data center construction labor | Solved by modularization | Solvable |
| 5 | Power | Grid has idle capacity, “behind-the-meter” gas turbines | Not the bottleneck |
5. China differentiation (from Bilibili)
Chinese-language sources surface alternatives that English-language discourse rarely covers:
- Huawei UCM (Unified Cache Manager): TTFT -90%, throughput +22× via tiered storage vs HBM-only
- Saimemory (SoftBank + Intel + UTokyo): new stacked DRAM, 40-50% power reduction, prototype 2027
- NEO Semiconductor X-HBM: 16× bandwidth, 10× density target, 512 Gbit per chip
- 3D X-DRAM: vertical stacking breaks 2D scaling limits
- Samsung Z-NAND revival: 15× perf, 80% power cut
Sourcing & Verification
- Items flagged for verification (do NOT publish without checking):
- “NVIDIA paid Grok $20 billion” → likely Groq (LPU company), not Grok (xAI). Transcription artifact.
- “Gemini Flashlight” → almost certainly Flash or Flash Lite, mis-transcribed.
Browse
- 🇬🇧 English:
videos/en/ - 🇨🇳 中文:
videos/zh/ - 📋 Full findings analysis
Each video page has: segments with YouTube-linked timestamps, specific prices table, memory facts, bottleneck claims, predictions, key technologies, companies mentioned, notable quotes, and takeaways.
Pipeline
- Extraction: Vertex AI Gemini 2.5 Pro / 3.1 Pro Preview, native YouTube
file_uriingestion - Bilibili: yt-dlp 480p → inline blob upload
- Long videos (>2 h): chunked via
start_offset/end_offset - Structured JSON schema enforced via
response_mime_type=application/json
License
CC BY 4.0 for indexed notes content. Original videos remain the property of their creators.