AI Infrastructure Pricing & Memory Digest

🇨🇳 Chinese version

Bilingual structured index of 39 long-form videos on AI infrastructure pricing, memory (HBM/DRAM), bottlenecks, and inference economics. Built from Vertex Gemini extraction; every claim cites a specific [video_id @ timestamp]. No fabricated numbers.

The Headline Thesis

Inference cost fell 1000× in 3 years. Memory just became the bottleneck that decides whether that drop continues.

AI inference query cost: $400 → $0.40 per million tokens for GPT-4-class output (early 2023 → March 2026)
At the same time, 30% of Big Tech’s 2026 capex is going to memory, commodity DRAM prices have roughly tripled ($3-4/GB → $12/GB), and HBM capacity is sold out through 2026

Two forces, opposite directions. The 1000× chart sets the demand curve. The memory crunch sets the supply ceiling.
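
The headline arithmetic can be checked directly. The sketch below uses the two price endpoints quoted above and assumes the window is exactly three years (the source gives early 2023 → March 2026, so the per-year figure is approximate). The function names are illustrative.

```python
def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def annualized_factor(fold: float, years: float) -> float:
    """Constant per-year reduction factor that compounds to `fold`."""
    return fold ** (1.0 / years)


# Endpoints from the digest: $400 -> $0.40 per million tokens.
fold = fold_reduction(400.0, 0.40)
# Assumption: treat the period as exactly 3 years.
per_year = annualized_factor(fold, 3.0)

print(f"{fold:.0f}x total, about {per_year:.1f}x cheaper per year")
```

Under that three-year assumption, a 1000× drop corresponds to roughly a 10× reduction each year.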

The Corpus

Category	Videos	Total tokens
Pricing & Economics (long-form deep dives)	5	1.56 M
Memory & HBM	5	0.91 M
Expert Interviews (BG2 / Dwarkesh / 20VC / All-In)	21	4.40 M
Acquired Deep Dives (NVIDIA I/II/III · TSMC · Morris Chang · Jensen)	6	2.56 M
China Perspective (Bilibili)	2	0.08 M
Total	39	9.5 M

Top Findings

1. Memory Crunch — the headline numbers

Metric	Value	Source
Big Tech 2026 capex going to memory	30%	Dylan Patel · `mDG_Hx3BSUE @ 1:23:11`
DRAM price (commodity)	$3-4/GB → $12/GB	`mDG_Hx3BSUE @ 1:24:23`
iPhone 12GB memory BOM	$50 → $150	`mDG_Hx3BSUE @ 1:24:12-26`
Implied iPhone consumer price impact	+$250	`mDG_Hx3BSUE @ 1:24:51`
Smartphone 2026 unit forecast	1.1 B → 500-600 M (possible)	`mDG_Hx3BSUE @ 1:25:31`
SK Hynix HBM market share	~70%	`BV1SbeAzfE37`
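
The iPhone BOM row is consistent with the commodity DRAM row: multiplying 12 GB by the per-GB price gives figures close to the quoted $50 and $150. A minimal sketch, assuming the BOM is simply capacity times commodity price (real contract pricing will differ):

```python
def memory_bom(capacity_gb: float, price_per_gb: float) -> float:
    """Memory bill-of-materials cost, assuming capacity x commodity price."""
    return capacity_gb * price_per_gb


CAPACITY_GB = 12  # iPhone memory capacity from the table

before_low = memory_bom(CAPACITY_GB, 3.0)    # low end of the $3-4/GB range
before_high = memory_bom(CAPACITY_GB, 4.0)   # high end of the $3-4/GB range
after = memory_bom(CAPACITY_GB, 12.0)        # at $12/GB

print(f"before: ${before_low:.0f}-${before_high:.0f}, after: ${after:.0f}")
```

The +$250 consumer price figure is not derived here; it depends on margin assumptions that the table does not state.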

2. HBM technical reality

Spec	HBM4 stack	DDR5 channel	Ratio
Interface width	2048 bits	64-128 bits	~20×
Bandwidth	~2.5 TB/s	~128 GB/s	~20×

Source: `mDG_Hx3BSUE @ 1:21:07-50` (both columns)
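
The ratio column can be recomputed from the table values. This sketch assumes decimal units (1 TB/s = 1000 GB/s). It shows that the interface-width ratio spans 16× to 32× depending on which DDR5 width is used, so "~20×" is a midpoint rather than an exact figure.

```python
HBM4_WIDTH_BITS = 2048
DDR5_WIDTH_BITS = (64, 128)   # range given in the table

HBM4_BANDWIDTH_GBS = 2500.0   # ~2.5 TB/s, decimal units assumed
DDR5_BANDWIDTH_GBS = 128.0

# Width ratio for each end of the DDR5 range.
width_ratios = tuple(HBM4_WIDTH_BITS / w for w in DDR5_WIDTH_BITS)
bandwidth_ratio = HBM4_BANDWIDTH_GBS / DDR5_BANDWIDTH_GBS

print(f"width: {min(width_ratios):.0f}x-{max(width_ratios):.0f}x")
print(f"bandwidth: {bandwidth_ratio:.1f}x")
```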

3. The 1000× inference cost timeline

Model	Price ($/M tokens)	Notes
GPT-4 (early 2023)	$400	as query
GPT-4-class (March 2026)	$0.40	1000× drop
Claude Opus 4.1 → 4.6	$15 → $5	67% cut, one gen
Gemini 2.5 Pro	$1.25	Google premium
DeepSeek V3	$0.14 input / $0.28 output	1/20 of GPT-4 launch
Claude Haiku	$0.25	Approaching DB query cost

Source: `KvoD4nu6H08 @ 00:00-01`
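
The Notes column mixes "N× drop" and "% cut", which describe the same change in different terms. A small sketch for converting between them, using prices from the table (the helper names are illustrative):

```python
def percent_cut(old_price: float, new_price: float) -> float:
    """Price reduction expressed as a percentage of the old price."""
    return (old_price - new_price) / old_price * 100.0


def fold_from_cut(cut_percent: float) -> float:
    """Convert a percentage cut into an 'N times cheaper' factor."""
    return 1.0 / (1.0 - cut_percent / 100.0)


opus_cut = percent_cut(15.0, 5.0)     # Claude Opus 4.1 -> 4.6 row
gpt4_cut = percent_cut(400.0, 0.40)   # GPT-4 -> GPT-4-class rows

print(f"Opus: {opus_cut:.0f}% cut = {fold_from_cut(opus_cut):.1f}x cheaper")
print(f"GPT-4 class: {gpt4_cut:.1f}% cut = {fold_from_cut(gpt4_cut):.0f}x cheaper")
```

A 67% cut is a 3× reduction, while a 1000× drop is a 99.9% cut. The percentage form compresses large improvements into a narrow range near 100%.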

4. Bottleneck Hierarchy (Dylan Patel’s framework)

Rank	Bottleneck	Why	Dylan’s Verdict
1	ASML EUV tools	Tool output grows from ~70/yr to ~100/yr by 2030; ~700 tools cap chip capacity at ~200 GW	“Ultimate”
2	HBM / memory bandwidth	Physical fab space short-term, wafer area long-term	“Supply constraint”
3	CoWoS advanced packaging	TSMC throughput	Joint with HBM
4	Data center construction labor	Solved by modularization	Solvable
5	Power	Grid has idle capacity, plus “behind-the-meter” gas turbines	Not the bottleneck
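
The EUV row implies a rough conversion between tool count and chip capacity. The sketch below derives it from the two figures in the table and assumes the relationship is linear, which the source does not state. Treat the result as an order-of-magnitude aid rather than a forecast.

```python
TOOL_COUNT = 700      # ~700 tools, from the table
CAPACITY_GW = 200.0   # ~200 GW ceiling, from the table

gw_per_tool = CAPACITY_GW / TOOL_COUNT
tools_per_gw = TOOL_COUNT / CAPACITY_GW


def capacity_ceiling_gw(tool_count: float) -> float:
    """Implied capacity ceiling, assuming capacity scales linearly with tools."""
    return tool_count * gw_per_tool


print(f"{gw_per_tool:.2f} GW per tool, {tools_per_gw:.1f} tools per GW")
print(f"100 tools -> about {capacity_ceiling_gw(100):.0f} GW")
```

Under that linear assumption, each year of ~100 tools corresponds to somewhat under 30 GW of chip capacity.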

5. China differentiation (from Bilibili)

Chinese-language sources surface alternatives that English-language discourse rarely covers:

Huawei UCM (Unified Cache Manager): time to first token (TTFT) down 90% and throughput up 22× with tiered storage, compared with HBM-only
Saimemory (SoftBank + Intel + UTokyo): new stacked DRAM, 40-50% power reduction, prototype 2027
NEO Semiconductor X-HBM: 16× bandwidth, 10× density target, 512 Gbit per chip
3D X-DRAM: vertical stacking breaks 2D scaling limits
Samsung Z-NAND revival: 15× performance, 80% power cut

Sourcing & Verification

Items flagged for verification (do NOT publish without checking):
- “NVIDIA paid Grok $20 billion” → likely Groq (LPU company), not Grok (xAI). Transcription artifact.
- “Gemini Flashlight” → almost certainly Flash or Flash Lite, mis-transcribed.

Browse

🇬🇧 English: videos/en/
🇨🇳 Chinese: videos/zh/
📋 Full findings analysis

Each video page has: segments with YouTube-linked timestamps, specific prices table, memory facts, bottleneck claims, predictions, key technologies, companies mentioned, notable quotes, and takeaways.
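
A citation such as `mDG_Hx3BSUE @ 1:23:11` can be turned into a timestamped YouTube link. This is a sketch of one way to do it, not the repository's actual code. The regex and function names are illustrative, ranges such as `1:24:12-26` resolve to their start time, and Bilibili IDs (for example `BV1SbeAzfE37`) need a different URL scheme that is not handled here.

```python
import re

# video_id, "@", then a timestamp of the form S, M:S, or H:M:S.
CITATION = re.compile(r"^(?P<vid>[\w-]+)\s*@\s*(?P<start>\d+(?::\d+){0,2})")


def to_seconds(timestamp: str) -> int:
    """Convert 'H:M:S', 'M:S', or 'S' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_link(citation: str) -> str:
    """Build a watch URL that starts playback at the cited timestamp."""
    match = CITATION.match(citation.strip())
    if match is None:
        raise ValueError(f"not a 'video_id @ timestamp' citation: {citation!r}")
    start = to_seconds(match["start"])
    return f"https://www.youtube.com/watch?v={match['vid']}&t={start}s"


print(youtube_link("mDG_Hx3BSUE @ 1:23:11"))
```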

Pipeline

Extraction: Vertex AI Gemini 2.5 Pro / 3.1 Pro Preview, native YouTube file_uri ingestion
Bilibili: yt-dlp 480p → inline blob upload
Long videos (>2 h): chunked via start_offset / end_offset
Structured JSON schema enforced via response_mime_type=application/json
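
The chunking step can be sketched as follows. The 2-hour threshold comes from the notes above; the 1-hour chunk length is an assumption, since the notes do not state it. The request dictionaries are illustrative rather than the exact Vertex AI request shape. They reuse the field names mentioned above (`file_uri`, `start_offset`, `end_offset`, `response_mime_type`) and format offsets as seconds strings such as `"3600s"`.

```python
LONG_VIDEO_S = 2 * 3600   # videos longer than 2 h are chunked (from the notes)
CHUNK_S = 3600            # assumption: 1-hour chunks


def plan_chunks(duration_s: int, chunk_s: int = CHUNK_S) -> list[tuple[int, int]]:
    """Return (start, end) windows in seconds that cover the whole video."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    if duration_s <= LONG_VIDEO_S:
        return [(0, duration_s)]  # short enough to process in one pass
    windows = []
    start = 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def build_requests(file_uri: str, duration_s: int) -> list[dict[str, str]]:
    """Illustrative per-chunk request parameters (hypothetical layout)."""
    return [
        {
            "file_uri": file_uri,
            "start_offset": f"{start}s",
            "end_offset": f"{end}s",
            "response_mime_type": "application/json",
        }
        for start, end in plan_chunks(duration_s)
    ]


for request in build_requests("https://www.youtube.com/watch?v=mDG_Hx3BSUE", 9000):
    print(request["start_offset"], request["end_offset"])
```

Windows are contiguous and non-overlapping, so a claim that straddles a chunk boundary may be split across two extractions. Overlapping the windows slightly is one possible mitigation.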

License

CC BY 4.0 for indexed notes content. Original videos remain the property of their creators.