AI Infrastructure Pricing & Memory: Findings from 10 Videos

Synthesis from 10 long-form videos (2.46M total tokens of extraction). Every fact below carries a [video_id @ timestamp] citation. No fabricated numbers.


Headline Thesis

Inference cost fell 1000× in 3 years. Memory just became the bottleneck that decides whether that drop continues.

  • AI inference query cost: $400 → $0.40 per million tokens for GPT-4-class output (early 2023 → March 2026) — a 1000× collapse [KvoD4nu6H08 @00:00:17–24]
  • Over the same period, 30% of Big Tech’s 2026 capex is going to memory, DRAM prices have tripled, and HBM capacity is sold out through 2026 [mDG_Hx3BSUE @1:23:11]

Two forces, opposite directions. The 1000× chart sets the demand curve. The memory crunch sets the supply ceiling.


1. The Memory Crunch, Quantified

1.1 Headline numbers

| Metric | Value | Source |
|---|---|---|
| Big Tech 2026 capex going to memory | 30% | Dylan Patel [mDG_Hx3BSUE @1:23:11] |
| DRAM price (commodity) | $3-4/GB → $12/GB (3× increase) | [mDG_Hx3BSUE @1:24:23] |
| iPhone 12GB memory BOM | $50 → $150 (+$100) | [mDG_Hx3BSUE @1:24:12–26] |
| Implied iPhone consumer price impact | +$250 | [mDG_Hx3BSUE @1:24:51] |
| Smartphone units 2026 forecast | 1.1 B → 500–600 M (potential) | [mDG_Hx3BSUE @1:25:31] |
| SK Hynix HBM market share | ~70% | [BV1SbeAzfE37] (Bilibili) |
| HBM share of B200 die cost | “Single largest input, exceeding logic die + CoWoS interposer combined” | TrendForce reporting (corroborates [mDG_Hx3BSUE]) |
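The DRAM and iPhone rows above can be cross-checked with simple arithmetic. The sketch below uses only the quoted figures; the variable names are illustrative, and the "retail multiplier" is just the ratio of the two quoted deltas, not a number stated in the video.

```python
# Figures quoted in the table above; names are illustrative.
GB_PER_PHONE = 12
OLD_PRICE_PER_GB = (3.0, 4.0)   # USD/GB, commodity DRAM before the run-up
NEW_PRICE_PER_GB = 12.0         # USD/GB after the run-up

# Memory bill of materials implied by the per-GB prices.
old_bom = tuple(GB_PER_PHONE * p for p in OLD_PRICE_PER_GB)
new_bom = GB_PER_PHONE * NEW_PRICE_PER_GB

# Quoted deltas: +$100 BOM, +$250 consumer price.
BOM_INCREASE = 100.0
RETAIL_INCREASE = 250.0
retail_multiplier = RETAIL_INCREASE / BOM_INCREASE

print(f"old BOM ${old_bom[0]:.0f}-{old_bom[1]:.0f} (quoted ~$50)")
print(f"new BOM ${new_bom:.0f} (quoted ~$150)")
print(f"implied BOM-to-retail multiplier: {retail_multiplier:.1f}x")
```

The per-GB prices land within a few dollars of the quoted $50 and $150 BOM figures, so the table rows are internally consistent.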

1.2 What Dylan Patel actually said about wafer economics

“The metric that you actually care about is bandwidth per wafer, not bits per wafer.”
— Dylan Patel [mDG_Hx3BSUE @1:19:27]

This is the technical key. HBM uses 3-4× more wafer area per bit than commodity DRAM, but delivers >10× the bandwidth per wafer. As long as AI workloads are bandwidth-bound (which they are), HBM wins — and DRAM fabs get reallocated to HBM, starving the consumer market.

1.3 HBM vs DDR5 — the specs gap

| Spec | HBM4 stack | DDR5 channel | Ratio |
|---|---|---|---|
| Interface width | 2048 bits | 64-128 bits | ~20× |
| Bandwidth | ~2.5 TB/s | ~128 GB/s | ~20× |
| Source | [mDG_Hx3BSUE @1:21:07-50] | same | |
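As a sanity check on the "~20×" column, the ratios can be recomputed from the quoted specs. This is a minimal sketch; it treats 2.5 TB/s as 2500 GB/s (decimal units), which is an assumption about how the figures were stated.

```python
# Specs quoted in the table above.
HBM4_WIDTH_BITS = 2048
DDR5_WIDTH_BITS = (64, 128)
HBM4_BW_GB_S = 2500.0   # ~2.5 TB/s expressed in GB/s (decimal, assumed)
DDR5_BW_GB_S = 128.0

# Width ratio is a range because DDR5 is quoted as 64-128 bits.
width_ratio = tuple(HBM4_WIDTH_BITS / w for w in DDR5_WIDTH_BITS)
bw_ratio = HBM4_BW_GB_S / DDR5_BW_GB_S

print(f"interface width ratio: {width_ratio[1]:.0f}x to {width_ratio[0]:.0f}x")
print(f"bandwidth ratio: {bw_ratio:.1f}x")
```

The width ratio spans 16× to 32× and the bandwidth ratio comes out just under 20×, so "~20×" is a fair rounding for both rows.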

For context: the H100 carries 80 GB of HBM3 across 6 stacks; AMD’s MI300 has 128/192 GB across 8 stacks at 5.6 TB/s [yAw63F1W_Us @00:11–40].

1.4 The “memory wall” reframed

“The bottleneck is no longer just compute. It is how we manage memory.”
— Anat Heifetz, VAST Data [ttmVDoRyP1k @00:00:12]

“Inference context itself is becoming the key bottleneck, and not the primary compute.”
— Dr. Vikram Sharma [ttmVDoRyP1k @00:06:23]

A KV cache for a 125k-token context occupies 64 GB of HBM and takes 28 seconds of compute, versus <1 GB and <1 second at 1k tokens. The cache is the new working set, and HBM is too expensive to hold all of it [ttmVDoRyP1k @00:05:44].
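The 64 GB figure can be unpacked with the standard KV-cache sizing formula (one key and one value vector per layer, per KV head, per token). The sketch below backs out the per-token footprint implied by the quoted numbers and then evaluates the formula for a model shape that is purely hypothetical; the talk does not state which model the 64 GB refers to.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache size: a K and a V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens

# Figures quoted above (decimal GB assumed).
QUOTED_BYTES = 64e9
QUOTED_TOKENS = 125_000
implied_bytes_per_token = QUOTED_BYTES / QUOTED_TOKENS

# Linear-scaling check against the "<1 GB at 1k tokens" figure.
gb_at_1k = implied_bytes_per_token * 1_000 / 1e9

# HYPOTHETICAL model shape (not from the video): 64 layers, 8 KV heads,
# head dimension 128, fp16 cache entries.
example = kv_cache_bytes(tokens=125_000, n_layers=64, n_kv_heads=8, head_dim=128)

print(f"implied per-token cache: {implied_bytes_per_token / 1e3:.0f} kB")
print(f"cache at 1k tokens: {gb_at_1k:.3f} GB")
print(f"hypothetical model at 125k tokens: {example / 1e9:.1f} GB")
```

The implied footprint is about half a megabyte per token, which also reproduces the sub-1 GB figure at 1k tokens, so the two quoted data points are consistent with linear growth in context length.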

NVIDIA and VAST claim that the CMX (Context Memory Storage) tier delivers 20× faster TTFT, 90% better GPU utilization, 70% lower storage power, and 60-130% more tokens per dollar [ttmVDoRyP1k @00:07:35, @00:14:35].

NVIDIA sizes 48 PB of tiered “agentic memory” storage for 10k concurrent users [ttmVDoRyP1k @00:27:50].
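Dividing the sizing figure through gives a per-user budget. The comparison against the 64 GB cache is my own framing of two numbers from the same talk, assuming decimal units throughout.

```python
# Figures quoted above (decimal units assumed).
TOTAL_STORAGE_BYTES = 48 * 10**15    # 48 PB of tiered storage
CONCURRENT_USERS = 10_000
KV_CACHE_125K_BYTES = 64 * 10**9     # 64 GB cache for a 125k-token context

per_user_bytes = TOTAL_STORAGE_BYTES // CONCURRENT_USERS
caches_per_user = per_user_bytes // KV_CACHE_125K_BYTES

print(f"per-user budget: {per_user_bytes / 10**12:.1f} TB")
print(f"equivalent 125k-token caches per user: {caches_per_user}")
```

At 4.8 TB per user, the tier holds the equivalent of 75 full 125k-token caches per concurrent user, far more than any HBM allocation could.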


2. The 1000× Inference Drop

“I believe this single chart, inference cost over time, is the most important chart in technology right now.”
— Marco [KvoD4nu6H08 @00:00:37]

2.1 Token price collapse (early 2023 → March 2026)

| Model | Input price ($/M tokens) | Notes |
|---|---|---|
| GPT-4 (early 2023) | $400 | as a query, not per-token equivalent |
| GPT-4-class (March 2026) | $0.40 | 1000× drop |
| Claude Opus 4.1 → 4.6 | $15 → $5 | 67% cut, same generation |
| Gemini 2.5 Pro | $1.25 | Google’s premium |
| DeepSeek V3 | $0.14 input / $0.28 output | 1/20 of GPT-4 launch price |
| Claude Haiku | $0.25 | Approaching database query costs |

Source: [KvoD4nu6H08 @00:00–01]
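The headline factor and the Opus cut can be recomputed from the quoted prices, and the 1000× drop can be annualized. The sketch assumes the early 2023 to March 2026 window is roughly 3 years, matching the video title; the annual rate is derived, not quoted.

```python
# Prices quoted in the table above (USD).
START_PRICE = 400.0
END_PRICE = 0.40
YEARS = 3.0   # ASSUMPTION: early 2023 -> March 2026 treated as ~3 years

total_factor = START_PRICE / END_PRICE
annual_factor = total_factor ** (1 / YEARS)   # geometric mean decline per year

# Claude Opus 4.1 -> 4.6: $15 -> $5.
opus_cut = (15.0 - 5.0) / 15.0

print(f"total decline: {total_factor:.0f}x")
print(f"implied annual decline: {annual_factor:.1f}x per year")
print(f"Opus price cut: {opus_cut:.0%}")
```

A 1000× drop over three years is equivalent to roughly a 10× price cut every year, compounding.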

2.2 The contrarian read

“The contrarian position is that the inference cost collapse is actually terrible for most AI companies, and potentially for the AI industry as a whole.”
— Marco [KvoD4nu6H08 @00:09:00]

“If the inference cost collapse leads to commoditization of the inference layer, the value in the AI stack shifts. It moves away from the model layer and away from the chip layer and toward the application layer.”
— Marco [KvoD4nu6H08 @00:12:00]

Implication: Application-layer companies (Notion AI, Cursor, ChatGPT) benefit; model providers and chip vendors face margin compression unless they capture switching costs.

2.3 But total spend is exploding anyway

| Item | Number | Source |
|---|---|---|
| Big 4 (Amazon, Meta, Google, MSFT) 2026 capex | $600 B combined | [mDG_Hx3BSUE @00:00:28] |
| Total semi supply chain 2026 spend | ~$1 trillion | [mDG_Hx3BSUE @00:01:50] |
| OpenAI–Oracle compute purchase | $300 B | [NbL7yZCF-6Q @00:00:06] |
| NVIDIA’s pledged investment in OpenAI | up to $100 B | [NbL7yZCF-6Q @00:02:18] |
| OpenAI–Broadcom custom chip deal | $10 B | [NbL7yZCF-6Q @00:02:02] |
| OpenAI–AMD chip purchase | “tens of billions” | [NbL7yZCF-6Q @00:02:31] |
| OpenAI–CoreWeave contract | up to $22.4 B | [arU9Lvu5Kc0 @00:12:18] |
| McKinsey: AI capex by 2030 | $5.2 T | [NbL7yZCF-6Q @00:07:08] |
| Bain: annual revenue needed to justify capex | $2 T/year | [NbL7yZCF-6Q @00:07:17] |
| Anthropic ARR (initial → current) | $9 B → $35–40 B | [LF3aUIM57uw @00:10:20] |
| 1 GW of latest NVIDIA GPUs | ~$35 B | [arU9Lvu5Kc0 @00:09:52] |
| H100 rental (2-3 yr deal) | $2.40/hr | [mDG_Hx3BSUE @00:07:45] |
| H100 5-year TCO | $1.40/hr | [mDG_Hx3BSUE @00:08:48] |

The price-per-token fell 1000×; total spend went up. Jevons paradox in production.
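Two pieces of arithmetic sit behind this section. The sketch below derives the margin implied by the H100 rental and TCO rows, and states the Jevons condition explicitly: since spend is price times volume, spend rises only if volume grows faster than price falls. The margin ignores utilization and financing, and the volume-growth values passed in are hypothetical inputs, not figures from the videos.

```python
# Quoted H100 figures (USD per GPU-hour).
RENTAL_PER_HR = 2.40
TCO_PER_HR = 1.40

# Margin over TCO, as a share of the rental price (ignores idle time).
margin_over_tco = (RENTAL_PER_HR - TCO_PER_HR) / RENTAL_PER_HR

def spend_multiple(price_drop_factor, volume_growth_factor):
    """Spend = price x volume, so spend scales by volume growth / price drop."""
    return volume_growth_factor / price_drop_factor

print(f"implied margin over TCO: {margin_over_tco:.1%}")
# HYPOTHETICAL volume growth values, for illustration only.
print(f"1000x cheaper, 1000x volume -> spend x{spend_multiple(1000, 1000):.1f}")
print(f"1000x cheaper, 3000x volume -> spend x{spend_multiple(1000, 3000):.1f}")
```

Under a 1000× price drop, total spend can only have gone up if token volume grew by more than 1000× over the same window.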


3. Bottleneck Hierarchy (Dylan Patel’s framework)

Dylan Patel’s 2.5-hour interview defines a stack of bottlenecks, ordered from “ultimate” to “solvable”:

| Rank | Bottleneck | Why | Dylan’s verdict |
|---|---|---|---|
| 1 | ASML EUV tools | Sole supplier; output rising from ~70/yr to ~100/yr by 2030 gives a fleet of ~700 tools, which caps total AI chip capacity at ~200 GW | “Ultimate bottleneck” [mDG_Hx3BSUE @00:34:51] |
| 2 | HBM / Memory bandwidth | Short-term: physical fab space (memory vendors didn’t build in 2023). Long-term: HBM wafer area limit | “Memory is a supply constraint” [oE5lNDhz9oo @08:02] |
| 3 | CoWoS advanced packaging | TSMC’s CoWoS-S/L throughput | Aligned with HBM as joint constraint [oE5lNDhz9oo @08:44] |
| 4 | Data center construction labor | Electrician shortage; modular construction will solve it | “Major constraint, but solvable” [mDG_Hx3BSUE @1:51:37] |
| 5 | Power | Grid has idle capacity; “behind-the-meter” gas turbines; CapEx ~$1500/kW | Not the bottleneck [mDG_Hx3BSUE @1:42:36] |
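The rank-1 row links tool count to power capacity, which implies a rough capacity-per-tool figure. The sketch below is a back-of-envelope reading of the quoted numbers; it assumes capacity scales linearly with tool count, which the interview does not state.

```python
# Figures quoted in the rank-1 row above.
FLEET_TOOLS_2030 = 700         # ~700 EUV tools
CAPACITY_CEILING_GW = 200.0    # ~200 GW AI chip capacity ceiling
TOOLS_PER_YEAR_2030 = 100      # ~100 tools shipped per year by 2030

# ASSUMPTION: every tool contributes the fleet-average capacity.
gw_per_tool = CAPACITY_CEILING_GW / FLEET_TOOLS_2030
gw_added_per_year = gw_per_tool * TOOLS_PER_YEAR_2030

print(f"implied capacity per tool: {gw_per_tool * 1000:.0f} MW")
print(f"implied capacity added per year at 2030 rate: {gw_added_per_year:.1f} GW")
```

On these assumptions each EUV tool supports a few hundred megawatts of AI chip capacity, which is why a fleet measured in hundreds of tools sets the ceiling.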

“Chips are the biggest bottleneck, and so you want them deployed working on AI the moment they’re done being manufactured.”
— Dylan Patel [mDG_Hx3BSUE @2:03:07]

Important counter-claim from the dataset: Memory advocates (yAw63F1W_Us, RqIZzaTE-wY, ttmVDoRyP1k) argue memory has already overtaken compute as the #1 constraint:

  • “Nvidia’s next bottleneck might not be GPUs anymore, it may be memory” [RqIZzaTE-wY @00:05]
  • “The bottleneck is no longer just compute. It is how we manage memory.” [ttmVDoRyP1k]

Both can be true: EUV bounds the long-run ceiling (10-yr); memory bounds 2026-2028 quarterly throughput.


4. Supplier Dynamics

4.1 HBM is a 3-player oligopoly tilting to 1

| Player | HBM share | Strategic position |
|---|---|---|
| SK Hynix | ~70%, first to HBM3 | Sole HBM4 supplier to NVIDIA “for quite a long time” (Jensen at CES 2026) |
| Samsung | <30%, behind on HBM4 yield | Reviving Z-NAND (15× perf, 80% power cut vs NAND) as HBM alternative [BV1SbeAzfE37] |
| Micron | Smaller share, gaining | “Sold out through calendar year 2026” |

4.2 HBM alternatives race (mostly unreported in English videos, captured by Bilibili)

| Player | Move | Source |
|---|---|---|
| Huawei UCM (Unified Cache Manager) | Tiered storage management, TTFT -90%, throughput +22× vs HBM-only | [BV1SbeAzfE37] |
| Saimemory (SoftBank+Intel+UTokyo) | New stacked DRAM, 40-50% power reduction, prototype 2027, mass production <2030 | [BV1SbeAzfE37] |
| NEO Semiconductor X-HBM | 16× bandwidth, 10× density, 512 Gbit per chip target | [BV1SbeAzfE37] |
| 3D X-DRAM | Vertical stacking breaks 2D scaling limits | [BV1SbeAzfE37] |

This is the differentiated angle: English-language AI infra discourse treats SK Hynix’s HBM dominance as a fait accompli. Chinese-language sources show an active alternatives race.

4.3 Customer concentration

NVIDIA aligns its HBM purchasing with its CoWoS, Grace Blackwell, and CPU planning [oE5lNDhz9oo @08:44]. As AI takes over leading-edge wafer demand, Apple becomes a smaller TSMC customer [mDG_Hx3BSUE @2:21:41].


5. The Bubble / Circular Financing Question

| Concern | Number | Source |
|---|---|---|
| Projected AI data center spending, next 5 years | ~$7 T | [NbL7yZCF-6Q @11:42] |
| OpenAI 2025 cash burn | $8.5 B (on $13 B revenue) | [NbL7yZCF-6Q @00:07:26] |
| OpenAI losses, Q3 2025 alone | $15 B | [arU9Lvu5Kc0 @00:12:57] |
| Anthropic ARR | >$5 B run-rate (Aug 2025) | [NbL7yZCF-6Q @00:07:32] |
| CoreWeave 9M 2025 revenue | $3.6 B | [arU9Lvu5Kc0 @00:07:16] |
| CoreWeave debt | $14 B | [arU9Lvu5Kc0 @00:09:02] |
| CoreWeave revenue backlog | $55.6 B (driven by OpenAI + MSFT) | [arU9Lvu5Kc0 @00:09:23] |
| NVIDIA backstop deal with CoreWeave | $6.3 B | [arU9Lvu5Kc0 @00:17:15] |

The structure: NVIDIA invests in OpenAI → OpenAI buys NVIDIA chips → OpenAI rents from CoreWeave → CoreWeave buys NVIDIA chips with NVIDIA backstop. Whether this is “circular financing inflating a bubble” or vertical capital efficiency is the central debate in [NbL7yZCF-6Q] and [arU9Lvu5Kc0].
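The structure described above can be written as a directed graph of money flows and searched for loops. This is a minimal sketch: the edge list encodes only the four relationships named in the paragraph plus the backstop, the edge labels are my own shorthand, and `find_cycles` is a simple depth-first search rather than any method used in the videos.

```python
from collections import defaultdict

# Each edge is (payer, payee, label); labels are shorthand, not quotes.
FLOWS = [
    ("NVIDIA", "OpenAI", "equity investment"),
    ("OpenAI", "NVIDIA", "chip purchases"),
    ("OpenAI", "CoreWeave", "compute rental"),
    ("CoreWeave", "NVIDIA", "chip purchases"),
    ("NVIDIA", "CoreWeave", "backstop"),
]

def find_cycles(flows):
    """Return every simple cycle once, rotated to start at its smallest name."""
    graph = defaultdict(set)
    for payer, payee, _label in flows:
        graph[payer].add(payee)
    cycles = set()

    def walk(node, path):
        for nxt in graph[node]:
            if nxt == path[0]:
                # Closed a loop: normalize the rotation so duplicates collapse.
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i]))
            elif nxt not in path:
                walk(nxt, path + [nxt])

    for start in list(graph):
        walk(start, [start])
    return sorted(cycles)

for cycle in find_cycles(FLOWS):
    print(" -> ".join(cycle + (cycle[0],)))
```

The search finds three loops, all passing through NVIDIA: one with OpenAI, one with CoreWeave, and one through both. That is the structural fact behind the debate; whether the loops are a problem depends on end-customer revenue entering from outside the graph.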


6. Predictions Captured (2026-2035)

| When | What | Source |
|---|---|---|
| Near-term (1-2 yr) | Memory crunch continues, prices keep rising, consumer electronics squeezed | [mDG_Hx3BSUE @1:08:10] |
| Unspecified | DRAM prices may double or triple again from current levels | [LF3aUIM57uw] |
| 2023-2028 | HBM market: $2 B → $6.3 B | [yAw63F1W_Us] |
| 2024-2030 | Global HBM market: >$25 B → >$100 B | [RqIZzaTE-wY] |
| 2027-2030 | AI servers will use 2-4× more memory per system | [RqIZzaTE-wY] |
| 2028 | TSMC capex single year: $100 B | [LF3aUIM57uw @00:16:18] |
| 2028 | Data centers = 10% of US power grid | [mDG_Hx3BSUE @1:47:16] |
| 2030 | AI data centers consume 200 GW critical IT power | [mDG_Hx3BSUE @1:43:45] |
| 2030 | ~half new DC capacity = “behind-the-meter” (own generation) | [mDG_Hx3BSUE @1:49:51] |
| 2030 | EUV tool fleet: ~700 tools → ~200 GW AI chip capacity ceiling | [mDG_Hx3BSUE @11:37] |
| 2030 | McKinsey: AI industry needs $5.2 T cumulative capex | [NbL7yZCF-6Q @00:07:08] |
| Through 2035 | Asia Pacific inference market CAGR 24.7% | [KvoD4nu6H08] |
| 2035+ | Space data centers might make sense once chips are no longer the bottleneck | [mDG_Hx3BSUE @2:03:44] |
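The two HBM market forecasts above disagree on absolute size, so it helps to compare them as growth rates. The sketch below computes the compound annual growth rate for each; the second forecast is quoted as lower bounds (">$25 B", ">$100 B"), so its rate is only indicative.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

# Forecasts quoted in the table above (USD billions).
hbm_2023_2028 = cagr(2.0, 6.3, 2028 - 2023)
hbm_2024_2030 = cagr(25.0, 100.0, 2030 - 2024)   # from lower-bound figures

print(f"2023-2028 forecast: {hbm_2023_2028:.1%} per year")
print(f"2024-2030 forecast: {hbm_2024_2030:.1%} per year")
```

Both come out near 26% per year, close to the 24.7% CAGR quoted for the Asia Pacific inference market, so the sources differ mainly on the starting base rather than on the growth rate.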

7. Sourcing & Verification Notes

7.1 The 10 videos

| ID | Title | Duration | Tokens | Angle |
|---|---|---|---|---|
| mDG_Hx3BSUE | Dylan Patel: Single biggest bottleneck to scaling AI compute | 2 h 30 min | 724K (3 chunks) | Pricing supply |
| LF3aUIM57uw | Dylan Patel: Supply & Demand of AI Tokens | 46 min | 221K | Economics |
| NbL7yZCF-6Q | Is AI Circular Financing Inflating a Bubble? | 25 min | 243K | Bubble |
| arU9Lvu5Kc0 | The Coming AI Datacenter Collapse | 21 min | 203K | Counter |
| KvoD4nu6H08 | AI Inference Cost Fell 1000× in 3 Years | 17 min | 163K | Price trend |
| yAw63F1W_Us | The Special Memory Powering the AI Revolution | 13 min | 127K | Memory primer |
| oE5lNDhz9oo | Nvidia Huang, Michael Dell on Memory Demand & China | 20 min | 194K | Executive memory |
| RqIZzaTE-wY | AI Has A Memory Bottleneck — Companies That Benefit | 18 min | 177K | Memory investor |
| ttmVDoRyP1k | Breaking Through GPU Memory Wall (NVIDIA + VAST Data) | 46 min | 217K | Memory deep |
| uldHpwtmFLY | SK Hynix CEO Keynote: Memory’s Journey towards Future ICT | 39 min | 190K | Memory supplier |

7.2 Items flagged for verification (DO NOT publish without checking)

  • “NVIDIA paid Grok $20 billion” [KvoD4nu6H08 @00:02:03] → almost certainly Groq (LPU company), not Grok (xAI). The “license” wording is also suspicious. Omit from the public writeup until corroborated.
  • “Gemini Flashlight” [KvoD4nu6H08 @00:01:42] → almost certainly Gemini Flash or Flash Lite, transcription error.
  • “OpenAI’s full-year revenue and cash-burn target = $13B revenue, $8.5B cash-burn” [NbL7yZCF-6Q @00:07:26] → confirm against The Information’s reporting before quoting.

7.3 Items that surprised me (worth highlighting)

  • Dylan: “H100 is worth more today than it was three years ago” [mDG_Hx3BSUE @09:50] — counter to standard depreciation thinking
  • Dylan: 30% of Big Tech capex going to memory (2026) — bigger than I expected
  • Bilibili: Huawei UCM claims 90% TTFT reduction — same target as NVIDIA+VAST CMX but built independently
  • VAST claims 60-130% more tokens per dollar from CMX — if accurate, this changes inference unit economics

7.4 Companies mentioned (top 20, dedup case-insensitive)

NVIDIA (11), OpenAI (9), Anthropic (7), TSMC (7), Samsung (7), AMD (6), Google (5), Intel (5), Microsoft (4), CoreWeave (4), Oracle (4), ASML (4), SK Hynix (4), Micron (4), Meta (3), Amazon (3), Broadcom (2), Apple (2), Cymer (2), VAST Data, Cerebras, Huawei


8. Open Questions (next batch should target)

  1. HBM4 yield specifics — SK Hynix’s 16-layer 48GB HBM4 at CES 2026 — what’s the yield, what’s the unit price?
  2. Memory pricing forward curve — How much of the 3× DRAM increase is sticky vs cyclical?
  3. Hyperscaler internal price-per-token — What does it actually cost Google/Anthropic to serve a token on their own infra vs charging $1.25/M for it?
  4. China indigenous DRAM/HBM roadmap — Is YMTC or CXMT close to HBM3-equivalent, or still 2-3 nodes behind?
  5. Inference vs training capex split — Inference market 2026 = $50B+; training is multi-hundred-billion. Is the ratio shifting?