Dylan Patel: Single biggest bottleneck to scaling AI compute

Category: Pricing & Economics · Duration: 150 min

Speakers: Dwarkesh Patel (host) · Dylan Patel, CEO of SemiAnalysis (guest)

Segments (42)

00:00:00 · Introduction and Banter
- The host and guest exchange jokes before starting the main discussion on semiconductors.
00:00:18 · The Scale of AI Compute Spending
- The host sets Big Tech's massive capex and AI labs' fundraising against the cost of compute to ask how much capacity that money actually buys.
00:01:41 · Understanding Hyperscaler Capex
- The guest explains that the enormous capex figures include long-term setup costs for data centers and power, not just immediate compute hardware.
00:02:55 · Anthropic’s Compute Needs and Scaling Challenges
- The guest details how Anthropic’s rapid revenue growth requires a massive, multi-gigawatt expansion of its compute capacity this year.
00:04:03 · Acquiring Compute in a Pinch
- The host asks how a company like Anthropic can acquire needed compute capacity if they didn’t plan for it, leading to a discussion of their conservative strategy.
00:04:48 · The Consequences of Conservative vs. Aggressive Compute Strategies
- The guest contrasts Anthropic’s conservative approach with OpenAI’s aggressive, early deal-making, which has given OpenAI more access to compute.
00:06:18 · The Economics of GPU Depreciation
- The host questions the traditional GPU depreciation cycle, suggesting the immense demand for AI could make older GPUs valuable for longer.
00:08:52 · Two Lenses on GPU Value: TCO vs. Utility
- The guest explains that GPU value can be seen through a cost-focused TCO model or a utility-focused model, where the value is tied to the increasingly powerful models it can run.
00:12:51 · The Ultimate Bottleneck: The Semiconductor Supply Chain
- The host asks what the biggest bottleneck will be over the next five years, and the guest identifies the core semiconductor supply chain, especially EUV tools from ASML, as the ultimate constraint.
00:14:51 · Why Can’t ASML Just Build More?
- The guest explains the extreme complexity and long lead times of the various components of an EUV machine, which prevents rapid expansion of production.
00:18:00 · Dario’s Dilemma: Inconsistent Statements on Compute
- The host points out the inconsistency between Dario Amodei’s optimistic AGI timelines and Anthropic’s conservative compute acquisition strategy.
00:18:52 · The Value of Committing to Compute Early
- The guest explains that as models become more powerful, the value of a GPU increases, rewarding those who commit to long-term compute contracts early with better margins.
00:20:34 · The Alchian-Allen Effect in AI
- The host applies the Alchian-Allen effect (a fixed cost added to both cheap and premium options makes the premium one relatively cheaper) to argue that as the fixed cost of compute rises, users will be more willing to pay a premium for the highest-quality models.
00:23:13 · The Race for Incremental Compute
- The guest discusses how the battle for incremental compute capacity is where the costs are highest, and how different players are positioned in this race.
00:25:52 · TSMC’s Allocation Strategy
- The guest explains why TSMC might favor allocating capacity to more stable markets like CPUs over the volatile but high-growth AI chip market.
00:31:23 · The Google-Anthropic TPU Deal
- The guest provides context on how Anthropic secured a large amount of Google’s TPU capacity due to an information asymmetry before Google fully realized its own internal demand.
00:34:34 · The Shifting Bottlenecks of AI Compute
- The host and guest discuss how the primary bottleneck for scaling AI compute is shifting from power and data centers back to the core semiconductor supply chain.
00:37:03 · Projecting Future Compute Capacity
- The guest breaks down the math of how the limited production of EUV tools by ASML creates an absolute ceiling on the amount of AI compute that can be deployed by 2030.
00:43:20 · The Decade-Long Lifespan of EUV Tools
- The host expresses surprise that the same core manufacturing machines will be in use for a decade, and the guest explains the continuous upgrades and complexities involved.
00:50:00 · The Marvel of Lithography Machines
- The speaker details the incredible complexity and precision of ASML’s lithography tools, which involve components accelerating at 9 g while holding nanometer-level accuracy.
00:51:17 · The Semiconductor Supply Chain
- The discussion covers the vast and intricate supply chain for semiconductor manufacturing, involving thousands of specialized suppliers like Zeiss and Cymer.
00:52:23 · The Bullwhip Effect in AI Compute
- The speaker explains that demand for AI compute is going unmet because of a bullwhip effect: each layer of the supply chain builds less than the layer it serves, so the shortfall relative to end demand compounds further upstream.
00:54:40 · Sponsor: Labelbox and EchoChain
- The host presents a sponsor segment for Labelbox, highlighting their EchoChain pipeline for diagnosing and fixing voice model failures caused by interruptions.
00:55:43 · Can Older Tech Solve the Bottleneck?
- The host asks whether older 7nm process nodes using DUV could be repurposed to ease the EUV bottleneck, much as grid constraints have been bypassed.
00:57:55 · Why Older Tech Isn’t a Simple Fix
- The guest argues that simply using older nodes isn’t a solution because modern chips are co-designed for specific numerics and system-level performance, making direct comparisons misleading.
1:00:41 · The Future of Chip Packaging
- The conversation shifts to advanced packaging, like multi-die chips (B200, Rubin) and wafer-scale integration (Tesla’s Dojo), as ways to increase on-package performance.
1:02:50 · China’s Semiconductor Ambitions
- The speakers discuss the timeline for China to develop an indigenous semiconductor supply chain and whether their scale could eventually overcome the West’s technological lead.
1:08:07 · The Memory Crunch and Consumer Impact
- The guest predicts a severe memory crunch will increase the cost of consumer electronics like smartphones and PCs, potentially leading to public backlash against AI.
1:16:01 · HBM vs. DDR: The Bandwidth Bottleneck
- A technical explanation of why commodity DDR memory cannot replace HBM for high-performance AI, due to the order-of-magnitude difference in memory bandwidth per chip edge area.
1:28:45 · The Fab Bottleneck
- The primary constraint on increasing memory production is the lack of physical fab space, as new fabs take years to build and retooling existing ones is complex.
1:33:26 · Elon Musk’s Terafab and Disruptive Tech
- The speakers speculate on whether Elon Musk could rapidly build a massive ‘Terafab’ and discuss the low probability of a simple, disruptive technology replacing the current complex lithography process.
1:40:00 · The Real Bottleneck: Toolmakers, Not Foundries
- The guest argues that the ultimate constraint on AI compute growth is not foundries like TSMC, but the toolmakers like ASML and material suppliers that enable them.
1:40:30 · Energy Turbine Analogy and Arbitrage
- An analogy is drawn between buying up turbine capacity for energy and the potential for arbitraging the EUV tool supply chain, though it’s deemed unlikely ASML would allow it.
1:42:34 · Power Generation: A Solvable Problem
- The guest claims that power generation, while a challenge, is not the primary bottleneck for AI, citing diverse energy sources and idle grid capacity.
1:50:50 · Labor Constraints and Modularization
- Labor for data center construction is identified as a significant constraint, which will be addressed through increased modularization and factory-based integration.
1:54:42 · The Case Against Space Data Centers
- The idea of space-based data centers is critiqued, not for power reasons, but due to the extreme logistical challenges, deployment delays, and unreliability in a chip-constrained world.
2:00:20 · Scale-Up Domains: Nvidia vs. Google vs. Amazon
- A comparison of the network topologies and scale-up domain sizes for Nvidia, Google (TPU), and Amazon (Trainium), highlighting the different architectural trade-offs.
2:08:58 · Why AI Model Parameter Scaling Slowed
- The guest theorizes that model parameter scaling slowed because of a feedback loop: in a compute-constrained environment, smaller models that are faster to iterate on provide more value.
2:14:06 · The Role of Leopold Aschenbrenner and SemiAnalysis
- The host and guest discuss how some investors, like Leopold, use deep supply chain analysis to make successful bets, and that SemiAnalysis’s data is often seen as aggressive but proves correct.
2:18:30 · TSMC, Apple, and the Future of Node Capacity
- The guest predicts that AI companies will increasingly dominate TSMC’s leading-edge capacity, making Apple a less relevant customer over time.
2:22:00 · Huawei’s Potential
- If not for sanctions, Huawei’s vertical integration and talent pool could have made it a leader in AI accelerators, potentially surpassing Nvidia.
2:24:32 · The Future of Robotics: Centralized vs. On-Device Intelligence
- The discussion posits that for efficiency and capability, future robots will offload most of their ‘thinking’ to powerful, centralized cloud models rather than relying solely on on-device compute.
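
The Alchian-Allen effect from the 00:20:34 segment can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming the shared cost is a fixed amount added per unit to both a premium and a budget model; how compute maps onto that shared cost is the host's framing, and the prices below are hypothetical, not figures from the episode. As the shared cost grows, the premium option's relative price falls toward 1.

```python
def relative_price(premium: float, budget: float, common_cost: float) -> float:
    """Price of the premium option relative to the budget option after the
    same fixed per-unit cost is added to both (the Alchian-Allen setup)."""
    return (premium + common_cost) / (budget + common_cost)


# Hypothetical per-task prices, not figures from the episode:
# a frontier model at 2.0 and a cheaper model at 1.0.
PREMIUM, BUDGET = 2.0, 1.0

for shared in (0.0, 1.0, 4.0):
    ratio = relative_price(PREMIUM, BUDGET, shared)
    print(f"shared per-unit cost {shared:.1f}: premium costs {ratio:.2f}x the budget option")
```

The absolute gap between the two options stays at 1.0 throughout; only the ratio shrinks, which is why demand tilts toward the premium option.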

Specific Prices (18)

Timestamp	Item	Value	Context
00:00:28	Big 4 (Amazon, Meta, Google, Microsoft) combined capex forecast for the year	$600 billion	The host uses this number to frame the immense scale of current AI infrastructure spending.
00:00:55	OpenAI fundraising target	$110 billion	Mentioned as part of the massive capital being raised by AI labs.
00:00:57	Anthropic fundraising target	$30 billion	Mentioned as part of the massive capital being raised by AI labs.
00:01:15	Yearly rental price for one gigawatt of compute	$10-13 billion	Used to estimate the compute capacity implied by fundraising and capex figures.
00:01:50	Total semiconductor supply chain spending	On the order of $1 trillion	The guest expands the scope of spending beyond just the hyperscalers.
00:02:19	Google’s capex	$180 billion	Used as an example of how capex is spent on future infrastructure like turbine deposits for 2028-29.
00:03:39	Anthropic’s implied compute spend for inference	$40 billion	Calculated based on their projected revenue growth and gross margins.
00:07:45	H100 rental price for a 2-3 year deal	$2.40 per hour	An example of high prices labs are willing to pay to secure compute.
00:08:48	Cost to deploy an H100 over 5 years	$1.40 per hour	The baseline cost of ownership for a cloud provider.
00:12:22	Selling price of an ASML EUV tool	$300-400 million	Highlights the cost of the most critical piece of manufacturing equipment.
1:23:11	Big Tech Capex on Memory (2026)	30%	The guest states that 30% of the capex of big tech companies in 2026 will be going towards memory.
1:24:12	iPhone Memory Cost (Past)	$50	The cost of 12GB of memory in an iPhone used to be roughly $50 ($3-4 per gigabyte).
1:24:23	iPhone Memory Cost (Present)	$150	With DRAM prices tripling to $12 per gigabyte, the cost of 12GB of memory is now around $150.
1:24:26	iPhone BOM Increase	$100	The increase in memory cost results in a $100 increase in the iPhone’s bill of materials (BOM).
1:24:51	iPhone Consumer Price Increase	$250	A $150 BOM increase for memory could translate to a $250 increase in the final price for the consumer.
1:48:15	Combined Cycle Gas Turbine CapEx	$1500 per kilowatt	The capital expenditure for building a combined cycle gas turbine power plant.
1:48:27	Alternative Power Generation CapEx	Up to $3500 per kilowatt	The guest states that even if alternative power sources cost more than twice as much as combined cycle, the impact on total GPU cost is minimal.
1:48:41	Nvidia Hopper GPU Cost	$1.40 per hour	The approximate total cost of ownership (TCO) for a Hopper GPU, used to illustrate how even doubling power prices only adds about $0.10 to this cost.
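
Several rows of the table can be cross-checked with simple arithmetic. This is a minimal sketch using only the table's figures; treating the combined OpenAI and Anthropic fundraising targets as if they were spent entirely on compute rental at today's $10-13 billion per gigawatt-year is a simplification for illustration, and "GW-years" is a unit introduced here, not one used in the episode.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def gw_years(budget_bn: float, price_bn_per_gw_year: float) -> float:
    """Gigawatt-years of rented compute a budget buys at a given annual price per GW."""
    return budget_bn / price_bn_per_gw_year


def rental_spread(rent_per_hr: float, tco_per_hr: float) -> tuple[float, float, float]:
    """Hourly spread, annual spread per GPU, and gross margin as a share of rent."""
    spread = rent_per_hr - tco_per_hr
    return spread, spread * HOURS_PER_YEAR, spread / rent_per_hr


def memory_bom(gigabytes: float, usd_per_gb: float) -> float:
    """DRAM cost in a device's bill of materials."""
    return gigabytes * usd_per_gb


raised_bn = 110 + 30  # OpenAI + Anthropic fundraising targets, $bn
print(f"{gw_years(raised_bn, 13):.1f} to {gw_years(raised_bn, 10):.1f} GW-years of rental")

spread, per_year, margin = rental_spread(2.40, 1.40)
print(f"H100: ${spread:.2f}/hr spread, ${per_year:,.0f} per GPU-year, {margin:.0%} margin")

print(f"12 GB iPhone DRAM: ${memory_bom(12, 3):.0f}-{memory_bom(12, 4):.0f} before, "
      f"${memory_bom(12, 12):.0f} at $12/GB")
```

The computed $144 matches the table's rounded ~$150, and $36-48 is in line with the rounded ~$50.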

Memory Facts (10)

[00:16:21] An H100 GPU has 80 gigabytes of memory.
- 80 GB
[00:16:23] The human brain is estimated to have petabytes of memory.
- Petabytes
[1:16:10] HBM has 3-4x fewer bits per wafer area than the DRAM it’s made from.
- 3-4x
[1:21:07] An HBM4 stack has a 2048-bit wide interface.
- 2048 bits
[1:21:50] A DDR5 channel has a 64 or 128-bit wide interface.
- 64-128 bits
[1:21:41] An HBM4 stack provides roughly 2.5 Terabytes per second of bandwidth.
- 2.5 TB/s
[1:22:16] A DDR5 channel provides roughly 128 Gigabytes per second of bandwidth.
- 128 GB/s
[2:09:28] Nvidia’s scale-up domains have historically had limited memory capacity, which constrained the size of models that could be efficiently trained.
[2:10:00] The Nvidia NVL72 scale-up system has a total memory capacity of 20 terabytes.
- 20 TB
[2:10:10] Google’s TPU pods have memory capacities in the hundreds of terabytes.
- Hundreds of TB
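
The per-stack and per-channel figures above imply roughly a 20x bandwidth gap. This is a minimal sketch, assuming decimal units and comparing a single HBM4 stack with a single DDR5 channel (real accelerators carry several stacks, and servers several channels). The 70 GB per-token read is a hypothetical workload, used only to show why bandwidth rather than capacity bounds memory-bound decoding.

```python
HBM4_STACK_BYTES_PER_S = 2.5e12   # ~2.5 TB/s per HBM4 stack (episode figure)
DDR5_CHANNEL_BYTES_PER_S = 128e9  # ~128 GB/s per DDR5 channel (episode figure)
HBM4_BUS_BITS = 2048
DDR5_BUS_BITS = (64, 128)


def min_decode_step_s(bytes_per_token: float, bandwidth: float) -> float:
    """Lower bound on a memory-bound decode step: every byte needed for the
    token (weights, and in practice KV cache) is read from memory once."""
    return bytes_per_token / bandwidth


print(f"bandwidth: {HBM4_STACK_BYTES_PER_S / DDR5_CHANNEL_BYTES_PER_S:.1f}x per stack vs channel")
print("bus width ratios:", [HBM4_BUS_BITS // w for w in DDR5_BUS_BITS])

# Hypothetical workload: 70 GB of weights streamed per token, KV cache ignored.
WEIGHT_BYTES = 70e9
for name, bw in (("HBM4 stack", HBM4_STACK_BYTES_PER_S),
                 ("DDR5 channel", DDR5_CHANNEL_BYTES_PER_S)):
    print(f"{name}: >= {min_decode_step_s(WEIGHT_BYTES, bw) * 1e3:.0f} ms per token")
```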

Bottleneck Claims (10)

[00:34:35] The bottleneck for scaling AI compute is shifting from power and data centers to the semiconductor supply chain.
- Evidence: The guest argues that while power and data centers were previous bottlenecks, the longest lead-time items are now the fabs and the tools to equip them, making them the new constraint.
[00:34:51] The ultimate bottleneck for AI compute is the production of EUV lithography machines by ASML.
- Evidence: ASML is the sole producer of these critical machines. Their production capacity is limited (e.g., ~70 this year, rising to ~100 by 2030), which places a hard cap on the total amount of advanced logic wafers that can be produced globally.
[00:46:15] ASML cannot easily or quickly expand its production capacity for EUV tools.
- Evidence: The guest explains that ASML’s own supply chain for components like the light source (Cymer), optics (Zeiss), and stages is extremely complex and has very long lead times, preventing a rapid ‘yolo’ expansion.
[1:28:45] The primary bottleneck for increasing memory production is the lack of physical fab space.
- Evidence: Memory vendors were not profitable in 2023 and thus did not build new fabs, which take 2+ years to come online. There is nowhere to put new tools.
[1:16:30] Memory bandwidth, not capacity, is the key constraint for AI performance, making commodity DRAM a poor substitute for HBM.
- Evidence: HBM offers over an order of magnitude more bandwidth (TB/s vs GB/s) for the same chip edge area, which is critical for feeding the compute units (FLOPS) with weights and KV cache data.
[1:33:46] Developing the process technology itself is a much harder bottleneck to solve than building the physical fab.
- Evidence: Process technology relies on immense, cumulative, built-up knowledge and integrating a highly complex supply chain, which is why only a few companies like TSMC, Intel, and Samsung can do it at the leading edge.
[1:40:12] The ultimate bottleneck for AI compute is not foundries, but the toolmakers (like ASML) and material suppliers.
- Evidence: Even with more foundries from Intel and Samsung, the entire industry relies on a very small number of companies for critical manufacturing equipment, which cannot be scaled up quickly.
[1:42:36] Power generation is not the ultimate bottleneck for AI compute.
- Evidence: The guest argues there are numerous ways to generate power, and the existing grid has significant idle capacity. The supply chains for power generation are simpler and more diverse than for semiconductors.
[1:51:37] Labor for data center construction is a huge constraint.
- Evidence: The sheer number of electricians and construction workers needed is massive, but this will be mitigated by moving to more modular, factory-built data center components.
[2:03:07] Chips are the biggest bottleneck for AI progress.
- Evidence: The entire discussion centers on the idea that the ability to manufacture leading-edge chips is the most constrained resource, and all other decisions (like where to place data centers) are secondary to maximizing the utility of these scarce chips.
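
The claim that power is a small share of GPU cost can be checked against the price rows ($1500 vs. up to $3500 per kilowatt, $1.40 per Hopper-hour). This is a minimal sketch under loudly hypothetical inputs: about 1.4 kW all-in per GPU, $0.07/kWh, and a 15-year plant life. These are illustrative assumptions, not the guest's inputs, though they happen to land near the ~$0.10/hr figure quoted for doubling power prices.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical assumptions, not inputs stated in the episode:
KW_PER_GPU = 1.4       # all-in draw per H100, including cooling and networking share
USD_PER_KWH = 0.07     # baseline electricity price
PLANT_LIFE_YEARS = 15  # amortization period for generation capex
GPU_TCO_PER_HR = 1.40  # Hopper TCO from the price table


def energy_per_gpu_hr(usd_per_kwh: float) -> float:
    """Electricity cost of running one GPU for one hour."""
    return KW_PER_GPU * usd_per_kwh


def generation_capex_per_gpu_hr(capex_usd_per_kw: float) -> float:
    """Generation capex attributable to one GPU, spread evenly over the plant's life."""
    return KW_PER_GPU * capex_usd_per_kw / (PLANT_LIFE_YEARS * HOURS_PER_YEAR)


extra_energy = energy_per_gpu_hr(2 * USD_PER_KWH) - energy_per_gpu_hr(USD_PER_KWH)
extra_capex = generation_capex_per_gpu_hr(3500) - generation_capex_per_gpu_hr(1500)
print(f"doubling power price: +${extra_energy:.3f}/hr ({extra_energy / GPU_TCO_PER_HR:.0%} of TCO)")
print(f"$3500/kW instead of $1500/kW: +${extra_capex:.3f}/hr")
```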

Predictions (13)

[00:03:58, End of this year] Anthropic will need well above 5 gigawatts of compute by the end of the year.
[00:11:37, 2030] The global ecosystem will have around 700 EUV tools.
[00:11:40, 2030] The total available EUV capacity will support the production of about 200 gigawatts worth of AI chips.
[00:12:27, End of decade] ASML will produce 80 EUV tools next year, growing to a little over 100 per year by the end of the decade.
[1:08:10, Near-term (next 1-2 years)] The memory crunch will continue, and prices will keep rising, negatively impacting the consumer electronics market.
[1:25:31, Next year] Smartphone sales volumes could drop from 1.1 billion to as low as 500-600 million units per year.
[1:02:50, 2030] China will likely have a fully indigenous DUV supply chain by 2030, along with working EUV tools, though not at mass-production scale.
[1:11:50, 2030-2035] If AI timelines are fast, the US/West will win due to its current lead. If timelines are long (e.g., 2035), China could win due to its ability to scale a vertical supply chain.
[1:43:45, End of decade (c. 2030)] By the end of the decade, AI data centers will consume 200 gigawatts of critical IT power.
[1:47:16, 2028] By 2028, data centers will consume 10% of the US power grid.
[1:49:51, End of decade (c. 2030)] By the end of the decade, about half of new data center capacity will be built ‘behind the meter’ (with its own dedicated power generation).
[2:03:44, 2035+] Space data centers might make sense, but not until 2035 or later, once chips are no longer the primary bottleneck.
[2:21:41, Next few years] Apple will become a progressively smaller and less relevant customer for TSMC’s leading-edge nodes as AI demand grows.
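
The two 2030 forecasts can be combined into a rough per-tool yardstick. This is a minimal sketch, assuming the 200 GW figure scales linearly with the ~700-tool installed base and that today's $10-13 billion per GW-year rental price carries over to 2030; both are simplifications for illustration, not claims from the episode.

```python
EUV_TOOLS_2030 = 700            # forecast installed base (episode)
AI_GW_2030 = 200                # GW of AI chips that base supports (episode)
RENT_BN_PER_GW_YEAR = (10, 13)  # today's rental price range (price table)

# Assumption: compute supported scales linearly with installed EUV tools.
mw_per_tool = AI_GW_2030 / EUV_TOOLS_2030 * 1000
print(f"~{mw_per_tool:.0f} MW of AI compute per installed EUV tool")

# Assumption: today's rental price still applies to the 2030 fleet ($bn -> $tn).
low, high = (AI_GW_2030 * r / 1000 for r in RENT_BN_PER_GW_YEAR)
print(f"implied rental value: ${low:.1f}-{high:.1f} trillion per year")
```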

Key Technologies (15)

Semiconductors: The fundamental electronic components (chips) that perform computation.
GPUs (H100, Blackwell, Rubin): Specialized processors from Nvidia that are the primary hardware for training and running large AI models.
EUV (Extreme Ultraviolet) Lithography: A highly complex manufacturing process that uses 13.5nm light to pattern the smallest features on silicon wafers, enabling cutting-edge chips. ASML is the sole supplier of the machines, which makes EUV a key bottleneck for leading-edge chip manufacturing.
TPUs (Tensor Processing Units): Google’s custom-designed AI accelerator chips, an alternative to GPUs.
DRAM (Dynamic Random-Access Memory): A type of memory essential for AI chips to store model weights and intermediate calculations during processing.
CoWoS (Chip-on-Wafer-on-Substrate): An advanced packaging technology from TSMC used to integrate multiple chips, like GPUs and HBM memory, into a single powerful package.
HBM (High Bandwidth Memory): A type of stacked DRAM that provides extremely high memory bandwidth, essential for modern AI accelerators. It is expensive and takes up more wafer area per bit than standard DRAM.
DDR (Double Data Rate) SDRAM: Commodity memory used in PCs and servers. It offers much higher capacity per wafer but significantly lower bandwidth compared to HBM.
Advanced Packaging (CoWoS, Wafer-Scale): Techniques for integrating multiple chiplets (dies) into a single package to increase performance by enabling very high-speed communication between them, bypassing slower off-package interconnects.
Gas Turbines (Combined Cycle, Industrial, Aeroderivative): Engines used to generate large amounts of electricity for power grids and, increasingly, dedicated data centers.
Reciprocating Engines: An alternative to turbines for power generation, often used in ships and trucks, that can be repurposed for data centers.
Fuel Cells: An electrochemical cell that converts chemical energy into electricity, mentioned as a power source for data centers (e.g., Bloom Energy).
Scale-Up Domain: A cluster of chips (GPUs/TPUs) connected by a high-bandwidth, low-latency network (e.g., NVLink, ICI) that allows them to function as a single, powerful computer for training large AI models.
Torus Network Topology: A network architecture used by Google’s TPUs where each chip connects directly to a small number of neighbors (e.g., 6), and communication to distant chips must hop through intermediaries.
All-to-All Network Topology: A network architecture, like Nvidia’s NVL72, where every chip in the scale-up domain can reach every other chip through a single switch stage (NVSwitch) at full bandwidth.
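
The torus and all-to-all entries above differ mainly in how many hops traffic takes between distant chips. This is a minimal sketch, assuming a hypothetical 4x4x4 3D torus (64 chips, six neighbours each, matching the "e.g., 6" above) and ignoring link bandwidth, routing, and contention; real TPU pod shapes and NVL72 switch details differ.

```python
from itertools import product


def torus_hops(a: tuple[int, ...], b: tuple[int, ...], dims: tuple[int, ...]) -> int:
    """Minimum hops between two chips in a torus; each dimension wraps around."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y, k in zip(a, b, dims))


def hop_stats(dims: tuple[int, ...]) -> tuple[int, float]:
    """Worst-case and average hops from one chip to every other chip.
    A torus looks the same from every node, so one origin is enough."""
    origin = tuple(0 for _ in dims)
    hops = [torus_hops(origin, node, dims)
            for node in product(*(range(k) for k in dims)) if node != origin]
    return max(hops), sum(hops) / len(hops)


DIMS = (4, 4, 4)  # hypothetical 64-chip 3D torus: 6 direct neighbours per chip
worst, mean = hop_stats(DIMS)
print(f"torus {DIMS}: worst {worst} hops, mean {mean:.2f} hops")
# In a switched all-to-all domain, every pair communicates through one switch stage.
print("all-to-all: every pair one switch stage apart")
```

The torus trades those extra hops for cheaper wiring that scales to larger domains, which is the architectural trade-off the comparison segment describes.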

Companies Mentioned (43)

RØDE · SemiAnalysis · Amazon · Meta · Google · Microsoft · OpenAI · Anthropic · CoreWeave · Oracle · SoftBank Energy · Nscale · TSMC · Nvidia · ASML · SK Hynix · Samsung · Broadcom · Apple · AMD · Cymer · Carl Zeiss · Labelbox · Moonshot AI (Kimi) · DeepSeek · Tesla · Intel · Xiaomi · Oppo · Micron · Applied Materials · Lam Research · Siemens · Mitsubishi · GE Vernova · Boom Supersonic · Crusoe · Cummins · Nebius · Bloom Energy · SpaceX · Starlink · Huawei

Notable Quotes (10)

No sloppy seconds for Dwarkesh. — Host @ 00:00:11

I’m not going to go crazy on compute, because if my revenue inflects at a different rate, at a different point, I don’t want to go bankrupt. — Guest (paraphrasing Dario Amodei) @ 00:04:25

Let’s just sign these crazy fucking deals, right? — Guest (characterizing OpenAI’s strategy) @ 00:04:40

An H100 is worth more today than it was three years ago. — Host @ 00:09:50

Name me a petabyte of ones and zeros, bro. — Guest @ 00:16:26

There is a bit of a meme that they are… they don’t… they have problems with commitment issues and they’re like sort of polyamorous. — Guest @ 00:19:27

If anything is messed up, the yield goes to zero, right? Because this is such a finely tuned system. — Dylan Patel @ 00:50:55

You go down the supply chain, everyone’s doing minus one, and in some cases they’re doing like divided by two, right? Because they just don’t, they’re not AGI pilled. — Dylan Patel @ 00:53:31

The metric that you actually care about is bandwidth per wafer, not bits per wafer. — Dylan Patel @ 1:19:27

Today you already see all the memes like on PC subreddits and PC like Twitter, gaming PC Twitter is like, cat dancing videos and it’s like, this is why memory prices has doubled and you can’t get a new gaming GPU. — Dylan Patel @ 1:27:17

Key Topics

AI Compute Scaling · Semiconductor Supply Chain · Hyperscaler Capex · AI Lab Fundraising and Economics · GPU Value and Depreciation · Future Compute Bottlenecks · EUV Lithography and ASML · Long-term Compute Contracts · Semiconductor Manufacturing Complexity · AI Compute Supply Chain Bottlenecks · Memory Crunch (HBM vs. DDR) · Economic Impact of AI on Consumer Electronics · US vs. China Semiconductor Race · Advanced Chip Packaging · Fab Construction and Tooling · AI Infrastructure · Bottlenecks in AI · TSMC · Power Generation for Data Centers · Data Center Construction · Labor Shortages · Space-based Computing · AI Chip Architecture · Nvidia vs Google vs Amazon · Network Topology · AI Model Scaling · Geopolitics of Technology

Takeaways

The demand for AI compute is driving a trillion-dollar annual spending spree across the supply chain, with a significant portion of capex being pre-payments for future infrastructure.
AI labs that were aggressive in securing long-term compute deals early (like OpenAI) have a significant margin and capacity advantage over more conservative labs (like Anthropic).
The value of a GPU is increasingly determined by the economic utility of the models it can run, not just its hardware depreciation schedule. As models become more valuable, older hardware can actually increase in value.
While current bottlenecks for AI scaling have been things like power and packaging (CoWoS), the ultimate, long-term bottleneck is the production capacity of the core semiconductor manufacturing equipment, specifically EUV tools from ASML.
ASML’s production of EUV tools is extremely limited and has a multi-year lead time for expansion, creating a hard ceiling on the amount of cutting-edge AI compute that can be deployed globally by 2030 (estimated at ~200 GW total).
The semiconductor supply chain, especially for leading-edge components like EUV tools and HBM, is incredibly complex, with long lead times and multiple bottlenecks.
Demand for AI compute is growing faster than the supply chain is prepared to meet. Because each layer plans more conservatively than the layer it serves, the shortfall compounds upstream (a ‘bullwhip effect’), and production lags far behind demand.
The AI boom is causing a ‘memory crunch,’ driving up the prices of DRAM and NAND. This will make consumer electronics like smartphones and PCs more expensive or lower in quality, potentially causing a public backlash against AI.
Simply using older process nodes or commodity memory (DDR) is not a viable solution to the supply crunch, as modern AI performance is heavily dependent on system-level co-design, including memory bandwidth, for which HBM is critical.
The primary bottleneck for increasing memory supply in the next 2-3 years is the physical lack of fab space, as new fabs take years to build and were not commissioned during the last market downturn.
The race for AI dominance has a temporal component: short AI timelines favor the US/West due to their current technological lead, while longer timelines could allow China to leverage its ability to build a vertically integrated supply chain at scale.
The primary bottleneck for AI compute in the long run is not foundry capacity (TSMC) but the specialized toolmakers (ASML) and material suppliers that the entire industry depends on.
While power and data center construction present challenges, they are solvable engineering and logistics problems with more diverse and flexible supply chains compared to semiconductors. They are not the ultimate bottleneck.
The idea of putting data centers in space to solve power issues is flawed because it ignores the primary constraint: the scarcity of chips. The long deployment time and inability to service hardware in space make it economically unviable.
There is a fundamental trade-off in AI development between training larger models and iterating faster on smaller models. In a compute-constrained world, the feedback loop from faster research and RL on smaller models often provides more value.
Different AI hardware companies (Nvidia, Google, Amazon) have made different architectural trade-offs in their scale-up domains (e.g., all-to-all vs. torus topology), which affects the types of models they can run most efficiently.
As AI becomes the dominant driver of semiconductor demand, traditional customers like Apple will become less influential, and AI companies will increasingly dictate the roadmap and capacity allocation at foundries like TSMC.
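
The compounding shortfall in the bullwhip takeaway (and in the "minus one ... divided by two" quote) can be sketched as a chain of haircuts. This is a minimal sketch, assuming each tier plans a fixed fraction of its customer's plan; the tier names and fractions are hypothetical, since the episode gives no per-tier numbers. It models only this compounding-conservatism reading, not the textbook bullwhip effect, which concerns amplified order variability upstream.

```python
def upstream_plans(end_demand: float, tiers: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Walk upstream from end demand; each tier plans capacity as a fraction
    (its haircut) of the plan of the tier it supplies."""
    plans = []
    planned = end_demand
    for name, haircut in tiers:
        planned *= haircut
        plans.append((name, planned))
    return plans


# Hypothetical tiers and haircuts; the episode gives no per-tier numbers.
TIERS = [
    ("cloud / neocloud", 0.9),
    ("chip designer", 0.8),
    ("foundry", 0.8),
    ("toolmaker", 0.5),  # a "divided by two" tier
]

# End demand normalized to 100, so each plan reads as a percentage of it.
for name, planned in upstream_plans(100.0, TIERS):
    print(f"{name:>16}: plans {planned:5.1f}% of end demand")
```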