Acquired NVIDIA Part III: The Dawn of the AI Era (2022-2023)

Category: Acquired Podcast (Deep Dives) · Duration: 174 min

Speakers: Ben Gilbert and David Rosenthal (Acquired Podcast)

Segments (30)

00:00 · Introduction and 2022 Context
- The hosts introduce the episode focusing on NVIDIA’s role in the AI revolution following the tech market crash of 2022.
04:35 · NVIDIA’s $1 Trillion TAM
- Discussion of NVIDIA’s 2021 presentation claiming a $1 trillion total addressable market by capturing 1% of a $100 trillion industry.
07:57 · The Big Bang of AI: AlexNet
- A look back at 2012 when the AlexNet team used NVIDIA GPUs and convolutional neural networks to win the ImageNet competition.
12:40 · The Google Brain and Facebook AI Duopoly
- How Google and Facebook scooped up top AI talent to optimize narrow tasks like social media feeds and ad targeting.
18:22 · The Founding of OpenAI
- Elon Musk, Sam Altman, Ilya Sutskever, and others found OpenAI in 2015 to prevent a big tech monopoly on Artificial General Intelligence (AGI).
23:20 · Crusoe Cloud Sponsor Read
- An ad read for Crusoe Cloud, highlighting their environmentally friendly, AI-focused cloud compute infrastructure.
25:53 · The Limitations of Early Language Models
- Explaining how early AI models were just probabilistic next-word predictors lacking true contextual understanding.
30:50 · The Transformer Breakthrough
- Google’s 2017 ‘Attention Is All You Need’ paper introduces Transformers, allowing parallel processing of language data.
41:10 · OpenAI’s Pivot and Microsoft Partnership
- OpenAI transitions to a capped-profit model to raise the massive capital required for compute, securing billions from Microsoft.
48:30 · The Release of ChatGPT
- The timeline of OpenAI’s rapid releases, culminating in ChatGPT becoming the fastest-growing consumer product in history.
1:40:00 · The CUDA Moat and Platform Strategy
- Discussion of how Nvidia’s long-term investment in CUDA created a massive developer moat, making them a platform company akin to Microsoft.
1:43:45 · Mellanox Acquisition and Networking
- Analysis of Nvidia’s strategic acquisition of Mellanox to solve the networking bandwidth bottlenecks inherent in training large AI models.
1:46:37 · Gross Margins and Market Dominance
- Exploration of Nvidia’s expanding gross margins, reaching over 70%, driven by their differentiated platform and current supply shortages.
1:48:30 · China Export Controls
- How Nvidia adapted to US export controls by creating the A800 and H800 chips with reduced networking speeds for the Chinese market.
1:50:45 · Omniverse and the Future of Simulation
- The potential of Nvidia’s Omniverse platform, combining 3D ray tracing and AI for advanced simulations and generative environments.
1:53:44 · Employee Efficiency and Culture
- Highlighting Nvidia’s exceptionally high market cap per employee and Jensen Huang’s unique, intense leadership style.
1:57:58 · Analyzing Nvidia’s ‘Powers’
- Applying the ‘7 Powers’ framework to Nvidia, focusing on scale economies, switching costs, and cornered resources at TSMC.
2:23:00 · The Bear Case
- Discussing potential risks, including an AI hype bubble bursting and cloud providers successfully shifting workloads to their own custom silicon.
00:00 · Introduction
- Hosts introduce the episode covering Nvidia’s recent history and the AI boom.
07:50 · The AlexNet Breakthrough
- Discussion of the 2012 ImageNet competition and the use of GPUs for deep learning.
14:30 · The Founding of OpenAI
- How OpenAI was started as a non-profit to counter big tech’s AI dominance.
26:00 · The Transformer Architecture
- Google’s ‘Attention is All You Need’ paper and how it enabled parallelization.
33:00 · The Memory Wall
- Explaining the von Neumann bottleneck and why memory bandwidth is crucial for AI.
45:00 · OpenAI and Microsoft
- OpenAI transitions to a capped-profit model and partners with Microsoft for compute.
1:01:00 · Mellanox Acquisition
- Nvidia acquires Mellanox to control the networking interconnects in data centers.
1:04:00 · Grace Hopper and Systems
- Nvidia moves beyond standalone GPUs to building entire supercomputing systems.
1:11:00 · DGX Cloud
- Nvidia’s strategy to offer AI supercomputing as a cloud service.
1:29:00 · The Revenue Explosion
- Nvidia’s unprecedented revenue guidance and growth driven by generative AI demand.
1:38:00 · Bear Cases and Competition
- Analyzing threats from hyperscalers’ custom silicon and other competitors.
2:10:00 · Conclusion
- Final thoughts on Nvidia’s position and the future of computing.

Specific Prices (17)

Timestamp	Item	Value	Context
08:55	Mechanical Turk image labeling	$2/hour	The estimated hourly wage paid to Amazon Mechanical Turk workers to hand-label the 14 million images in the ImageNet dataset.
10:25	Two GeForce GTX 580 GPUs	$1,000	The total cost of the consumer-grade hardware the AlexNet team bought to train their breakthrough neural network.
55:50	Microsoft investment in OpenAI (2019)	$1 billion	Microsoft’s initial investment into OpenAI after it transitioned to a capped-profit structure.
57:45	Microsoft investment in OpenAI (2023)	$10 billion	Microsoft’s subsequent massive investment into OpenAI to fund further compute and model training.
1:43:56	Mellanox Acquisition	$7 billion	The amount Nvidia paid in cash to acquire networking company Mellanox.
1:44:45	Training Megatron Model	$500,000	The estimated retail cost to train the 8.3 billion parameter Megatron model in 2019.
1:53:58	Microsoft Market Cap	$2.5 trillion	Mentioned as a comparison point for employee efficiency.
1:54:24	Nvidia Market Cap per Employee	$46 million	Calculated value demonstrating Nvidia’s extreme employee efficiency.
08:50	Mechanical Turk labeling	~$2/hour	Cost of labeling images for the ImageNet dataset.
10:00	GTX 580 GPU	~$500	The cost of the consumer GPUs used by the AlexNet team.
45:50	Microsoft initial investment in OpenAI	$1 billion	Microsoft’s first major investment to provide compute resources to OpenAI.
47:40	Microsoft subsequent investment in OpenAI	$10 billion	Microsoft’s massive follow-on investment in OpenAI.
1:01:00	Mellanox acquisition	$7 billion	The price Nvidia paid to acquire networking company Mellanox.
1:15:30	H100 GPU	~$40,000	Approximate market price for a single Nvidia H100 GPU.
1:25:30	DGX Cloud instance	$37,000/month	Pricing for renting an 8x A100 instance on Nvidia’s DGX Cloud.
1:29:30	Nvidia Q2 FY2024 revenue guidance	$11 billion	The shocking revenue guidance that signaled the start of the AI boom.
1:31:30	Nvidia Q2 FY2024 actual revenue	$13.5 billion	The actual revenue reported, beating the already massive guidance.
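As a rough sanity check on two rows of the table, the sketch below spreads the $37,000/month DGX Cloud price across its eight A100s and compares the Q2 revenue guidance with the reported figure. It assumes a 730-hour month (8,760 hours / 12) and ignores discounts, commitments, or minimum terms, so the per-GPU-hour figure is illustrative only.

```python
# Assumption: 730 hours per month (8,760 hours per year / 12).
HOURS_PER_MONTH = 8760 / 12

def per_gpu_hour(monthly_price, gpus, hours=HOURS_PER_MONTH):
    """Spread a flat monthly instance price evenly over its GPUs and hours."""
    return monthly_price / gpus / hours

def beat_pct(actual, guidance):
    """Percentage by which reported revenue exceeded guidance."""
    return (actual / guidance - 1) * 100

dgx_rate = per_gpu_hour(37_000, gpus=8)  # DGX Cloud row: $37,000/month for 8x A100
q2_beat = beat_pct(13.5, 11.0)           # Q2 rows: $13.5B reported vs $11B guided

print(f"DGX Cloud: ~${dgx_rate:.2f} per A100-hour")
print(f"Q2 revenue: ~{q2_beat:.1f}% above guidance")
```

With these inputs the sketch works out to roughly $6.34 per A100-hour and a beat of about 22.7% over guidance.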

Memory Facts (5)

[1:44:24] Large models require running across multiple servers and racks, placing huge importance on bandwidth between machines.
- Multiple servers, multiple racks
[1:49:52] Nvidia created the A800 and H800 chips for China by cranking down the NVLink data transfer speeds to comply with export regulations.
- NVLink data transfer speeds
[33:00] The von Neumann bottleneck limits performance because memory access speeds have not kept pace with processor speeds.
- N/A
[59:30] Nvidia H100 GPUs utilize High Bandwidth Memory (HBM) to increase data transfer rates.
- 80GB HBM per H100
[1:00:00] For efficient inference, all of a large language model’s weights need to fit in GPU memory (VRAM).
- N/A
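The VRAM point above comes down to simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes 80 GB per GPU (the H100 figure listed above) and counts weights only, ignoring the KV cache, activations, and runtime overhead, so the GPU counts it prints are lower bounds. The 70B and 175B sizes are hypothetical examples; 8.3B matches the Megatron model mentioned elsewhere in these notes.

```python
import math

# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_gb(params_billion, precision="fp16"):
    """Raw weight size in GB (1 GB = 1e9 bytes): billions of params x bytes per param."""
    return params_billion * BYTES_PER_PARAM[precision]

def gpus_needed(params_billion, vram_gb=80, precision="fp16"):
    """Lower bound on GPUs whose combined VRAM can hold the weights alone.

    Ignores KV cache, activations, and framework overhead, so real
    deployments need extra headroom on top of this.
    """
    return math.ceil(weight_gb(params_billion, precision) / vram_gb)

for size in (8.3, 70, 175):  # 70B and 175B are hypothetical example sizes
    print(f"{size}B params in fp16: {weight_gb(size):.1f} GB of weights, "
          f"at least {gpus_needed(size)} x 80 GB GPU(s)")
```

For example, a 70B-parameter model in fp16 needs about 140 GB for its weights alone, more than a single 80 GB card can hold.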

Bottleneck Claims (7)

[42:15] Recurrent Neural Networks (RNNs) were a bottleneck for scaling AI.
- Evidence: RNNs processed data sequentially, meaning the output of one step had to be known before the next could begin, making it impossible to utilize the massive parallel processing power of GPUs.
[53:10] Compute cost was the primary bottleneck for achieving AGI.
- Evidence: OpenAI realized that the Transformer architecture scaled incredibly well with compute, but they could not afford the necessary GPUs as a non-profit, forcing their pivot to a capped-profit model.
[1:44:18] Networking bandwidth between machines is a critical bottleneck for training large AI models.
- Evidence: Nvidia’s acquisition of Mellanox and the subsequent release of the Megatron model demonstrated the necessity of high-speed interconnects.
[1:49:52] Reducing NVLink data transfer speeds effectively bottlenecks the ability to train large models.
- Evidence: This was the specific mechanism Nvidia used to create export-compliant chips (A800/H800) for the Chinese market.
[33:00] Memory bandwidth limits inference speed.
- Evidence: Generating each token requires streaming the model weights from memory, so at small batch sizes the memory bus, not compute (FLOPS), is the limiting factor.
[1:01:00] Networking interconnects limit training scale.
- Evidence: Training massive models requires splitting the workload across thousands of GPUs; the speed at which they communicate (via InfiniBand/NVLink) dictates overall training time.
[1:08:00] Advanced packaging (CoWoS) capacity limits Nvidia’s chip supply.
- Evidence: TSMC’s limited capacity for Chip-on-Wafer-on-Substrate packaging creates a bottleneck in producing finished H100 chips.
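The memory-bandwidth claim above can be made concrete with a simple roofline-style estimate. For each decoding step, every weight is read from memory once, so throughput is capped near bandwidth divided by weight bytes (times the batch size, since one pass over the weights serves every sequence in the batch). Compute allows roughly peak FLOPS divided by 2 FLOPs per parameter per token, a common approximation for dense models. The 3,000 GB/s and 1,000 TFLOPS inputs are hypothetical round numbers, not specs for any particular GPU, and KV-cache traffic and multi-GPU effects are ignored.

```python
def decode_ceilings(params_billion, bytes_per_param, mem_bw_gb_s, peak_tflops, batch=1):
    """Upper bounds on decode throughput (tokens/s) for a dense model.

    Memory bound: each decoding step streams all weights from memory once,
    and that single pass serves `batch` sequences.
    Compute bound: roughly 2 FLOPs per parameter per generated token.
    KV-cache traffic, kernel overheads, and multi-GPU effects are ignored.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    memory_bound = batch * mem_bw_gb_s * 1e9 / weight_bytes
    compute_bound = peak_tflops * 1e12 / (2 * params_billion * 1e9)
    return memory_bound, compute_bound

# Hypothetical round numbers, not vendor specs: 3,000 GB/s memory bandwidth,
# 1,000 TFLOPS of 16-bit compute, and an 8.3B-parameter model in fp16.
for batch in (1, 64, 512):
    mem, comp = decode_ceilings(8.3, 2, 3000, 1000, batch)
    limit = "memory bandwidth" if mem < comp else "compute"
    print(f"batch {batch:>3}: <= {min(mem, comp):,.0f} tokens/s, limited by {limit}")
```

Under these assumptions a single sequence is capped near 181 tokens/s by memory bandwidth while compute alone would allow about 60,000; only around a batch of ~333 (peak FLOPS x bytes per parameter / (2 x bandwidth)) does compute become the binding limit.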

Predictions (7)

[07:15, Long-term (Unspecified)] The internet and digital world will continue to grow, creating a new foundational layer powered entirely by NVIDIA hardware.
[1:46:31, Near to medium term] More fully owned and operated Nvidia data centers are likely to be built.
[1:47:34, Near to medium term] Nvidia’s high gross margins (65%+) will not erode significantly in the near future.
[2:06:44, 5 years] Current data center architectures and purchasing decisions will lock in customers for at least the next five years.
[2:27:57, Medium to long term (10 years)] There will be a ‘valley or trough’ in AI excitement and spending before the technology’s full transformative impact is realized.
[1:45:00, Next few years] Hyperscalers will shift more of their internal AI workloads to their own custom silicon to save costs.
[1:55:00, Long-term] Nvidia’s CUDA software moat will face increasing pressure from higher-level frameworks like PyTorch that abstract away the hardware.

Key Technologies (17)

Large Language Models (LLMs): AI models trained on vast amounts of text data to understand and generate human-like language.
Transformer: A neural network architecture that uses self-attention mechanisms to process sequential data in parallel, drastically improving training efficiency.
Convolutional Neural Networks (CNNs): A class of deep neural networks commonly used for analyzing visual imagery.
CUDA: NVIDIA’s parallel computing platform and programming model that allows developers to use GPUs for general-purpose processing.
Recurrent Neural Networks (RNNs) / LSTMs: Older neural network architectures designed for sequential data, limited because their time steps cannot be processed in parallel.
Attention Mechanism: Allows a model to weigh the importance of different words in a sequence relative to each other, regardless of their distance apart.
Positional Encoding: A technique used in Transformers to inject information about the relative or absolute position of words in a sequence, since the model processes all words simultaneously.
Megatron: A large, transformer-based language model developed by Nvidia to demonstrate the capabilities of their hardware and networking.
Omniverse: Nvidia’s platform for building and operating custom 3D pipelines and simulating virtual worlds.
Ray Tracing: A rendering technique that simulates the physical behavior of light to produce highly realistic 3D graphics.
NVLink: A high-speed, direct GPU-to-GPU interconnect technology developed by Nvidia.
PyTorch: An open-source machine learning framework widely used for deep learning applications.
Transformers: A neural network architecture relying on self-attention, allowing for highly parallel processing of sequential data.
High Bandwidth Memory (HBM): Stacked memory chips placed very close to the GPU die to provide massive memory bandwidth.
InfiniBand: A high-throughput, low-latency computer networking communications standard used in high-performance computing.
NVLink: Nvidia’s proprietary high-speed interconnect technology that allows GPUs to communicate directly with each other.
CoWoS (Chip-on-Wafer-on-Substrate): An advanced packaging technology by TSMC used to integrate multiple chips (like GPU and HBM) onto a single substrate.
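The Transformer, attention, and positional-encoding entries, and their contrast with RNNs, can be illustrated with a toy pure-Python sketch. It implements scaled dot-product self-attention without the learned query/key/value projection matrices (a deliberate simplification), the sinusoidal positional encoding described in ‘Attention Is All You Need’, and a one-unit RNN with arbitrary, hypothetical weights. The point is structural: each attention output row can be computed independently, while each RNN step must wait for the previous one.

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal encoding: even dims use sin, odd dims use cos,
    with angle pos / 10000^(2k / d_model) for the k-th sin/cos pair."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** ((i - i % 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Weight the value vectors by softmax(q . k / sqrt(d)).
    Each output row depends only on the inputs, never on another output row,
    so all rows can be computed at once on parallel hardware."""
    d = len(keys[0])
    out = []
    for q in queries:  # iterations are independent of each other
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def rnn(inputs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN, h_t = tanh(w_h * h_{t-1} + w_x * x_t).
    Step t needs h_{t-1}, so the loop is inherently sequential."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

# Toy 4-dimensional "embeddings" for a 3-token sequence, plus position info.
tokens = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
x = [[t + p for t, p in zip(tok, positional_encoding(pos, 4))]
     for pos, tok in enumerate(tokens)]
y = attention(x, x, x)  # self-attention: queries, keys, values all come from x
print(len(y), len(y[0]))  # 3 4
print(rnn([1.0, 0.0, -1.0]))
```

Because any single row of `y` can be recomputed from `x` alone, a GPU can assign rows to separate threads; the RNN's loop offers no such split across time steps.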

Companies Mentioned (21)

NVIDIA · OpenAI · Microsoft · Google · Facebook (Meta) · DeepMind · Snap · ByteDance (TikTok) · Crusoe Cloud · Tesla · Waymo · Statsig · Cisco · Intel · IBM · Mellanox · Baidu · Apple · AMD · TSMC · Amazon (AWS)

Notable Quotes (7)

Attention is all you need. — Google Brain Team (Paper Title) @ 31:05

The AI heard around the world. — Jensen Huang (paraphrased by hosts) @ 47:35

The right analogy for Nvidia also is Microsoft. They make the operating system, they make the programming environment, they make many of the applications. — David @ 1:42:14

I relax all the time. I enjoy relaxing at work because work is relaxing for me. Solving problems is relaxing for me. Achieving something is relaxing for me. — Jensen Huang (quoted by Ben) @ 1:56:11

You build a great company by doing things that other people can’t do. You don’t build a company by fighting other people to do things that everyone can do. — Jensen Huang (quoted by Ben) @ 2:18:17

The more you buy, the more you save. — Jensen Huang (quoted by hosts) @ 1:18:00

The data center is the new unit of computing. — Ben Gilbert @ 2:18:00

Key Topics

NVIDIA's pivot to AI · The history of deep learning (AlexNet) · The founding and evolution of OpenAI · The Transformer architecture breakthrough · The compute bottleneck in AI research · Microsoft's strategic partnership with OpenAI · Nvidia's Platform Strategy · CUDA Developer Ecosystem · Data Center Networking (Mellanox) · Gross Margin Expansion · Geopolitics and Export Controls · Omniverse and 3D Simulation · Corporate Culture and Efficiency · Competitive Advantages (7 Powers) · AI Market Dynamics and Bear Cases · Nvidia's transition to a data center company · The impact of the Transformer architecture · OpenAI's history and Microsoft partnership · Memory bandwidth and interconnect bottlenecks · Hyperscaler competition and custom silicon

Takeaways

NVIDIA’s current dominance is the result of a decade-long bet on parallel computing (CUDA) intersecting perfectly with the computational needs of deep learning.
The Transformer architecture was the critical unlock that allowed AI training to be parallelized across thousands of GPUs, breaking the sequential bottleneck of older models.
OpenAI’s transition from a non-profit to a capped-profit company was a structural necessity driven by the sheer cost of the compute required to scale Transformer models.
Nvidia’s dominance is built on a decade-long investment in the CUDA software platform, making them a platform company rather than just a chip designer.
The acquisition of Mellanox was crucial for Nvidia to solve the networking bottlenecks required for training massive AI models across multiple servers.
Nvidia enjoys unprecedented gross margins (70%+) and employee efficiency, reflecting their strong market position and pricing power.
Despite US export controls, Nvidia successfully adapted by creating ‘nerfed’ chips (A800/H800) that remain highly demanded in China.
Nvidia’s competitive moat is reinforced by high switching costs, scale economies in software development, and secured capacity at TSMC.
Nvidia’s current dominance is the result of long-term bets on CUDA and data center architecture, not just a sudden AI boom.
The bottleneck in AI has shifted from pure compute (FLOPS) to memory bandwidth and networking.
While Nvidia currently holds a near-monopoly on AI training hardware, hyperscalers are heavily incentivized to develop competing custom silicon.
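The networking takeaways (Mellanox, and the export-compliant chips with reduced NVLink speeds) can be illustrated with the standard bandwidth term for a ring all-reduce, a collective commonly used to synchronize gradients in data-parallel training: each GPU sends and receives about 2(N-1)/N times the payload. The sketch below assumes 16.6 GB of fp16 gradients (8.3B parameters x 2 bytes) and two illustrative per-GPU link bandwidths, 600 and 400 GB/s; these are assumptions for comparison, and latency, protocol overhead, and overlap with compute are ignored.

```python
def ring_allreduce_seconds(payload_gb, n_gpus, link_gb_per_s):
    """Bandwidth term of a ring all-reduce.

    Reduce-scatter plus all-gather move 2 * (N - 1) / N * S bytes through
    each GPU's link, so time ~= that volume / per-GPU link bandwidth.
    Latency, protocol overhead, and compute overlap are ignored.
    """
    if n_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    volume_gb = 2 * (n_gpus - 1) / n_gpus * payload_gb
    return volume_gb / link_gb_per_s

grads_gb = 8.3 * 2  # assumed: fp16 gradients for an 8.3B-parameter model
for link in (600, 400):  # illustrative link bandwidths in GB/s
    for n in (8, 64):
        t = ring_allreduce_seconds(grads_gb, n, link)
        print(f"{link} GB/s, {n:>2} GPUs: ~{t * 1000:.1f} ms per gradient all-reduce")
```

Two things fall out of this model: the 2(N-1)/N factor approaches 2, so adding GPUs barely changes per-step communication time, while cutting link bandwidth from 600 to 400 GB/s stretches it by 1.5x at every scale. In a ring, the slowest link sets the pace, which is why inter-node networking such as InfiniBand matters once training spans multiple servers.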