Dwarkesh + Andrej Karpathy: Summoning Ghosts
Category: Expert Interviews · Duration: 146 min
Speakers: Andrej Karpathy · Dwarkesh Patel
Segments (14)
- 00:00 · Intro
  - Teaser clips and introduction to the episode.
- 00:48 · The Decade of Agents
  - Karpathy explains why we are entering a decade of AI agents rather than a single year.
- 04:04 · Seismic Shifts in AI
  - Discussion on major historical shifts in AI, including AlexNet and RL in Atari games.
- 07:51 · Evolution vs. Pre-training
  - Comparing biological evolution to the pre-training of language models.
- 14:40 · In-Context Learning
  - Exploring the mechanics of in-context learning and its relation to gradient descent.
- 18:00 · Memory and Compression
  - Analyzing how models compress information into weights versus storing it in the KV cache.
- 20:00 · Brain Analogies
  - Comparing current AI architectures to different parts of the human brain.
- 27:30 · NanoGPT and Building from Scratch
  - The value of building models from scratch to deeply understand them.
- 35:11 · Automating AI Engineering
  - The challenges and implications of AI automating the work of AI researchers and engineers.
- 40:54 · The Flaws of Reinforcement Learning
  - Karpathy argues that RL is a terrible learning algorithm compared to human learning.
- 56:00 · Model Collapse
  - The dangers of training models on synthetic data generated by other models.
- 01:05:43 · AI in Education
  - How AI tutors will revolutionize learning by providing personalized, perfect instruction.
- 01:15:11 · Self-Driving Cars
  - Comparing the progress and approaches of Tesla and Waymo in autonomous driving.
- 01:22:30 · Superintelligence
  - Discussing the trajectory towards AGI and whether it will be a gradual or sudden shift.
Specific Prices (1)
| Timestamp | Item | Value | Context |
|---|---|---|---|
| 27:30 | nanochat | $100 | Approximate compute cost to train the model end to end, as stated in the GitHub repo. |
Memory Facts (2)
- [18:00] Llama 3 70B's weights hold roughly 0.075 bits of information per pre-training token.
- [18:20] The KV cache grows by about 320 kB (2.56 million bits) for each additional token in context.
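The two figures above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is not taken from the episode; it assumes publicly reported Llama 3 70B numbers (about 70 billion parameters stored in 16 bits each, about 15 trillion pre-training tokens, 80 layers, 8 key/value heads of dimension 128). Treat every constant as an assumption.

```python
# Back-of-the-envelope check of the two memory figures.
# All constants are assumptions based on publicly reported Llama 3 70B
# numbers; none of them are stated in this summary.
PARAMS = 70e9          # assumed parameter count
BITS_PER_PARAM = 16    # assumed bf16 storage
TRAIN_TOKENS = 15e12   # assumed pre-training token count

N_LAYERS = 80          # assumed transformer layers
N_KV_HEADS = 8         # assumed key/value heads (grouped-query attention)
HEAD_DIM = 128         # assumed dimension per head
BYTES_PER_VALUE = 2    # assumed bf16 cache entries


def weight_bits_per_token(params, bits_per_param, train_tokens):
    """Bits of weight storage available per pre-training token."""
    return params * bits_per_param / train_tokens


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Bytes added to the KV cache by one extra context token.

    The factor 2 counts one key vector and one value vector
    per head per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


weight_bits = weight_bits_per_token(PARAMS, BITS_PER_PARAM, TRAIN_TOKENS)
kv_bytes = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
ratio = (kv_bytes * 8) / weight_bits

print(f"weights:  {weight_bits:.3f} bits per pre-training token")
print(f"KV cache: {kv_bytes} bytes ({kv_bytes * 8} bits) per context token")
print(f"ratio:    about {ratio / 1e6:.0f} million to one")
```

Under these assumptions the KV cache comes to 327,680 bytes (320 KiB) per token, so the 320 kB figure quoted above is the same quantity up to rounding between binary and decimal kilobytes.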
Bottleneck Claims (3)
- [01:48] Current AI agents are bottlenecked by a lack of continual learning and multi-modality.
  - Evidence: They cannot remember past interactions effectively or interact with the world using vision and action seamlessly.
- [35:11] The difficulty of automating AI engineering is a major bottleneck to an intelligence explosion.
  - Evidence: AI models currently struggle with the complex, long-horizon tasks required to write novel AI code and conduct research.
- [01:06:00] AI collaboration is bottlenecked by the lack of a shared ‘culture’.
  - Evidence: Unlike humans who share knowledge through culture and artifacts, LLMs do not have a persistent shared environment to build upon each other’s work.
Predictions (3)
- [01:35, 10 years] It will take about a decade to fully realize capable AI agents.
- [01:05:43, Near future] AI tutors will become the primary way people learn, offering perfect, personalized instruction.
- [01:11:00, Long term] The path to AGI will be a gradual automation of tasks, not a sudden ‘sharp left turn’.
Key Technologies (4)
- Deep Learning: A subset of machine learning based on artificial neural networks.
- Reinforcement Learning (RL): Training models to make sequences of decisions by rewarding desired behaviors.
- In-Context Learning: The ability of a model to learn from the prompt provided at inference time without updating its weights.
- KV Cache: Memory used by transformers to store key and value vectors for past tokens to speed up generation.
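The KV cache entry above can be illustrated with a toy single-head attention step. This is a minimal sketch, not code from the episode: the names `KVCache` and `attend` are hypothetical, and the key, value, and query vectors are supplied directly, whereas a real transformer computes them by projecting each token's hidden state.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KVCache:
    """Hypothetical cache holding one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # Each new token adds exactly one key and one value,
        # so the cache grows linearly with context length.
        self.keys.append(key)
        self.values.append(value)


def attend(query, cache):
    """Attention output for the newest token over all cached tokens."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in cache.keys])
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values))
            for i in range(dim)]


# Decode two tokens: past keys and values are reused, never recomputed.
cache = KVCache()
cache.append([1.0, 0.0], [2.0, 4.0])
first = attend([1.0, 0.0], cache)
cache.append([1.0, 0.0], [4.0, 8.0])
second = attend([1.0, 0.0], cache)
print(first, second, len(cache.keys))
```

Because only the newest token's key and value are computed at each step, generation avoids reprocessing the whole prefix, at the cost of memory that grows with every token in context.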
Companies Mentioned (5)
OpenAI · DeepMind · Tesla · Waymo · Google
Notable Quotes (3)
Reinforcement learning is terrible. It just so happens that everything that we had before it is much worse. — Andrej Karpathy @ 00:00
We’re not actually building animals. We’re building ghosts. — Andrej Karpathy @ 09:24
Humans don’t use reinforcement learning. — Andrej Karpathy @ 41:35
Key Topics
AI Agents · Reinforcement Learning · In-Context Learning · AI Education · Model Compression · Self-Driving Cars · AGI Timelines
Takeaways
- Developing fully autonomous AI agents will likely take a decade of iterative improvements rather than happening overnight.
- Current AI training methods (pre-training) are fundamentally different from biological evolution, creating ‘ghosts’ rather than ‘animals’.
- Reinforcement learning is highly inefficient compared to human learning and is often misapplied in AI development.
- Building models from scratch (like NanoGPT) is crucial for deeply understanding how they work.
- AI has the potential to revolutionize education by providing personalized, infinitely patient tutors.