Kimi Founder Yang Zhilin: K2, Agentic LLMs, Beginning of Infinity

Duration: 100 min

Guest: Yang Zhilin (杨植麟) · Founder and CEO of Moonshot AI

Chapters (10)

00:00 · Reflections on the First Year of Entrepreneurship
- Yang Zhilin reflects on the rapid progress of AI models and compares the journey to climbing an endless, unknown snow-capped mountain.
07:25 · Defining AGI and the Evolution of Work
- Discussion of AGI as a direction rather than a fixed endpoint, and of how it could eventually make everyone a ‘superman’.
09:15 · Reasoning Models and Test-Time Scaling
- Exploration of long-thinking reasoning models, the process of proposing and verifying conjectures, and the importance of test-time scaling.
15:00 · Scaffolding and Context Engineering
- The role of scaffolding and context engineering in maximizing a model’s capabilities and solving complex tasks.
23:55 · Key Decisions: From SFT to RL
- Yang outlines the shift in research focus from Supervised Fine-Tuning (SFT) in 2023-2024 to Reinforcement Learning (RL) in 2024-2025.
27:15 · Optimizers and Token Efficiency
- The introduction of the Muon optimizer to improve token efficiency compared to the traditional Adam optimizer.
40:15 · Agent Generalization and Tool Use
- Challenges in agent generalization, the transition from specific tool environments to general-purpose tool use, and the limitations of current benchmarks.
53:15 · Open Source vs. Closed Source and Commercialization
- Perspectives on the open-source ecosystem, market integration, and the commercialization potential of AI products.
70:45 · The K2 Model and Future Challenges
- Discussion of the K2 model, the necessity of continuous iteration, and the ultimate goal of building a world model.
89:35 · AI as a Meta-Science and Personal Philosophy
- Yang shares his view of AI becoming a meta-science, the inevitable progress of technology, and his personal reflections on creation and meaning.

Specific Numbers (5)

Time	Fact	Value	Context
01:45	Model progress	Two years ago	It was hard to imagine the current capabilities of models two years ago.
07:45	AGI capability	99%	AGI might do things better than 99% of humans.
23:55	Key decisions timeline	2023-2024	SFT was the focus of the research paradigm.
24:05	Key decisions timeline	2024-2025	The focus shifted to Reinforcement Learning (RL).
27:25	Optimizer usage	10 years	The Adam optimizer has been used for ten years.

Research Claims & Predictions (5)

[13:15] Test-time scaling is crucial for effective reasoning.
- evidence: It allows models to propose conjectures, verify them, and fix bugs iteratively, leading to better answers.
[27:15] The Muon optimizer significantly improves token efficiency.
- evidence: It lets a model absorb data faster, so that one data point is worth roughly two under other optimizers, though it makes training harder to keep stable.
[43:45] Agent generalization is the next major challenge.
- evidence: Current agents struggle with out-of-distribution (OOD) scenarios and require better on-policy sampling and RL to improve.
[82:35] Building a world model is equivalent to creating a world.
- evidence: A good world model will have a higher ceiling and more knowledge, acting similarly to a reinforcement learning process.
[91:05] AI will become a meta-science.
- evidence: It will take decades, but AI will eventually drive the progress of other scientific fields.
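
The propose, verify, and fix loop described under the test-time scaling claim can be illustrated with a toy sketch. Nothing here comes from the interview: `solve_with_budget`, `make_toy_task`, and the number-guessing task are hypothetical stand-ins for a model's proposals and a verifier's feedback. The only point is that a larger inference budget allows more rounds of correction.

```python
from typing import Callable, Tuple

Verifier = Callable[[int], Tuple[bool, str]]
Proposer = Callable[[int, str], int]


def solve_with_budget(
    propose: Proposer, verify: Verifier, first_guess: int, budget: int
) -> Tuple[int, bool, int]:
    """Run up to `budget` verify rounds; return (answer, verified, rounds_used)."""
    answer = first_guess
    for used in range(1, budget + 1):
        ok, feedback = verify(answer)
        if ok:
            return answer, True, used
        if used < budget:
            # Use the verifier's feedback to revise the conjecture.
            answer = propose(answer, feedback)
    return answer, False, budget


def make_toy_task(target: int, low: int = 0, high: int = 100):
    """Hypothetical task: find `target` using only 'too low' / 'too high' feedback."""
    bounds = {"low": low, "high": high}

    def verify(answer: int) -> Tuple[bool, str]:
        if answer == target:
            return True, "ok"
        feedback = "too low" if answer < target else "too high"
        return False, feedback

    def propose(answer: int, feedback: str) -> int:
        # Shrink the search interval based on the feedback, then bisect.
        if feedback == "too low":
            bounds["low"] = answer + 1
        else:
            bounds["high"] = answer - 1
        return (bounds["low"] + bounds["high"]) // 2

    return propose, verify


if __name__ == "__main__":
    for budget in (2, 8):
        propose, verify = make_toy_task(target=37)
        print(budget, solve_with_budget(propose, verify, first_guess=50, budget=budget))
    # budget 2 -> (24, False, 2); budget 8 -> (37, True, 3)
```

With a budget of 2 the loop stops on an unverified guess, while a budget of 8 lets it reach a verified answer in 3 rounds. A real system would replace the bisection proposer with model sampling and the comparison with a task-specific checker such as a test suite.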

Key Concepts (7)

[07:25] AGI (Artificial General Intelligence)
- Described not as a specific endpoint, but as a continuous direction of improvement where AI eventually surpasses most human capabilities.
[13:15] Test-time scaling
- Letting a model spend more compute at inference time so that it can iteratively refine, verify, and polish its answers.
[15:05] Scaffolding
- External structures or tools built around a model to help it perform complex tasks or utilize tools it couldn’t handle natively.
[15:45] Context Engineering
- Designing the input context and methods to guide the model’s logic and behavior effectively.
[27:15] Muon Optimizer
- An alternative to the Adam optimizer that improves token efficiency and learning speed, though it is harder to stabilize.
[82:35] World Model
- A model that understands and simulates the rules of the world, providing a higher ceiling for intelligence and knowledge.
[91:05] Meta-science
- The concept that AI will become a foundational science that accelerates and enables discoveries in all other scientific domains.

People Mentioned (3)

Yang Zhilin — Founder and CEO of Moonshot AI, the interviewee.
Xiaojun — The host conducting the interview.
Isaac Newton — Mentioned metaphorically to explain how theories and models need continuous adjustment and explanation.

Companies Mentioned (4)

Moonshot AI · Anthropic · OpenAI · ByteDance

Notable Quotes (4)

A place full of snow… Like the paradigm of reinforcement learning. — Yang Zhilin @ 02:20

But AGI is what you keep doing. — Yang Zhilin @ 08:25

AI will become a meta-science. — Yang Zhilin @ 91:05

In the end, the progress of technology is inevitable. — Yang Zhilin @ 91:45

Career Arc & Personal Stories (2)

[00:00] Yang Zhilin reflects on his first year of entrepreneurship with Moonshot AI, describing the journey as climbing an endless, unknown snow-capped mountain where new problems constantly arise but are ultimately solvable.
[92:25] Yang discusses his personal philosophy, emphasizing the importance of human experience and creative work, and how AI’s progress, while inevitable, should aim to help people live better lives.

Tools & Models Discussed (8)

Kimi: Moonshot AI’s primary conversational AI product.
Claude: Anthropic’s AI model, referenced for its reasoning capabilities and coding tools.
K1.5: An earlier Moonshot AI reasoning model, mentioned as following OpenAI’s trajectory.
K2: Moonshot AI’s base model, noted for performing very well and serving as a foundation for further scaling and multimodal capabilities.
Cursor: An AI-powered code editor mentioned in the context of coding agents.
Claude Code: An agentic coding tool by Anthropic.
Adam: A widely used optimization algorithm for training deep learning models.
Muon: A newer optimizer used by Moonshot to improve token efficiency during model training.

Topics

AGI Development · Reinforcement Learning (RL) · Test-Time Scaling · Agent Generalization · Model Optimization (Muon vs. Adam) · Open Source vs. Closed Source AI · World Models · AI Commercialization

Takeaways

The journey to AGI is like climbing an endless mountain; it’s a continuous process of solving new, complex problems.
Test-time scaling and reinforcement learning are currently the most critical areas for advancing model reasoning capabilities.
Improving token efficiency through new optimizers like Muon is essential, despite the engineering challenges they present.
Generalizing AI agents to handle out-of-distribution tasks without relying heavily on scaffolding is the next major hurdle.
AI is on a trajectory to become a ‘meta-science’ that will fundamentally accelerate all other fields of human knowledge.