Kimi Founder Yang Zhilin: K2, Agentic LLMs, Beginning of Infinity

Duration: 100 min

Guest: Yang Zhilin (杨植麟) · Founder and CEO of Moonshot AI

Chapters (10)

  • 00:00 · Reflections on the First Year of Entrepreneurship
    • Yang Zhilin reflects on the rapid progress of AI models and compares the journey to climbing an endless, unknown snow-capped mountain.
  • 07:25 · Defining AGI and the Evolution of Work
    • Discussion on AGI as a direction rather than a fixed point, and how it will eventually make everyone a ‘superman’.
  • 09:15 · Reasoning Models and Test-Time Scaling
    • Exploration of long-thinking reasoning models, the process of proposing and verifying conjectures, and the importance of test-time scaling.
  • 15:00 · Scaffolding and Context Engineering
    • The role of scaffolding and context engineering in maximizing a model’s capabilities and solving complex tasks.
  • 23:55 · Key Decisions: From SFT to RL
    • Yang outlines the shift in research focus from Supervised Fine-Tuning (SFT) in 2023-2024 to Reinforcement Learning (RL) in 2024-2025.
  • 27:15 · Optimizers and Token Efficiency
    • The introduction of the Muon optimizer to improve token efficiency compared to the traditional Adam optimizer.
  • 40:15 · Agent Generalization and Tool Use
    • Challenges in agent generalization, the transition from specific tool environments to general-purpose tool use, and the limitations of current benchmarks.
  • 53:15 · Open Source vs. Closed Source and Commercialization
    • Perspectives on the open-source ecosystem, market integration, and the commercialization potential of AI products.
  • 70:45 · The K2 Model and Future Challenges
    • Discussion on the K2 model, the necessity of continuous iteration, and the ultimate goal of building a world model.
  • 89:35 · AI as a Meta-Science and Personal Philosophy
    • Yang shares his view of AI becoming a meta-science, the inevitable progress of technology, and his personal reflections on creation and meaning.

Specific Numbers (5)

Time | Fact | Value | Context
01:45 | Model progress | Two years ago | It was hard to imagine the current capabilities of models two years ago.
07:45 | AGI capability | 99% | AGI might do things better than 99% of humans.
23:55 | Key decisions timeline | 2023-2024 | SFT was the focus of the research paradigm.
24:05 | Key decisions timeline | 2024-2025 | The focus shifted to Reinforcement Learning (RL).
27:25 | Optimizer usage | 10 years | The Adam optimizer has been used for ten years.

Research Claims & Predictions (5)

  • [13:15] Test-time scaling is crucial for effective reasoning.
    • evidence: It allows models to propose conjectures, verify them, and fix bugs iteratively, leading to better answers.
  • [27:15] The Muon optimizer significantly improves token efficiency.
    • evidence: It lets models absorb data faster, so that one data point is worth roughly two under other optimizers, though it presents stability challenges during training.
  • [43:45] Agent generalization is the next major challenge.
    • evidence: Current agents struggle with out-of-distribution (OOD) scenarios and require better on-policy sampling and RL to improve.
  • [82:35] Building a world model is equivalent to creating a world.
    • evidence: A good world model will have a higher ceiling and more knowledge, acting similarly to a reinforcement learning process.
  • [91:05] AI will become a meta-science.
    • evidence: It will take decades, but AI will eventually drive the progress of other scientific fields.
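
The propose-verify-fix loop described under test-time scaling can be sketched as a small budgeted search. This is a toy illustration only, not anything shown in the interview: the names refine, make_bisecting_proposer and make_verifier are hypothetical, and a bisecting guesser stands in for the model while an exact check stands in for the verifier. The point it shows is that a larger inference budget allows more rounds of feedback, which can turn a failure into a correct answer.

```python
from typing import Callable, Optional, Tuple

Feedback = Optional[str]


def refine(propose: Callable[[Feedback], int],
           verify: Callable[[int], Tuple[bool, str]],
           budget: int) -> Tuple[Optional[int], int]:
    """Run up to `budget` propose/verify rounds, feeding feedback back in."""
    feedback: Feedback = None
    for step in range(1, budget + 1):
        candidate = propose(feedback)       # propose a conjecture
        ok, feedback = verify(candidate)    # check it and collect feedback
        if ok:
            return candidate, step
    return None, budget                     # budget exhausted without success


def make_bisecting_proposer(lo: int, hi: int) -> Callable[[Feedback], int]:
    """Hypothetical stand-in for a model: narrows its guess using feedback."""
    state = {"lo": lo, "hi": hi, "last": 0}

    def propose(feedback: Feedback) -> int:
        if feedback == "too low":
            state["lo"] = state["last"] + 1
        elif feedback == "too high":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]

    return propose


def make_verifier(target_square: int) -> Callable[[int], Tuple[bool, str]]:
    """Hypothetical verifier: checks whether x squared equals the target."""

    def verify(x: int) -> Tuple[bool, str]:
        if x * x == target_square:
            return True, "ok"
        return False, ("too low" if x * x < target_square else "too high")

    return verify


if __name__ == "__main__":
    for budget in (2, 8):
        result = refine(make_bisecting_proposer(0, 100), make_verifier(1369), budget)
        print(budget, result)
```

With a budget of 2 the loop gives up, while a budget of 8 finds 37 on the third round. The same task and the same "model" succeed or fail depending only on how much inference-time effort is allowed.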

Key Concepts (7)

  • [07:25] AGI (Artificial General Intelligence)
    • Described not as a specific endpoint, but as a continuous direction of improvement where AI eventually surpasses most human capabilities.
  • [13:15] Test-time scaling
    • Allowing a model to spend more compute at inference time so it can iteratively refine, verify, and polish its answers.
  • [15:05] Scaffolding
    • External structures or tools built around a model to help it perform complex tasks or utilize tools it couldn’t handle natively.
  • [15:45] Context Engineering
    • Designing the input context and methods to guide the model’s logic and behavior effectively.
  • [27:15] Muon Optimizer
    • An alternative to the Adam optimizer that improves token efficiency and learning speed, though it is harder to stabilize.
  • [82:35] World Model
    • A model that understands and simulates the rules of the world, providing a higher ceiling for intelligence and knowledge.
  • [91:05] Meta-science
    • The concept that AI will become a foundational science that accelerates and enables discoveries in all other scientific domains.
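
For the Muon entry above, the interview summary does not describe the update rule, so the sketch below draws on public descriptions of Muon rather than on anything Yang says: the optimizer keeps a momentum buffer for each weight matrix and orthogonalizes it with a Newton-Schulz iteration before applying it. Several things here are assumptions made for illustration: the function names, the hyperparameters, the use of the classic cubic Newton-Schulz iteration in place of the tuned polynomial that published implementations use, plain rather than Nesterov momentum, and the omission of any weight decay or update scaling a production variant may add. It uses pure Python lists so it runs without dependencies.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthogonalize(m: Matrix, steps: int = 30) -> Matrix:
    """Approximate the nearest (semi-)orthogonal matrix to `m`.

    Cubic Newton-Schulz: X <- 1.5*X - 0.5*X*X^T*X, which pushes every
    nonzero singular value of X toward 1. (Assumption: a simple stand-in
    for the tuned iteration used in real Muon implementations.)
    """
    norm = math.sqrt(sum(v * v for row in m for v in row))
    if norm == 0.0:
        return [list(row) for row in m]
    # Dividing by the Frobenius norm keeps all singular values <= 1,
    # inside the region where the iteration converges.
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)]
             for rx, rq in zip(x, xxtx)]
    return x


def muon_step(weight: Matrix, grad: Matrix, momentum: Matrix,
              lr: float = 0.02, beta: float = 0.95) -> Tuple[Matrix, Matrix]:
    """One illustrative Muon-style step for a single 2-D weight matrix."""
    momentum = [[beta * b + g for b, g in zip(rb, rg)]
                for rb, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(rw, ru)]
              for rw, ru in zip(weight, update)]
    return weight, momentum


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    grad = [[2.0, 0.0], [0.0, 0.5]]  # one strong and one weak direction
    new_weight, _ = muon_step(zeros, grad, zeros, lr=0.1)
    print(new_weight)
```

In the usage example the gradient is four times larger in one direction than the other, yet after orthogonalization both directions move by about the same amount (roughly 0.1). Equalizing the update across directions in this way is one commonly cited intuition for why such an optimizer might extract more from each token than Adam does.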

People Mentioned (3)

  • Yang Zhilin — Founder and CEO of Moonshot AI, the interviewee.
  • Xiaojun — The host conducting the interview.
  • Isaac Newton — Mentioned metaphorically to explain how theories and models need continuous adjustment and explanation.

Companies Mentioned (4)

Moonshot AI · Anthropic · OpenAI · ByteDance

Notable Quotes (4)

A place full of snow… Like the paradigm of reinforcement learning. — Yang Zhilin @ 02:20

But AGI is what you keep doing. — Yang Zhilin @ 08:25

AI will become a meta-science. — Yang Zhilin @ 91:05

In the end, the progress of technology is inevitable. — Yang Zhilin @ 91:45

Career Arc & Personal Stories (2)

  • [00:00] Yang Zhilin reflects on his first year of entrepreneurship with Moonshot AI, describing the journey as climbing an endless, unknown snow-capped mountain where new problems constantly arise but are ultimately solvable.
  • [92:25] Yang discusses his personal philosophy, emphasizing the importance of human experience and creative work, and how AI’s progress, while inevitable, should aim to help people live better lives.

Tools & Models Discussed (8)

  • Kimi: Moonshot AI’s primary conversational AI product.
  • Claude: Anthropic’s AI model, referenced for its reasoning capabilities and coding tools.
  • K1.5: Moonshot AI’s earlier long-thinking reasoning model, mentioned as following the trajectory set by OpenAI’s reasoning models.
  • K2: Moonshot AI’s base model, noted for performing very well and serving as a foundation for further scaling and multimodal capabilities.
  • Cursor: An AI-powered code editor mentioned in the context of coding agents.
  • Claude Code: An agentic coding tool by Anthropic.
  • Adam: A widely used optimization algorithm for training deep learning models.
  • Muon: A newer optimizer used by Moonshot to improve token efficiency during model training.

Topics

AGI Development · Reinforcement Learning (RL) · Test-Time Scaling · Agent Generalization · Model Optimization (Muon vs. Adam) · Open Source vs. Closed Source AI · World Models · AI Commercialization

Takeaways

  • The journey to AGI is like climbing an endless mountain; it’s a continuous process of solving new, complex problems.
  • Test-time scaling and reinforcement learning are currently the most critical areas for advancing model reasoning capabilities.
  • Improving token efficiency through new optimizers like Muon is essential, despite the engineering challenges they present.
  • Generalizing AI agents to handle out-of-distribution tasks without relying heavily on scaffolding is the next major hurdle.
  • AI is on a trajectory to become a ‘meta-science’ that will fundamentally accelerate all other fields of human knowledge.