Li Auto Founder Li Xiang: MoE, Liang Wenfeng, VLA, Memory, Intelligence

Duration: 164 min

Guest: Li Xiang · Founder of Li Auto


Chapters (30)

  • 00:00 · AI as an Information Tool vs. Production Tool
    • Li Xiang discusses the limitations of current AI as merely an information tool and the need for it to become an action-oriented production tool.
  • 11:40 · Li Auto’s Progress and the Impact of DeepSeek
    • Reflecting on the past 130 days, Li highlights advancements in AI models, specifically praising DeepSeek’s open-source contributions and efficiency.
  • 24:00 · VLA Architecture and Agent OS
    • The conversation shifts to Li Auto’s development of Vision-Language-Action (VLA) models and the concept of an Agent OS for autonomous systems.
  • 34:30 · Training VLA for Autonomous Driving
    • Li details the multi-stage training process for VLA, comparing it to human learning from driving school to real-world experience.
  • 48:30 · The Future of AI Terminals and AGI
    • Discussion on what constitutes an AGI terminal, comparing future AI devices to the evolution of smartphones and PCs.
  • 62:00 · Organizational Evolution in the AI Era
    • Li explains how Li Auto’s organizational structure is adapting to integrate AI, emphasizing the need for a ‘System 2’ thinking approach.
  • 76:00 · Leadership, Growth, and Facing Challenges
    • Personal reflections on leadership, the importance of facing reality, and building a resilient, learning-oriented organization.
  • 90:00 · The Relationship Between Humans and AI
    • A philosophical look at how AI will augment human capabilities rather than replace them, focusing on human-AI collaboration.
  • 50:00 · Agent OS and Autonomous Driving
    • Discussing the transition from rule-based systems to Agent OS and VLA models for autonomous driving.
  • 60:00 · Challenges of VLA and World Models
    • Exploring the complexities of VLA models, the necessity of world models for simulation, and validation costs.
  • 75:00 · Impact of DeepSeek and Reinforcement Learning
    • Analyzing how DeepSeek’s language capabilities and reinforcement learning are accelerating AI development.
  • 85:00 · Strategic Vision for Smart Vehicles
    • Outlining the strategic shift towards AI-driven vehicles and the importance of scale and user base.
  • 95:00 · Building an Agent OS
    • Detailing the architecture and requirements for a robust Agent OS to manage AI tasks.
  • 105:00 · AI as a Production Tool
    • Differentiating between AI as an information tool and a production tool that requires action capabilities.
  • 115:00 · Organizational Changes for AI
    • Discussing how companies must adapt their organizational structures to effectively leverage AI.
  • 130:00 · The Future of Edge AI and VLA
    • Predicting the deployment of large VLA models on edge devices in vehicles.
  • 150:00 · Human-AI Interaction and Trust
    • Emphasizing the importance of building trust and effective interaction between humans and AI systems.
  • 00:00 · Vision for 2030 and the Impact of DeepSeek
    • Li Xiang discusses his vision for Li Auto to become a leading AI terminal company by 2030 and praises DeepSeek’s open-source contributions.
  • 04:00 · From Information Tools to Production Tools
    • The conversation shifts to the limitations of current AI as mere information tools and the necessity for AI to become production tools that take action.
  • 13:00 · DeepSeek’s Best Practices and VLA Models
    • Li Xiang analyzes DeepSeek’s success, emphasizing their research methodology, and introduces the concept of VLA (Vision-Language-Action) models for autonomous driving.
  • 26:00 · Agent OS and the Future of Operating Systems
    • Discussion on the need for an Agent OS to manage multiple AI agents and the strategic importance of controlling the operating system.
  • 36:00 · Organizational Evolution and Leadership
    • Li Xiang reflects on how AI is changing organizational structures and the role of leadership in adapting to these changes.
  • 48:00 · Human-AI Relationship and Personal Reflections
    • A philosophical discussion on the relationship between humans and AI, emphasizing that AI should serve humans and not replace human empathy.
  • 60:00 · Critique of AI Hardware Startups
    • Li Xiang critiques current AI hardware startups like Rabbit, arguing they lack the necessary foundation and are merely wrappers.
  • 00:00 · Vision for 2030 and Human vs. AI Capabilities
    • Discussing the goal to be a leading AI terminal company and how humans should focus on entropy reduction.
  • 05:00 · AI as a Production Tool
    • Analyzing why most current AI products are merely information tools, and defining true production tools as those that take action.
  • 15:00 · DeepSeek’s Impact and Open Source
    • Praising DeepSeek’s methodology and how open-source models accelerate development.
  • 25:00 · Autonomous Driving and VLA Models
    • Detailing the shift towards Vision-Language-Action models for autonomous driving and in-car agents.
  • 40:00 · Organizational Evolution for AI
    • Explaining how company structures must adapt to AI, moving away from traditional management.
  • 150:00 · Philosophy of AI, Wisdom, and Humanity
    • Reflecting on AI alignment, the distinction between intelligence and wisdom, and the future human-AI relationship.

Specific Numbers (21)

Time · Fact · Value · Context
01:30 · Llama 4 training data scale · 30T tokens · Discussing the massive scale of data required for the latest language models.
02:30 · Time elapsed · 130 days · The time passed since the last AI Talk interview.
13:30 · DeepSeek V3 parameter size · 671 billion · Highlighting the scale of DeepSeek’s MoE model.
42:30 · VLA model parameter size · 3.2B to 4B · The size of the Vision-Language-Action model being developed for edge devices.
63:00 · Simulation testing cost reduction · From 180,000 RMB to 4,000 RMB per 10,000 km · The dramatic cost reduction achieved through new AI simulation and validation methods.
63:50 · Validation cost · 4,000 RMB · The cost of validating a model has dropped significantly.
82:50 · Edge model size · 32B parameters · The goal is to run a 32B-parameter model directly on the vehicle’s edge computing platform.
83:00 · Cloud model size · 300B parameters · A 300B-parameter VLA model is used for cloud-based training and simulation.
84:40 · Company revenue · 100 billion RMB · Li Auto’s projected revenue for the year.
101:10 · Context window · 1 million tokens · Current large language models can support up to 1 million tokens of context.
113:30 · Model size · 671B parameters · Referring to the size of the DeepSeek V3 model.
00:00 · Vision for 2030 · 2030 · Goal to become a leading global AI terminal enterprise.
01:10 · Pre-training data scale · 30T tokens · Mentioned in the context of Llama 4’s data scale.
02:20 · Time since last AI talk · 130 days · The time elapsed since their last major discussion on AI.
13:30 · DeepSeek V3 parameters · 671 billion · The parameter count of the DeepSeek V3 MoE model.
26:00 · Cost savings from open source · Hundreds of millions of RMB · The estimated cost saved by using open-source models like DeepSeek.
58:00 · Li ONE release year · 2018 · The year the Li ONE was first released.
00:00 · Target year · 2030 · Goal to become a leading global AI terminal enterprise.
01:30 · Data scale · 30T tokens · Amount of data used for training models like Llama 4.
02:30 · Observation timeframe · 130 days · Timeframe for observing significant progress in Chinese AI.
13:30 · Model parameters · 671 billion · Parameters of the DeepSeek V3 model.
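
The simulation cost figures (180,000 RMB down to 4,000 RMB per 10,000 km) are easier to compare once both are normalized per kilometre. The sketch below does only that arithmetic. The helper name `cost_per_km` and the labels "before" and "after" are illustrative assumptions, not terms from the interview.

```python
def cost_per_km(cost_rmb: float, distance_km: float) -> float:
    """Normalize a testing cost to RMB per kilometre."""
    return cost_rmb / distance_km


DISTANCE_KM = 10_000                            # both figures are quoted per 10,000 km
before = cost_per_km(180_000, DISTANCE_KM)      # earlier validation approach
after = cost_per_km(4_000, DISTANCE_KM)         # AI simulation and validation

reduction_factor = before / after
savings_pct = (1 - after / before) * 100

print(f"Before: {before:.2f} RMB/km")           # Before: 18.00 RMB/km
print(f"After:  {after:.2f} RMB/km")            # After:  0.40 RMB/km
print(f"{reduction_factor:.0f}x cheaper, a {savings_pct:.1f}% reduction")  # 45x cheaper, a 97.8% reduction
```

On the quoted figures, validation becomes roughly 45 times cheaper per kilometre.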

Research Claims & Predictions (11)

  • [08:00] AI must transition from an information tool to a production tool to realize its true value.
    • evidence: Current AI mostly provides information (like search), but true ‘Agents’ must take action and complete tasks to be considered production tools.
  • [14:20] VLA (Vision-Language-Action) models are the key to solving complex autonomous driving and robotics.
    • evidence: VLA allows systems to understand the physical world and take direct actions, moving beyond simple perception.
  • [48:30] The ultimate AGI terminal will not just be a software app, but a deeply integrated hardware and OS ecosystem.
    • evidence: Comparing future AGI devices to the iPhone, requiring a dedicated ‘Agent OS’ to manage sensors, compute, and actions.
  • [73:00] VLA models can solve full autonomous driving.
    • evidence: The integration of vision, language, and action provides the necessary reasoning and execution capabilities.
  • [88:00] Vehicles will become the ultimate AI terminals.
    • evidence: Vehicles have the necessary power, compute, and sensors to fully utilize AGI.
  • [108:00] AI must transition from information tools to production tools.
    • evidence: To create real value, AI must be able to take actions in the physical or digital world, not just provide text.
  • [04:30] AI must transition from being an information tool to a production tool.
    • evidence: Current AI only provides information; true value lies in AI taking actions (Agents) to improve productivity.
  • [15:00] VLA (Vision-Language-Action) models are the future of autonomous driving.
    • evidence: VLA models can process 3D vision and language to directly output driving actions, replacing traditional modular systems.
  • [27:00] An Agent OS is required to manage the proliferation of AI agents.
    • evidence: As agents become more common, a dedicated OS is needed to coordinate them, similar to how iOS/Android manage apps.
  • [08:00] For an AI to be a production tool, it must have ‘action’ (执行); knowing is not enough, it must do.
    • evidence: Current industry shift towards agentic workflows.
  • [156:00] AI’s intelligence will increase infinitely, but human wisdom, which deals with relationships and values, remains distinct.
    • evidence: Long-term philosophical view on human-AI coexistence.
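
The Agent OS claim is about coordination: many agents, and one arbiter that decides which task runs when and which agent handles it, much as iOS or Android manage apps. The following is a minimal sketch that assumes a capability-based registry and a priority queue. The class `AgentOS`, its methods, and the capability names are hypothetical illustrations, not Li Auto's design.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Task:
    priority: int                           # lower number = more urgent
    seq: int                                # submission order breaks ties (FIFO)
    capability: str = field(compare=False)
    payload: str = field(compare=False)


class AgentOS:
    """Hypothetical coordinator: routes queued tasks to registered agents."""

    def __init__(self) -> None:
        self._agents: dict[str, Callable[[str], str]] = {}
        self._queue: list[Task] = []
        self._seq = 0

    def register(self, capability: str, agent: Callable[[str], str]) -> None:
        """Install an agent for a capability, much as an app is installed on a phone OS."""
        self._agents[capability] = agent

    def submit(self, capability: str, payload: str, priority: int = 5) -> None:
        heapq.heappush(self._queue, Task(priority, self._seq, capability, payload))
        self._seq += 1

    def run(self) -> list[str]:
        """Drain the queue in priority order, dispatching each task to its agent."""
        results = []
        while self._queue:
            task = heapq.heappop(self._queue)
            agent = self._agents.get(task.capability)
            if agent is None:
                results.append(f"unhandled: {task.capability}")
            else:
                results.append(agent(task.payload))
        return results


agent_os = AgentOS()
agent_os.register("navigation", lambda dest: f"route planned to {dest}")
agent_os.register("climate", lambda temp: f"cabin set to {temp}")
agent_os.submit("climate", "22C")
agent_os.submit("navigation", "airport", priority=1)
agent_os.submit("music", "jazz", priority=9)
results = agent_os.run()
print(results)
# ['route planned to airport', 'cabin set to 22C', 'unhandled: music']
```

A real Agent OS would also arbitrate hardware, sensors, and compute budgets. This toy shows only the registry-plus-scheduler core.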

Key Concepts (16)

  • [14:20] VLA (Vision-Language-Action)
    • A multimodal AI architecture that processes visual and linguistic inputs to directly output physical actions, crucial for robotics and autonomous driving.
  • [07:20] Agent
    • An AI system that doesn’t just answer questions but can autonomously plan, use tools, and execute actions to achieve a specific goal.
  • [28:50] Agent OS
    • An operating system designed specifically for AI Agents, allowing them to interface with hardware, sensors, and other software seamlessly.
  • [37:00] System 1 vs. System 2
    • Cognitive framework where System 1 is fast, intuitive, and reactive (like basic driving), while System 2 is slow, deliberate, and reasoning-based (like navigating complex, novel scenarios).
  • [52:20] Agent OS
    • An operating system designed to manage, schedule, and execute tasks using AI agents.
  • [56:40] VLA (Vision-Language-Action)
    • A model architecture that processes visual and linguistic inputs to generate direct actions, crucial for robotics and autonomous driving.
  • [63:40] World Model
    • A simulation system that models the physical world, used to train and validate AI agents safely and efficiently.
  • [108:00] Production Tool vs. Information Tool
    • The distinction between AI that only provides information (like a chatbot) and AI that can execute tasks and create tangible value.
  • [04:30] Production Tool vs. Information Tool
    • Information tools provide data (like search engines or basic LLMs), while production tools (Agents) take actions and complete tasks autonomously.
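  • [13:30] MoE (Mixture of Experts)
    • A neural network architecture in which a router activates only a small subset of ‘expert’ sub-networks for each input token, so per-token compute tracks the active parameters rather than the total. DeepSeek V3, for example, activates about 37B of its 671B parameters per token.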
  • [15:00] VLA (Vision-Language-Action)
    • An AI model architecture that integrates visual perception, language understanding, and action generation, crucial for embodied AI like autonomous driving.
  • [27:00] Agent OS
    • An operating system designed specifically to manage, coordinate, and provide resources for various AI agents running on a device.
  • [05:00] Production Tool (生产工具)
    • An AI that users are willing to pay for because it executes actions and creates tangible value, unlike mere information tools.
  • [26:30] VLA (Vision-Language-Action)
    • A model architecture integrating visual perception, language understanding, and physical action execution, key for robotics and driving.
  • [01:10] Entropy Reduction (熵减)
    • The human capability to simplify and extract meaning from complex information, contrasted with AI’s ability to process massive data.
  • [45:00] Agent OS
    • An operating system designed to manage and coordinate various AI agents to perform complex tasks on a device.
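
The System 1 / System 2 distinction maps naturally onto a dual-path controller: a cheap reflex for familiar situations and a slower, explicit comparison of options for novel ones. This minimal sketch assumes a hand-written reflex table and uses a cost table in place of real reasoning. All scene and action names are hypothetical, and this is not Li Auto's implementation.

```python
# System 1: a fast lookup of practised responses (hypothetical entries).
REFLEXES: dict[str, str] = {
    "clear_lane": "keep_lane",
    "red_light": "brake",
    "lead_car_slowing": "ease_off",
}


def deliberate(candidates: list[str], cost: dict[str, float]) -> str:
    """System 2: compare every candidate action and keep the cheapest.

    The cost table stands in for real reasoning (prediction, rules, goals).
    Actions with no cost estimate are treated as unacceptable.
    """
    return min(candidates, key=lambda action: cost.get(action, float("inf")))


def decide(scene: str, candidates: list[str], cost: dict[str, float]) -> tuple[str, str]:
    """Use System 1 for familiar scenes and fall back to System 2 for novel ones."""
    if scene in REFLEXES:
        return "system1", REFLEXES[scene]
    return "system2", deliberate(candidates, cost)


options = ["stop", "nudge_left", "wait_for_flagger"]
costs = {"stop": 3.0, "nudge_left": 5.0, "wait_for_flagger": 1.5}
print(decide("red_light", options, costs))             # ('system1', 'brake')
print(decide("construction_flagger", options, costs))  # ('system2', 'wait_for_flagger')
```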

People Mentioned (10)

  • Liang Wenfeng — Founder of DeepSeek, praised by Li Xiang for his dedication to best practices and open source, and acknowledged for his background in AI at Zhejiang University.
  • Chen Wei — A colleague at Li Auto, mentioned regarding the firm decision to pursue end-to-end VLA models.
  • Xie Yan — CTO of Li Auto, mentioned as part of the core leadership team supporting the company’s AI transition.
  • Lu Qi — Mentioned as having asked a profound question about AI and human understanding.
  • Li Xiang — Founder and CEO of Li Auto and the guest of this interview, discussed in the context of company leadership and strategy.
  • Fan Zheng — Mentioned as a key colleague and support system within the company.
  • Li Feifei — Mentioned regarding her research on spatial intelligence and its application to autonomous driving.
  • Shen Yanan — Former president of Li Auto, part of the core leadership team.
  • Ma Donghui — President of Li Auto, part of the core leadership team.
  • Li Tie — CFO of Li Auto, part of the core leadership team.

Companies Mentioned (12)

DeepSeek · Manus · Tesla · OpenAI · Apple · Microsoft · Li Auto (理想汽车) · BBA (Benz, BMW, Audi) · Toyota · Google · Rabbit · Qwen

Notable Quotes (11)

Humans should pursue entropy reduction, not entropy increase. — Li Xiang @ 01:10

No AI product yet meets the criteria of a production tool; each is merely an auxiliary tool. — Li Xiang @ 08:00

Only by holding fast to best practices could he make us admire him even more. — Li Xiang @ 20:50

We only want to make it a real driver. — Li Auto Executive @ 50:40

Because, as you say, the best practice for humans often goes against human nature. — Li Auto Executive @ 59:50

If you want to be a production tool, you must have action. — Li Auto Executive @ 108:00

If an AI product doesn’t become a production tool, it’s just an information tool. And users won’t pay for an information tool. — Li Xiang @ 04:30

DeepSeek’s best practice is that they do research first, then development. They don’t just rush into coding. — Li Xiang @ 13:50

AI should serve humans, not replace them. The ultimate value of AI is to reduce human energy consumption. — Li Xiang @ 48:00

Knowing is not enough; one must act. — Li Xiang @ 08:50

AI’s intelligence is increasing without limit. — Li Xiang @ 156:00

Career Arc & Personal Stories (6)

  • [76:00] Li Xiang reflects on the challenges of organizational management, noting that as the company scales, the leadership must evolve from relying on intuition to building robust, learning-driven systems (System 2 thinking) to handle complexity.
  • [57:50] Li Xiang reflects on Li Auto starting its autonomous driving research in 2021 and on how its approach has evolved since.
  • [114:00] Discusses the internal dynamics and support system among the leadership team at Li Auto, highlighting the importance of mutual trust.
  • [36:00] Li Xiang discusses the evolution of his leadership style and the organizational structure of Li Auto, emphasizing the shift from a traditional hierarchy to a more AI-integrated, matrix-style organization.
  • [56:00] He reflects on the early days of Li Auto, the challenges they faced, and how the core team (including Shen Yanan, Ma Donghui, and Li Tie) supported each other through difficult times.
  • [00:00] Li Xiang outlines his long-term vision for Li Auto to become a world-leading AI terminal enterprise by 2030.

Tools & Models Discussed (16)

  • DeepSeek V3 / R1: Open-source large language and reasoning models that significantly lowered training and inference costs while achieving state-of-the-art performance.
  • Llama 4: Upcoming large language model from Meta, noted for its massive 30T token training data scale.
  • Cursor: An AI-powered code editor mentioned as an example of a tool moving towards being a true production Agent.
  • Manus: An AI Agent designed to autonomously execute complex tasks.
  • Agent OS: Manages and orchestrates AI agents to perform complex tasks.
  • VLA Models: Combine vision, language, and action to enable autonomous systems to interact with the physical world.
  • DeepSeek V3: A large language model noted for its high performance and open-source availability.
  • DeepSeek V3: A highly efficient Mixture of Experts (MoE) large language model.
  • DeepSeek R1: A reasoning-focused AI model.
  • Cursor: An AI-powered code editor used by developers to increase productivity.
  • OpenAI Deep Research: An AI tool designed for deep, autonomous research tasks.
  • DeepSeek V3: A highly efficient MoE model with 671B parameters.
  • DeepSeek R1: A reasoning-focused model utilizing reinforcement learning.
  • Llama 4: Referenced regarding training on massive 30T datasets.
  • Cursor: Cited as one of the few true AI production tools currently available for coding.
  • OpenAI Deep Research: Cited as an example of an AI production tool.
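
Mixture of Experts is why a 671B-parameter model like DeepSeek V3 can be economical to run: each token passes through only a few experts. The toy below illustrates just the top-k routing idea on scalar inputs. The router weights and the four expert functions are invented for illustration. Production MoE models, DeepSeek V3 included, add refinements such as load balancing across experts, which are omitted here.

```python
import math
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    experts: list[Callable[[float], float]],
    router_weights: list[float],
    k: int = 2,
) -> tuple[float, list[int]]:
    """Toy top-k Mixture-of-Experts layer on a scalar input.

    The router scores every expert, but only the k best-scoring experts are
    actually evaluated. Their outputs are mixed using renormalized gate weights.
    """
    probs = softmax([w * x for w in router_weights])  # routing depends on the input
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    gate_total = sum(probs[i] for i in chosen)
    output = sum(probs[i] / gate_total * experts[i](x) for i in chosen)
    return output, chosen


experts = [
    lambda v: v + 1.0,
    lambda v: 2.0 * v,
    lambda v: v * v + 1.0,
    lambda v: -v,
]
router_weights = [0.1, 0.9, 0.5, -1.0]
out, chosen = moe_forward(2.0, experts, router_weights, k=2)
print(chosen)  # [1, 2]: only 2 of the 4 experts were evaluated
print(round(out, 4))
```

Total parameters grow with the number of experts, but per-token compute grows only with `k`. That is the efficiency the summary attributes to DeepSeek V3's MoE design.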

Topics

Artificial General Intelligence (AGI) · Autonomous Driving · Vision-Language-Action (VLA) Models · AI Agents and Agent OS · Organizational Management and Evolution · Open Source AI Ecosystem · World Models and Simulation · Organizational Adaptation to AI · AI as a Production Tool · Open Source AI Models (DeepSeek) · Organizational Management in the AI Era · Human-AI Interaction · Autonomous Driving Evolution · Human vs. AI Capabilities

Takeaways

  • For AI to truly revolutionize industries, it must evolve from an ‘information tool’ that just answers questions into a ‘production tool’ (Agent) that can take autonomous actions.
  • Vision-Language-Action (VLA) models represent the next major leap for autonomous driving and robotics, enabling systems to understand and interact with the physical world.
  • DeepSeek’s open-source strategy and technical efficiency have significantly accelerated the AI industry, benefiting companies like Li Auto in their R&D.
  • To successfully integrate AI, companies must upgrade their organizational structures, shifting from intuitive ‘System 1’ management to deliberate, reasoning-based ‘System 2’ processes.
  • The future of autonomous driving relies on VLA models that can reason and act like human drivers.
  • AI must evolve from being an information tool to a production tool capable of executing actions.
  • Companies need to adapt their organizational structures to effectively integrate and manage AI agents.
  • World models are essential for safely and efficiently training AI systems for physical world interactions.
  • AI must evolve from providing information to taking action (becoming production tools) to realize its true commercial value.
  • Open-source models like DeepSeek are democratizing AI capabilities and significantly reducing R&D costs for companies like Li Auto.
  • The future of smart devices relies on an ‘Agent OS’ that can seamlessly integrate and manage multiple AI agents.
  • VLA (Vision-Language-Action) models represent a paradigm shift in autonomous driving, moving away from modular systems to end-to-end AI.
  • Despite AI advancements, human empathy, connection, and strategic decision-making remain irreplaceable.
  • The AI industry must transition from building information tools to creating production tools that can execute actions, with VLA models as one example.
  • Open-source models are significantly accelerating the R&D cycles for companies building applied AI, such as autonomous driving systems.
  • Organizations need to restructure around AI capabilities, moving from traditional hierarchies to models that support AI-driven workflows.
  • While AI will surpass humans in data processing and intelligence, human wisdom—centered on relationships, values, and entropy reduction—will remain irreplaceable.
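
The takeaway that world models make physical-world training and validation safer and cheaper can be illustrated with a deliberately tiny example. This is a toy under heavy assumptions: a hand-written one-dimensional car-following model stands in for a learned world model, and the two policies, the 2-second braking threshold, and the braking scenarios are all hypothetical. The point is the workflow: roll candidate policies out inside the model and count failures before any road testing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class State:
    gap_m: float       # distance to the lead vehicle, metres
    ego_speed: float   # our speed, m/s
    lead_speed: float  # lead vehicle speed, m/s


def world_step(s: State, accel: float, lead_accel: float, dt: float = 0.1) -> State:
    """Toy stand-in for a world model: predict the next state after one action."""
    ego = max(0.0, s.ego_speed + accel * dt)
    lead = max(0.0, s.lead_speed + lead_accel * dt)
    gap = s.gap_m + (lead - ego) * dt
    return State(gap, ego, lead)


def cautious_policy(s: State) -> float:
    """Brake at 6 m/s^2 whenever the time gap drops below 2 seconds."""
    return -6.0 if s.gap_m < 2.0 * s.ego_speed else 0.0


def reckless_policy(s: State) -> float:
    """Never brakes; included to show the simulator catching a bad policy."""
    return 0.0


def count_collisions(policy: Callable[[State], float],
                     scenarios: list[tuple[State, float]],
                     steps: int = 100) -> int:
    """Roll the policy out in every scenario and count runs that hit the lead car."""
    collisions = 0
    for start, lead_accel in scenarios:
        s = start
        for _ in range(steps):
            s = world_step(s, policy(s), lead_accel)
            if s.gap_m <= 0.0:
                collisions += 1
                break
    return collisions


# Hypothetical scenarios: both cars at 20 m/s, 30 m apart, lead brakes at 2, 4 or 6 m/s^2.
scenarios = [(State(30.0, 20.0, 20.0), -decel) for decel in (2.0, 4.0, 6.0)]
cautious_hits = count_collisions(cautious_policy, scenarios)
reckless_hits = count_collisions(reckless_policy, scenarios)
print(f"cautious: {cautious_hits} collisions, reckless: {reckless_hits} collisions")
```

In this toy, the cautious policy avoids a collision in all three scenarios, while the reckless one collides in each. A real world model would be learned from driving data and would itself need validation.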