GALAXEA Founder Gao Jiyang: Catfish Effect, Waymo vs Momenta

Duration: 184 min

Guest: Gao Jiyang · AI Researcher, Founder of GALAXEA, formerly at Waymo and Momenta

Chapters (24)

  • 02:10 · Chapter 1: The Sprinter Kid
    • Gao discusses his early education, his strategic approach to studying, and how he got into Tsinghua University through the Physics Olympiad.
  • 12:00 · Chapter 2: Learning from Zeng Guofan
    • Gao talks about his transition to AI, his internship at SenseTime, and drawing leadership inspiration from historical figure Zeng Guofan.
  • 25:30 · Chapter 3: Raising the Hit Rate at Top Conferences
    • Gao explains his systematic strategy for publishing papers at top AI conferences during his PhD and evaluating industry opportunities.
  • 33:40 · Chapter 4: Waymo Has No Founder
    • Gao reflects on his time at Waymo, the shift from robotics-based to AI-native autonomous driving architectures, and Waymo’s organizational challenges.
  • 50:00 · Working at Waymo and the Decision to Leave
    • Gao discusses his time at Waymo, learning engineering systems, and why he decided to leave for a more product-oriented environment.
  • 52:20 · Chapter 5: Momenta is the Opposite of Extreme
    • Gao explains his choice to join Momenta over Huawei, highlighting Momenta’s result-oriented culture and mass production strategy.
  • 62:20 · Chapter 6: The Catfish Effect
    • Gao describes his role in driving Momenta’s mass production delivery for SAIC and the intense, fast-paced work culture.
  • 69:20 · Chapter 7: Starting with a Terrible BP
    • Gao recounts his decision to start an embodied AI company at age 30, the initial struggles with fundraising, and securing angel investments.
  • 75:20 · Chapter 8: Struggling with Hardware and Supply Chain
    • Gao explains why an embodied AI company must build its own hardware and the challenges of navigating the supply chain from scratch.
  • 100:00 · Hardware and Supply Chain Focus
    • Discussion on the company’s early focus on hardware, supply chain, and the decision to build a wheeled robot with a torso.
  • 103:00 · Targeting the Developer Market
    • Explaining the strategy to target the developer market first, categorizing developers into academic, enterprise, and productivity tiers.
  • 106:30 · AI vs. Hardware Engineering
    • Contrasting the talent-density focus of AI software with the rigorous process-driven requirements of hardware engineering.
  • 109:30 · Data Recipe and Intelligence Strategy
    • Outlining the shift towards data and intelligence, emphasizing the importance of end-to-end models and real-world data.
  • 117:00 · The Cost of Real Robot Data
    • Breaking down the financial and time costs of acquiring real robot data versus simulated data.
  • 126:00 · Robot Brain Architecture
    • Detailing the dual-system architecture using VLA models for fast actions and VLMs for slow reasoning.
  • 130:00 · Finding the Right Scenarios
    • Identifying ideal commercial scenarios for embodied AI, such as bin picking and flexible assembly.
  • 140:00 · Startups vs. Big Tech
    • Analyzing the competitive landscape between agile startups and resource-rich big tech companies in the robotics space.
  • 150:00 · Company Culture and Pragmatic Innovation
    • Discussing the importance of creating real customer value through pragmatic innovation.
  • 151:23 · Hua Zhe’s Departure
    • Explaining co-founder Hua Zhe’s departure to pursue 2C applications and the company’s support for him.
  • 154:00 · The Embodied AI Value Chain
    • Analyzing the transmission cycles of algorithms versus hardware and supply chains.
  • 159:00 · Technical Vision
    • Envisioning robots that learn like human employees through demonstration and self-practice.
  • 164:30 · Funding and Valuation
    • Detailing the recent funding round and the company’s 30x valuation growth.
  • 170:09 · Learning from Peers
    • Sharing respect and learnings from peer companies like Unitree, Physical Intelligence, and Zhiyuan.
  • 180:00 · Personal Preferences
    • The guest shares his favorite food, movies, music, and books.

Specific Numbers (23)

Time | Fact | Value | Context
01:55 | Gao’s birth year | 1992 | The host mentions he looks older but was actually born in 1992.
06:23 | National Physics Olympiad | November 2010 | Gao participated in the national competition in Xiamen.
12:34 | Started AI internship | Late 2014 / Early 2015 | Gao began his internship at SenseTime and trained his first neural network.
34:02 | Waymo’s inception era | 2008/2009 | Gao notes that Waymo’s autonomous driving efforts date back to the DARPA challenge era.
54:26 | Joined Waymo | January 2019 | Gao joined Waymo early in 2019.
54:29 | Decided to leave Waymo | H2 2020 | Gao felt he had learned enough about the AD system and wanted to be closer to product and business.
59:37 | Momenta’s strategic shift | 2018 | Momenta explicitly decided to pursue mass production autonomous driving to build a data flywheel for Robotaxi.
62:31 | Momenta secured SAIC project | End of 2020 | Momenta won the Zhiji (IM Motors) project from SAIC.
69:34 | Decided to start a company | End of 2022 | Gao turned 30 and decided it was time to pursue his entrepreneurial ambitions.
70:39 | Resigned from Momenta | May 2023 | Gao officially left Momenta to start his own company.
77:10 | Angel round funding | 30 million RMB | The total amount raised in the first angel round from IDG, Baidu Ventures, and GSR Ventures.
78:00 | Plus round funding | 10-20 million RMB | An additional funding round raised shortly after the angel round.
100:38 | Second round of financing | Early 2024 | Completed the second round of financing to focus on hardware and supply chain.
102:38 | Form factor decision | March 2024 | Decided on the wheeled + torso form factor for their first robot.
117:56 | Data acquisition to training cost ratio | 1:5 to 1:10 | For every $1 spent on acquiring data, $5 to $10 is spent on training the model.
119:28 | Cost per hour of real data | 200-250 RMB | Estimated cost to collect one hour of real robot teleoperation data.
120:07 | Data scale for general-purpose AI | 100,000 hours | The amount of interaction data equivalent to an 18-year-old human’s life experience.
120:34 | Cost for 100,000 hours of data | 25 million RMB | The estimated financial cost to collect 100,000 hours of real robot data.
154:53 | Hardware and supply chain transmission cycle | 12 to 18 months | The time it takes for hardware innovations to be replicated.
155:59 | Algorithm transmission cycle | 2 to 3 months | The time it takes for algorithm innovations to be replicated due to open source and papers.
160:39 | Number of developer customers | Over 150 | The number of global developer customers using Xinghai Tu’s products.
166:05 | Valuation growth | 30x | The company’s valuation grew 30 times compared to January 2024.
167:06 | Company size | Over 200 employees | The current size of the organization.
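
The data-cost figures in the table (117:56 to 120:34) can be cross-checked with some quick arithmetic. The sketch below is illustrative only: the 15 waking hours per day is an assumption that does not come from the interview, and applying the 1:5 to 1:10 training ratio at the 100,000-hour scale is an extrapolation rather than something Gao states.

```python
# Back-of-envelope check of the data figures quoted in the table.
HOURS_TARGET = 100_000          # hours of interaction data (120:07)
COST_PER_HOUR_RMB = (200, 250)  # RMB per hour of real robot data (119:28)
TRAIN_MULTIPLIER = (5, 10)      # training spend per unit of data spend (117:56)

# Assumption, not from the interview: roughly 15 waking hours per day.
WAKING_HOURS_PER_DAY = 15


def collection_cost_rmb(hours, cost_per_hour):
    """Total collection cost in RMB for a given number of hours."""
    return hours * cost_per_hour


low = collection_cost_rmb(HOURS_TARGET, COST_PER_HOUR_RMB[0])
high = collection_cost_rmb(HOURS_TARGET, COST_PER_HOUR_RMB[1])

# Extrapolation: what the quoted ratio would imply at this scale.
train_low = low * TRAIN_MULTIPLIER[0]
train_high = high * TRAIN_MULTIPLIER[1]

# Waking hours accumulated by age 18 under the assumption above.
human_hours = 18 * 365 * WAKING_HOURS_PER_DAY

print(f"collection: {low / 1e6:.0f}-{high / 1e6:.0f} million RMB")
print(f"implied training: {train_low / 1e6:.0f}-{train_high / 1e6:.0f} million RMB")
print(f"waking hours by age 18: {human_hours:,}")
```

The quoted 25 million RMB matches the upper end of the 200-250 RMB per hour range, and the waking-hours estimate lands close to 100,000 under the stated assumption.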

Research Claims & Predictions (10)

  • [16:46] Neural networks can automatically extract rules from data, replacing manual programming.
    • evidence: Observed during his early AI experiments where models replaced complex if-else logic.
  • [36:08] End-to-end, data-driven AI architectures will replace modular, rule-based robotics architectures in autonomous driving.
    • evidence: Based on the performance plateau of traditional systems and the rapid scaling of data-driven models like Tesla’s.
  • [38:07] Representing maps and trajectories as vectors is more efficient than rendering them as images for prediction models.
    • evidence: Proven by the development and success of the VectorNet model at Waymo.
  • [61:15] Data closed loop is essential for autonomous driving.
    • evidence: To achieve Robotaxi, you need massive amounts of data, which can only be acquired by deploying mass-production AD systems in consumer cars.
  • [74:06] Embodied AI requires building the hardware.
    • evidence: To build a data closed loop in the physical world, an AI company must control the hardware (the robot) to gather data and execute actions effectively.
  • [101:20] Bipedal locomotion adds unnecessary complexity to manipulation tasks.
    • evidence: The ‘loco-manipulation’ problem (coordinating legs and arms) remains unsolved, making wheeled bases more practical for current AI capabilities.
  • [116:00] Real-world data is essential because the sim-to-real gap is still too large.
    • evidence: Traditional graphics-based simulation struggles to accurately model complex physical interactions, requiring real data for reliable performance.
  • [126:50] Robot brains will utilize a dual-system architecture.
    • evidence: Edge computing constraints prevent running massive reasoning models at high frequencies, necessitating a fast VLA model for actions and a slow VLM for reasoning.
  • [154:25] Algorithm innovation cannot exist independently.
    • evidence: It must be part of a full value chain including hardware, data, and infrastructure.
  • [159:35] Robots will learn like human employees.
    • evidence: Through a few demonstrations and self-practice, robots will be able to autonomously complete tasks.
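
The dual-system claim at [126:50] can be pictured as a two-rate control loop: a slow reasoning step that refreshes a subgoal occasionally, and a fast action step that runs on every tick. The sketch below is a toy illustration, not GALAXEA’s implementation. The class name, the `slow_every` parameter, and both stub methods are hypothetical stand-ins for real VLM and VLA calls.

```python
from dataclasses import dataclass, field


@dataclass
class DualSystemController:
    """Hypothetical two-rate loop: slow planner, fast action policy."""
    slow_every: int = 10  # assumed: planner runs once per `slow_every` ticks
    subgoal: str = ""
    log: list = field(default_factory=list)

    def slow_reason(self, instruction: str, tick: int) -> str:
        # Stand-in for a VLM call that breaks the task into a subgoal.
        return f"{instruction}/step{tick // self.slow_every}"

    def fast_act(self, observation: float) -> str:
        # Stand-in for a VLA call mapping observation + subgoal to an action.
        return f"act({self.subgoal}, obs={observation:.1f})"

    def run(self, instruction: str, observations: list) -> list:
        for tick, obs in enumerate(observations):
            if tick % self.slow_every == 0:      # low-frequency reasoning
                self.subgoal = self.slow_reason(instruction, tick)
            self.log.append(self.fast_act(obs))  # high-frequency control
        return self.log


controller = DualSystemController(slow_every=3)
actions = controller.run("pick", [0.0, 0.1, 0.2, 0.3])
for action in actions:
    print(action)
```

The point of the split is that the expensive reasoning call is amortized over many control ticks, which is how the edge-compute constraint in the claim is usually addressed.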

Key Concepts (14)

  • [16:46] Deep Learning / Neural Networks
    • A machine learning method that automatically extracts patterns and rules from large datasets, replacing manual if-else programming.
  • [36:08] End-to-end Architecture
    • A system design where a single neural network maps raw inputs directly to final outputs, avoiding modular, human-designed intermediate steps.
  • [38:07] Vector Representation (VectorNet)
    • A method of representing HD maps and agent trajectories as mathematical vectors rather than rendered images, improving computational efficiency and performance.
  • [52:58] Engineering Mindset (工程师思维)
    • Breaking down complex problems into smaller, measurable sub-problems, writing code, and testing layer by layer.
  • [61:15] Data Flywheel (数据飞轮)
    • Deploying mass-production systems to gather real-world data, which trains better AI models, which in turn improves the product.
  • [74:06] Embodied AI (具身智能)
    • AI systems that interact with the physical world through hardware (robots) to create a data closed loop.
  • [81:45] Customer-Centric (以客户为中心)
    • A core value at Momenta, meaning to deeply understand and solve the customer’s actual problems rather than just following rigid instructions.
  • [101:33] Intelligence defines the body (智能定义本体)
    • Designing robot hardware based on the current capabilities and limitations of AI algorithms, rather than building hardware first.
  • [103:00] Developer Market (开发者市场)
    • The initial target market for robots, focusing on researchers, engineers, and integrators before reaching end consumers.
  • [116:00] Sim-to-Real Gap
    • The discrepancy between simulated environments and the real world, which makes transferring learned skills difficult.
  • [124:09] Data Recipe
    • The strategic mix and proportion of different types of data (real, simulated, human-centric) used to train embodied AI models.
  • [150:14] Pragmatic Innovation (务实创新)
    • Focusing on creating real value for customers and calculating ROI, rather than just pursuing romantic or theoretical research.
  • [154:25] Value Chain (价值链条)
    • The full stack required for embodied AI, including hardware, supply chain, data, AI infra, and algorithms.
  • [154:50] Transmission Cycle (传播周期)
    • The time it takes for an innovation to be replicated or adopted by others in the industry.
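
As a rough illustration of the Data Recipe idea at [124:09], the sketch below splits a data budget across sources according to a set of mix weights. The weights, the source names, and the `allocate_hours` helper are all hypothetical; the interview does not give actual proportions.

```python
def allocate_hours(total_hours, recipe):
    """Split a total data budget across sources in proportion to recipe weights."""
    total_weight = sum(recipe.values())
    return {source: total_hours * weight / total_weight
            for source, weight in recipe.items()}


# Hypothetical mix weights, for illustration only.
recipe = {"real": 5, "simulated": 3, "human_centric": 2}
plan = allocate_hours(1000, recipe)

for source, hours in plan.items():
    print(f"{source}: {hours:.0f} hours")
```

Changing the weights changes the plan without touching the rest of the pipeline, which is one reason to treat the mix as an explicit, tunable recipe.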

People Mentioned (18)

  • Xiao Jun — The host of the interview.
  • Gao Jiyang — The guest, an AI researcher and entrepreneur.
  • Zeng Guofan — A historical figure Gao studied to learn about leadership, resource mobilization, and achieving goals despite setbacks.
  • Tang Xiao’ou — Professor who provided Gao with the internship opportunity at SenseTime.
  • Cao Xudong (曹旭东) — Gao’s mentor at SenseTime who guided his early AI work; CEO of Momenta.
  • Sun Chen — A senior from Tsinghua who helped Gao secure a research position at USC.
  • Zhao Hang (赵行) — Gao’s collaborator at Waymo on the VectorNet paper; co-founder of Xinghai Tu, now leading its data, intelligence, and foundation model work.
  • Elon Musk — Mentioned in relation to Tesla’s top-down, AI-driven autonomous driving strategy.
  • Chen Yilun — Huawei executive Gao spoke with.
  • Su Qing — Huawei executive Gao spoke with.
  • Sun Gang — Executive at Momenta.
  • Ren Shaoqing (任少卿) — Former Momenta researcher and executive whose departure prompted organizational changes.
  • Tianwei — Gao’s co-founder.
  • Yang Zeyi — Gao’s co-founder and hardware/mechanical engineering expert.
  • Fei-Fei Li (李飞飞) — Mentioned as an example of a top-tier academic developer in the AI space.
  • Wang He (王鹤) — Mentioned in the context of calculating the hourly cost of robot data collection.
  • Xu Huazhe (许华哲) — Co-founder who left Xinghai Tu to pursue 2C applications; his departure is the subject of a chapter.
  • Tian Fei (天飞) — Co-founder of Xinghai Tu.

Companies Mentioned (18)

SenseTime · Waymo · Tesla · Google · Momenta · Huawei · Pony.ai · WeRide · SAIC (上汽) · IDG · Baidu Ventures · GSR Ventures (金沙江) · Xinghai Tu (星海图) · Ant Group (蚂蚁) · Apple · Unitree (宇树) · Physical Intelligence (PI) · Zhiyuan (智元)

Notable Quotes (10)

I felt that neural networks could replace humans in discovering rules in data. This is too awesome. I have to do this. — Gao Jiyang @ 17:38

The magic of AI lies in its ability to replace humans in summarizing rules. — Gao Jiyang @ 33:18

Waymo has no founder… the top-down force is missing. — Gao Jiyang @ 42:02

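I think that for an organization to succeed, it has to tolerate mistakes, but someone has to say ‘we were wrong,’ and then we change. — Gao Jiyang @ 58:27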

To do embodied AI, we have to build the whole robot plus the intelligence; we cannot do the intelligence alone. — Gao Jiyang @ 74:06

I prefer to face reality. Even if the truth is cruel, I still want to face it. — Gao Jiyang @ 76:05

Intelligence defines the body. Start from what the intelligence needs and work out how the body should be built. — Peng Siyuan @ 101:33

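What does 100,000 hours of data mean? From birth to age 18, the total time a person spends interacting with the physical world is on that order of magnitude. — Peng Siyuan @ 120:00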

Robotics is inherently unromantic… the chain is very long, and the cycles are very long too. — Gao Jiyang @ 161:16

Idealism must not turn into empty fantasy. — Gao Jiyang @ 166:34

Career Arc & Personal Stories (12)

  • [02:32] Gao describes himself as a normal student until 6th grade, when he started ‘sprinting’ for exams, eventually winning a national physics competition and entering Tsinghua.
  • [12:34] During his junior year, feeling lost about his major, he interned at SenseTime, trained his first neural network, and found his passion for AI.
  • [25:30] To graduate quickly and enter the industry, Gao systematically categorized AI research papers into three types to maximize his chances of publishing at top conferences.
  • [34:14] At Waymo, he analyzed the legacy codebase, realized the limitations of rendering maps as images, and co-developed VectorNet to solve the problem.
  • [50:00] Gao worked at Waymo, where he learned how a massive autonomous driving system is built and developed an engineering mindset.
  • [54:20] Feeling disconnected from the product and business side at Waymo, Gao decided to return to China to join a company closer to commercialization.
  • [55:40] He joined Momenta, where he thrived in a high-pressure, result-oriented environment, eventually leading the mass production delivery for a major OEM.
  • [69:20] Upon turning 30, Gao resigned from Momenta to start his own embodied AI company, navigating the challenges of fundraising with a poor initial business plan.
  • [75:20] Realizing that software alone wasn’t enough, Gao and his team had to learn hardware and supply chain management from scratch, eventually bringing on a hardware co-founder.
  • [106:48] Gao reflects on transitioning from pure AI software to embodied AI, learning that while AI relies on ‘10x engineers’, hardware requires rigorous, process-driven engineering (EVT, DVT, PVT) to avoid physical failures.
  • [153:02] Gao founded the company with Tian Fei and Zhao Hang, and gradually brought in more partners to build a strong team.
  • [180:00] Gao shares personal preferences, including a love for movies developed during college, and a taste for sci-fi, suspense, and history books.

Tools & Models Discussed (7)

  • VectorNet: A graph neural network model that represents HD maps and agent trajectories as vectors, significantly improving the efficiency and accuracy of trajectory prediction in autonomous driving.
  • Transformer / Self-attention: A neural network architecture mechanism used within VectorNet to process and relate different vector inputs effectively.
  • GPT-1 / GPT-2 / GPT-3 / InstructGPT / ChatGPT: OpenAI’s language models that restored global confidence in AI’s potential, acting as a catalyst for Gao’s decision to start an AI company.
  • R1: Xinghai Tu’s first-generation robot product, featuring a wheeled base and a torso for manipulation.
  • VLA (Vision-Language-Action) Model: A model that directly translates visual and text inputs into physical robot actions, used for fast, reactive control.
  • VLM (Vision-Language Model): A higher-level model used for slow, logical reasoning and task breakdown in the robot’s dual-system architecture.
  • Foundation Models for Robotics: Models intended to let robots learn tasks through demonstration and self-practice, much as human employees do.
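
To make the VectorNet entry above more concrete, the sketch below shows one way map elements and agent tracks could be turned into vectors instead of rendered images. It only illustrates the representation idea. The row layout, the type codes, and the `polyline_to_vectors` helper are illustrative assumptions, and the graph network and self-attention layers that consume these vectors are not shown.

```python
def polyline_to_vectors(points, polyline_id, kind):
    """Turn consecutive points of a polyline into vector rows.

    Each row is [x_start, y_start, x_end, y_end, kind, polyline_id].
    This layout is an assumption chosen for illustration.
    """
    return [
        [x0, y0, x1, y1, kind, polyline_id]
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


LANE, AGENT = 0, 1  # hypothetical type codes

lane_centerline = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]  # a map element
agent_track = [(0.0, -1.0), (0.5, -0.8)]                # a past trajectory

vectors = (polyline_to_vectors(lane_centerline, 0, LANE)
           + polyline_to_vectors(agent_track, 1, AGENT))

for row in vectors:
    print(row)
```

A scene described this way is a short list of numeric rows rather than a full raster image, which is the efficiency argument made at [38:07].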

Topics

Early Education and Competition · Transition to Artificial Intelligence · Academic Research Strategy · Autonomous Driving Architectures · Waymo vs. Tesla Approaches · Organizational Structure in Tech Companies · Autonomous Driving · Embodied AI · Startup Fundraising · Hardware Supply Chain · Engineering Culture · Data Closed Loop · Robot Hardware Design · Data Collection and Scaling · Sim-to-Real Gap · Developer Market Strategy · Dual-System AI Architecture · Startups vs. Big Tech in Robotics · Robotics Supply Chain · Startup Culture · Pragmatic Innovation · Algorithm vs Hardware Cycles

Takeaways

  • Strategic ‘sprinting’ and goal-oriented planning can accelerate academic and career progress.
  • AI’s core value is replacing manual rule-creation with data-driven pattern extraction.
  • End-to-end AI architectures are replacing traditional modular robotics systems in autonomous driving.
  • A strong, founder-led top-down vision is crucial for companies navigating major technological paradigm shifts.
  • A successful AI organization must be result-oriented and willing to admit and correct mistakes quickly.
  • To achieve true autonomy (Robotaxi or Robotics), companies must deploy mass-production systems to gather real-world data and build a data flywheel.
  • In embodied AI, software alone is insufficient; companies must build and control the hardware (the robot) to effectively close the data loop.
  • Transitioning from a software engineer to a hardware founder requires stepping out of one’s comfort zone and learning supply chain management from the ground up.
  • Designing robot hardware should be dictated by current AI capabilities (‘Intelligence defines the body’).
  • Real-world data is expensive but necessary due to the limitations of simulation.
  • 100,000 hours of real-world interaction data is the estimated threshold for achieving general-purpose robot intelligence.
  • Robotics requires a blend of high-talent-density AI engineering and rigorous, process-driven hardware manufacturing.
  • Startups must find specific, high-tolerance commercial scenarios to survive against resource-rich big tech companies.
  • In embodied AI, algorithm innovation is fast to replicate (2-3 months), while hardware and supply chain advantages take much longer (12-18 months) to build and defend.
  • Pragmatic innovation and calculating ROI are crucial for survival and success in the robotics industry, which is inherently ‘unromantic’ due to its long and complex value chain.
  • A successful AI robotics company must integrate the entire value chain: hardware, supply chain, data, AI infrastructure, and algorithms.