GALAXEA Founder Gao Jiyang: Catfish Effect, Waymo vs Momenta
Duration: 184 min
Guest: Gao Jiyang · AI Researcher, Formerly at Waymo and Momenta, Founder of GALAXEA
Chapters (24)
- 02:10 · Chapter 1: The Sprinter Kid
- Gao discusses his early education, his strategic approach to studying, and how he got into Tsinghua University through the Physics Olympiad.
- 12:00 · Chapter 2: Learning from Zeng Guofan
- Gao talks about his transition to AI, his internship at SenseTime, and drawing leadership inspiration from historical figure Zeng Guofan.
- 25:30 · Chapter 3: Raising the Hit Rate at Top Conferences
- Gao explains his systematic strategy for publishing papers at top AI conferences during his PhD and evaluating industry opportunities.
- 33:40 · Chapter 4: Waymo Has No Founder
- Gao reflects on his time at Waymo, the shift from robotics-based to AI-native autonomous driving architectures, and Waymo’s organizational challenges.
- 50:00 · Working at Waymo and the Decision to Leave
- Gao discusses his time at Waymo, learning engineering systems, and why he decided to leave for a more product-oriented environment.
- 52:20 · Chapter 5: Momenta is the Opposite of Extreme
- Gao explains his choice to join Momenta over Huawei, highlighting Momenta’s result-oriented culture and mass production strategy.
- 62:20 · Chapter 6: The Catfish Effect
- Gao describes his role in driving Momenta’s mass production delivery for SAIC and the intense, fast-paced work culture.
- 69:20 · Chapter 7: Starting with a Terrible BP
- Gao recounts his decision to start an embodied AI company at age 30, the initial struggles with fundraising, and securing angel investments.
- 75:20 · Chapter 8: Struggling with Hardware and Supply Chain
- Gao explains why an embodied AI company must build its own hardware and the challenges of navigating the supply chain from scratch.
- 100:00 · Hardware and Supply Chain Focus
- Discussion on the company’s early focus on hardware, supply chain, and the decision to build a wheeled robot with a torso.
- 103:00 · Targeting the Developer Market
- Explaining the strategy to target the developer market first, categorizing developers into academic, enterprise, and productivity tiers.
- 106:30 · AI vs. Hardware Engineering
- Contrasting the talent-density focus of AI software with the rigorous process-driven requirements of hardware engineering.
- 109:30 · Data Recipe and Intelligence Strategy
- Outlining the shift towards data and intelligence, emphasizing the importance of end-to-end models and real-world data.
- 117:00 · The Cost of Real Robot Data
- Breaking down the financial and time costs of acquiring real robot data versus simulated data.
- 126:00 · Robot Brain Architecture
- Detailing the dual-system architecture using VLA models for fast actions and VLMs for slow reasoning.
- 130:00 · Finding the Right Scenarios
- Identifying ideal commercial scenarios for embodied AI, such as bin picking and flexible assembly.
- 140:00 · Startups vs. Big Tech
- Analyzing the competitive landscape between agile startups and resource-rich big tech companies in the robotics space.
- 150:00 · Company Culture and Pragmatic Innovation
- Discussing the importance of creating real customer value through pragmatic innovation.
- 151:23 · Xu Huazhe’s Departure
- Explaining co-founder Xu Huazhe’s departure to pursue 2C (consumer-facing) applications and the company’s support for him.
- 154:00 · The Embodied AI Value Chain
- Analyzing the transmission cycles of algorithms versus hardware and supply chains.
- 159:00 · Technical Vision
- Envisioning robots that learn like human employees through demonstration and self-practice.
- 164:30 · Funding and Valuation
- Detailing the recent funding round and the company’s 30x valuation growth.
- 170:09 · Learning from Peers
- Sharing respect and learnings from peer companies like Unitree, Physical Intelligence, and Zhiyuan.
- 180:00 · Personal Preferences
- The guest shares his favorite food, movies, music, and books.
Specific Numbers (23)
| Time | Fact | Value | Context |
|---|---|---|---|
| 01:55 | Gao’s birth year | 1992 | The host mentions that he looks older but was actually born in 1992. |
| 06:23 | National Physics Olympiad | November 2010 | Gao participated in the national competition in Xiamen. |
| 12:34 | Started AI internship | Late 2014 / Early 2015 | Gao began his internship at SenseTime and trained his first neural network. |
| 34:02 | Waymo’s inception era | 2008/2009 | Gao notes that Waymo’s autonomous driving efforts date back to the DARPA challenge era. |
| 54:26 | Joined Waymo | January 2019 | Gao joined Waymo early in 2019. |
| 54:29 | Decided to leave Waymo | H2 2020 | Gao felt he had learned enough about the AD system and wanted to be closer to product and business. |
| 59:37 | Momenta’s strategic shift | 2018 | Momenta explicitly decided to pursue mass production autonomous driving to build a data flywheel for Robotaxi. |
| 62:31 | Momenta secured SAIC project | End of 2020 | Momenta won the Zhiji (IM Motors) project from SAIC. |
| 69:34 | Decided to start a company | End of 2022 | Gao turned 30 and decided it was time to pursue his entrepreneurial ambitions. |
| 70:39 | Resigned from Momenta | May 2023 | Gao officially left Momenta to start his own company. |
| 77:10 | Angel round funding | 30 million RMB | The total amount raised in the first angel round from IDG, Baidu Ventures, and GSR Ventures. |
| 78:00 | Angel+ round funding | 10-20 million RMB | An additional funding round raised shortly after the angel round. |
| 100:38 | Second round of financing | Early 2024 | Completed the second round of financing to focus on hardware and supply chain. |
| 102:38 | Form factor decision | March 2024 | Decided on the wheeled + torso form factor for their first robot. |
| 117:56 | Data acquisition to training cost ratio | 1:5 to 1:10 | For every $1 spent on acquiring data, $5 to $10 is spent on training the model. |
| 119:28 | Cost per hour of real data | 200-250 RMB | Estimated cost to collect one hour of real robot teleoperation data. |
| 120:07 | Data scale for general-purpose AI | 100,000 hours | The amount of interaction data equivalent to an 18-year-old human’s life experience. |
| 120:34 | Cost for 100,000 hours of data | 25 million RMB | The estimated financial cost to collect 100,000 hours of real robot data. |
| 154:53 | Hardware and supply chain transmission cycle | 12 to 18 months | The time it takes for hardware innovations to be replicated. |
| 155:59 | Algorithm transmission cycle | 2 to 3 months | The time it takes for algorithm innovations to be replicated due to open source and papers. |
| 160:39 | Number of developer customers | Over 150 | The number of global developer customers using Xinghai Tu’s products. |
| 166:05 | Valuation growth | 30x | The company’s valuation grew 30 times compared to January 2024. |
| 167:06 | Company size | Over 200 employees | The current size of the organization. |
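The data-cost rows above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the table. The variable names are illustrative, and the figure of 15 waking hours per day is an assumption added here to sanity-check the "18-year-old" comparison, not a number from the interview. Applying the 1:5 to 1:10 training ratio to the 100,000-hour figure is likewise an extrapolation for illustration.

```python
# Figures quoted in the interview (see the table above).
COST_PER_HOUR_RMB = (200, 250)   # low / high estimate per hour of real robot data
TARGET_HOURS = 100_000           # data scale cited for general-purpose capability
TRAIN_MULTIPLIER = (5, 10)       # training spend per 1 unit of data-acquisition spend


def collection_cost_rmb(hours: int, cost_per_hour: int) -> int:
    """Total collection cost in RMB for a given number of hours."""
    return hours * cost_per_hour


low_cost = collection_cost_rmb(TARGET_HOURS, COST_PER_HOUR_RMB[0])
high_cost = collection_cost_rmb(TARGET_HOURS, COST_PER_HOUR_RMB[1])

# Extrapolation (not stated in the interview): training spend implied if the
# 1:5 to 1:10 ratio were applied to this collection budget.
implied_training_low = low_cost * TRAIN_MULTIPLIER[0]
implied_training_high = high_cost * TRAIN_MULTIPLIER[1]

# Assumption (not from the interview): about 15 waking hours per day.
WAKING_HOURS_PER_DAY = 15
hours_by_age_18 = 18 * 365 * WAKING_HOURS_PER_DAY

print(f"Collection cost: {low_cost:,} to {high_cost:,} RMB")
print(f"Implied training spend: {implied_training_low:,} to {implied_training_high:,} RMB")
print(f"Waking hours by age 18: {hours_by_age_18:,}")
```

Under these inputs the collection cost falls between 20 and 25 million RMB, so the 25 million RMB figure in the table corresponds to the upper end of the per-hour estimate. The waking-hours estimate lands near 100,000, in line with the comparison to an 18-year-old.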
Research Claims & Predictions (10)
- [16:46] Neural networks can automatically extract rules from data, replacing manual programming.
- evidence: Observed during his early AI experiments where models replaced complex if-else logic.
- [36:08] End-to-end, data-driven AI architectures will replace modular, rule-based robotics architectures in autonomous driving.
- evidence: Based on the performance plateau of traditional systems and the rapid scaling of data-driven models like Tesla’s.
- [38:07] Representing maps and trajectories as vectors is more efficient than rendering them as images for prediction models.
- evidence: Supported by the development and success of the VectorNet model at Waymo.
- [61:15] Data closed loop is essential for autonomous driving.
- evidence: To achieve Robotaxi, you need massive amounts of data, which can only be acquired by deploying mass-production AD systems in consumer cars.
- [74:06] Embodied AI requires building the hardware.
- evidence: To build a data closed loop in the physical world, an AI company must control the hardware (the robot) to gather data and execute actions effectively.
- [101:20] Bipedal locomotion adds unnecessary complexity to manipulation tasks.
- evidence: The loco-manipulation problem (coordinating legs and arms) remains unsolved, making wheeled bases more practical for current AI capabilities.
- [116:00] Real-world data is essential because the sim-to-real gap is still too large.
- evidence: Traditional graphics-based simulation struggles to accurately model complex physical interactions, requiring real data for reliable performance.
- [126:50] Robot brains will utilize a dual-system architecture.
- evidence: Edge computing constraints prevent running massive reasoning models at high frequencies, necessitating a fast VLA model for actions and a slow VLM for reasoning.
- [154:25] Algorithm innovation cannot exist independently.
- evidence: It must be part of a full value chain including hardware, data, and infrastructure.
- [159:35] Robots will learn like human employees.
- evidence: Through a few demonstrations and self-practice, robots will be able to autonomously complete tasks.
Key Concepts (14)
- [16:46] Deep Learning / Neural Networks
- A machine learning method that automatically extracts patterns and rules from large datasets, replacing manual if-else programming.
- [36:08] End-to-end Architecture
- A system design where a single neural network maps raw inputs directly to final outputs, avoiding modular, human-designed intermediate steps.
- [38:07] Vector Representation (VectorNet)
- A method of representing HD maps and agent trajectories as mathematical vectors rather than rendered images, improving computational efficiency and performance.
- [52:58] Engineering Mindset (工程师思维)
- Breaking down complex problems into smaller, measurable sub-problems, writing code, and testing layer by layer.
- [61:15] Data Flywheel (数据飞轮)
- Deploying mass-production systems to gather real-world data, which trains better AI models, which in turn improves the product.
- [74:06] Embodied AI (具身智能)
- AI systems that interact with the physical world through hardware (robots) to create a data closed loop.
- [81:45] Customer-Centric (以客户为中心)
- A core value at Momenta, meaning to deeply understand and solve the customer’s actual problems rather than just following rigid instructions.
- [101:33] Intelligence defines the body (智能定义本体)
- Designing robot hardware based on the current capabilities and limitations of AI algorithms, rather than building hardware first.
- [103:00] Developer Market (开发者市场)
- The initial target market for robots, focusing on researchers, engineers, and integrators before reaching end consumers.
- [116:00] Sim-to-Real Gap
- The discrepancy between simulated environments and the real world, which makes transferring learned skills difficult.
- [124:09] Data Recipe
- The strategic mix and proportion of different types of data (real, simulated, human-centric) used to train embodied AI models.
- [150:14] Pragmatic Innovation (务实创新)
- Focusing on creating real value for customers and calculating ROI, rather than just pursuing romantic or theoretical research.
- [154:25] Value Chain (价值链条)
- The full stack required for embodied AI, including hardware, supply chain, data, AI infra, and algorithms.
- [154:50] Transmission Cycle (传播周期)
- The time it takes for an innovation to be replicated or adopted by others in the industry.
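To make the Vector Representation concept above more concrete, here is a minimal sketch of the idea: a map element or agent track is stored as a short list of vectors rather than drawn into an image. This is a simplified illustration, not Waymo's implementation. The five-number encoding, the function name `polyline_to_vectors`, the sample coordinates, and the raster size are all assumptions chosen for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Vector = Tuple[float, float, float, float, int]  # x_start, y_start, x_end, y_end, polyline_id


def polyline_to_vectors(points: List[Point], polyline_id: int) -> List[Vector]:
    """Turn consecutive points of one polyline into start/end vectors."""
    return [
        (x0, y0, x1, y1, polyline_id)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


# Hypothetical inputs: one lane centerline and one agent's past track.
lane = [(0.0, 0.0), (10.0, 0.0), (20.0, 1.0), (30.0, 3.0)]
agent_track = [(2.0, -1.0), (4.0, -1.0), (6.0, -0.8)]

vectors = polyline_to_vectors(lane, 0) + polyline_to_vectors(agent_track, 1)
vector_values = len(vectors) * 5  # five numbers per vector

# Hypothetical raster of the same scene: 100 m x 100 m at 0.5 m per pixel, 3 channels.
raster_values = 200 * 200 * 3

print(f"{len(vectors)} vectors, {vector_values} numbers")
print(f"raster needs {raster_values} numbers")
```

In this toy scene the vector form needs 25 numbers where the raster needs 120,000, which is the kind of efficiency gap the concept refers to. A real model would attach more attributes to each vector and pass them to a graph network.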
People Mentioned (18)
- Xiao Jun — The host of the interview.
- Gao Jiyang — The guest, an AI researcher and entrepreneur.
- Zeng Guofan — A historical figure Gao studied to learn about leadership, resource mobilization, and achieving goals despite setbacks.
- Tang Xiao’ou — Professor who provided Gao with the internship opportunity at SenseTime.
- Cao Xudong (曹旭东) — Gao’s mentor at SenseTime who guided his early AI work; CEO of Momenta.
- Sun Chen — A senior from Tsinghua who helped Gao secure a research position at USC.
- Zhao Hang (赵行) — Gao’s collaborator at Waymo on the VectorNet paper; co-founder of Xinghai Tu, where he leads the data, intelligence, and foundation model team.
- Elon Musk — Mentioned in relation to Tesla’s top-down, AI-driven autonomous driving strategy.
- Chen Yilun — Huawei executive Gao spoke with.
- Su Qing — Huawei executive Gao spoke with.
- Sun Gang — Executive at Momenta.
- Ren Shaoqing (also referred to as Shao Qing, 少卿) — Former Momenta executive and researcher whose departure prompted organizational changes.
- Tianwei — Gao’s co-founder.
- Yang Zeyi — Gao’s co-founder and hardware/mechanical engineering expert.
- Fei-Fei Li (李飞飞) — Mentioned as an example of a top-tier academic developer in the AI space.
- Wang He (王鹤) — Mentioned in the context of calculating the hourly cost of robot data collection.
- Xu Huazhe (许华哲, also referred to as Hua Zhe) — Co-founder who left Xinghai Tu to pursue 2C applications; his departure is the subject of a chapter.
- Tian Fei (天飞) — Co-founder of Xinghai Tu.
Companies Mentioned (18)
SenseTime · Waymo · Tesla · Google · Momenta · Huawei · Pony.ai · WeRide · SAIC (上汽) · IDG · Baidu Ventures · GSR Ventures (金沙江) · Xinghai Tu (星海图) · Ant Group (蚂蚁) · Apple · Unitree (宇树) · Physical Intelligence (PI) · Zhiyuan (智元)
Notable Quotes (10)
I felt that neural networks could replace humans in discovering rules in data. This is too awesome. I have to do this. — Gao Jiyang @ 17:38
The magic of AI lies in its ability to replace humans in summarizing rules. — Gao Jiyang @ 33:18
Waymo has no founder… the top-down force is missing. — Gao Jiyang @ 42:02
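I think for an organization to succeed, it has to tolerate mistakes, but someone has to say “we were wrong,” and then we change. — Gao Jiyang @ 58:27
To do embodied AI, we have to build the whole machine plus the intelligence; we can’t do the intelligence alone. — Gao Jiyang @ 74:06
I prefer to face reality. Even if the truth is cruel, I still want to face it. — Gao Jiyang @ 76:05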
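Intelligence defines the body. Start from what the intelligence needs and work out how the body should be built. — Gao Jiyang @ 101:33
What does 100,000 hours of data mean? The total time a person spends interacting with the physical world from birth to age 18 is on that order of magnitude. — Gao Jiyang @ 120:00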
Robotics is inherently unromantic… the chain is very long, and the cycles are long too. — Gao Jiyang @ 161:16
Idealism must not turn into fantasy. — Gao Jiyang @ 166:34
Career Arc & Personal Stories (12)
- [02:32] Gao describes himself as a normal student until 6th grade, when he started ‘sprinting’ for exams, eventually winning a national physics competition and entering Tsinghua.
- [12:34] During his junior year, feeling lost about his major, he interned at SenseTime, trained his first neural network, and found his passion for AI.
- [25:30] To graduate quickly and enter the industry, Gao systematically categorized AI research papers into three types to maximize his chances of publishing at top conferences.
- [34:14] At Waymo, he analyzed the legacy codebase, realized the limitations of rendering maps as images, and co-developed VectorNet to solve the problem.
- [50:00] Gao worked at Waymo, where he learned how a massive autonomous driving system is built and developed an engineering mindset.
- [54:20] Feeling disconnected from the product and business side at Waymo, Gao decided to return to China to join a company closer to commercialization.
- [55:40] He joined Momenta, where he thrived in a high-pressure, result-oriented environment, eventually leading the mass production delivery for a major OEM.
- [69:20] Upon turning 30, Gao resigned from Momenta to start his own embodied AI company, navigating the challenges of fundraising with a poor initial business plan.
- [75:20] Realizing that software alone wasn’t enough, Gao and his team had to learn hardware and supply chain management from scratch, eventually bringing on a hardware co-founder.
- [106:48] The guest reflects on transitioning from pure AI software to embodied AI, learning that while AI relies on ‘10x engineers’, hardware requires rigorous, process-driven engineering (EVT, DVT, PVT) to avoid physical failures.
- [153:02] Gao founded the company with Tian Fei and Zhao Hang, and gradually brought in more partners to build a strong team.
- [180:00] Gao shares personal preferences, including a love for movies developed during college, and a taste for sci-fi, suspense, and history books.
Tools & Models Discussed (7)
- VectorNet: A graph neural network model that represents HD maps and agent trajectories as vectors, significantly improving the efficiency and accuracy of trajectory prediction in autonomous driving.
- Transformer / Self-attention: The attention mechanism at the core of the Transformer architecture, used within VectorNet to relate the different vector inputs to one another.
- GPT-1 / GPT-2 / GPT-3 / InstructGPT / ChatGPT: OpenAI’s language models that restored global confidence in AI’s potential, acting as a catalyst for Gao’s decision to start an AI company.
- R1: Xinghai Tu’s first-generation robot product, featuring a wheeled base and a torso for manipulation.
- VLA (Vision-Language-Action) Model: A model that directly translates visual and text inputs into physical robot actions, used for fast, reactive control.
- VLM (Vision-Language Model): A higher-level model used for slow, logical reasoning and task breakdown in the robot’s dual-system architecture.
- Foundation Models for Robotics: Models intended to enable robots to learn tasks through demonstration and self-practice, much as human employees do.
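The dual-system idea described for the VLA and VLM entries above can be sketched as a two-rate control loop: a slow reasoner updates the current subtask occasionally, while a fast policy emits an action on every tick. This is only an illustration under stated assumptions. The rates (30 Hz and 1 Hz), the subtask names, and the functions `slow_reasoner` and `fast_policy` are hypothetical stand-ins for real models, and the loop counts ticks rather than measuring real time.

```python
from dataclasses import dataclass
from typing import List

FAST_HZ = 30  # hypothetical action rate for the VLA stand-in
SLOW_HZ = 1   # hypothetical reasoning rate for the VLM stand-in


@dataclass
class Observation:
    step: int  # simulated tick index; a real system would carry images and state


def slow_reasoner(task: str, obs: Observation) -> str:
    """Stand-in for a VLM: choose the current subtask (the task string is ignored here)."""
    subtasks = ["locate bin", "grasp part", "place part"]
    return subtasks[(obs.step // FAST_HZ) % len(subtasks)]


def fast_policy(subtask: str, obs: Observation) -> str:
    """Stand-in for a VLA: emit one low-level action per tick."""
    return f"{subtask}: action@{obs.step}"


def run(task: str, seconds: int) -> List[str]:
    ticks_per_plan = FAST_HZ // SLOW_HZ
    subtask = ""
    log: List[str] = []
    for step in range(seconds * FAST_HZ):
        obs = Observation(step)
        if step % ticks_per_plan == 0:  # slow system runs once per plan interval
            subtask = slow_reasoner(task, obs)
        log.append(fast_policy(subtask, obs))  # fast system runs every tick
    return log


actions = run("bin picking", seconds=2)
print(actions[0], "|", actions[29], "|", actions[30])
```

The design point is that the expensive reasoning call happens only twice in this two-second run, while the action call happens sixty times, which matches the edge-compute constraint the interview gives as the motivation.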
Topics
Early Education and Competition · Transition to Artificial Intelligence · Academic Research Strategy · Autonomous Driving Architectures · Waymo vs. Tesla Approaches · Organizational Structure in Tech Companies · Autonomous Driving · Embodied AI · Startup Fundraising · Hardware Supply Chain · Engineering Culture · Data Closed Loop · Robot Hardware Design · Data Collection and Scaling · Sim-to-Real Gap · Developer Market Strategy · Dual-System AI Architecture · Startups vs. Big Tech in Robotics · Robotics Supply Chain · Startup Culture · Pragmatic Innovation · Algorithm vs Hardware Cycles
Takeaways
- Strategic ‘sprinting’ and goal-oriented planning can accelerate academic and career progress.
- AI’s core value is replacing manual rule-creation with data-driven pattern extraction.
- End-to-end AI architectures are replacing traditional modular robotics systems in autonomous driving.
- A strong, founder-led top-down vision is crucial for companies navigating major technological paradigm shifts.
- A successful AI organization must be result-oriented and willing to admit and correct mistakes quickly.
- To achieve true autonomy (Robotaxi or Robotics), companies must deploy mass-production systems to gather real-world data and build a data flywheel.
- In embodied AI, software alone is insufficient; companies must build and control the hardware (the robot) to effectively close the data loop.
- Transitioning from a software engineer to a hardware founder requires stepping out of one’s comfort zone and learning supply chain management from the ground up.
- Designing robot hardware should be dictated by current AI capabilities (‘Intelligence defines the body’).
- Real-world data is expensive but necessary due to the limitations of simulation.
- 100,000 hours of real-world interaction data is the estimated threshold for achieving general-purpose robot intelligence.
- Robotics requires a blend of high-talent-density AI engineering and rigorous, process-driven hardware manufacturing.
- Startups must find specific, high-tolerance commercial scenarios to survive against resource-rich big tech companies.
- In embodied AI, algorithm innovation is fast to replicate (2-3 months), while hardware and supply chain advantages take much longer (12-18 months) to build and defend.
- Pragmatic innovation and calculating ROI are crucial for survival and success in the robotics industry, which is inherently ‘unromantic’ due to its long and complex value chain.
- A successful AI robotics company must integrate the entire value chain: hardware, supply chain, data, AI infrastructure, and algorithms.