Episode 140 — 姚顺宇

Host: 晓军 · Duration: 230 min

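A 4-hour interview with Yao Shunyu: ‘Allow me to go a little crazy!’ Training models at Anthropic and Gemini, technical predictions, and why the age of heroism is over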


Chapters (40)

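00:00:00 · Guest introduction and telling the ‘two Yao Shunyus’ apart
- Guest Yao Shunyu describes his physics background and his move into AI, and clarifies how he differs from another AI practitioner with the same name: the main difference is academic background (physics vs. computer science).
00:06:50 · A shift in AI’s stage of development: from ‘can it be done’ to ‘what should be done’
- The guest argues that AI has entered a new stage. People no longer worry about whether AI can do something; they care more about defining problems well and deciding where to apply AI, which calls for more human insight.
00:08:10 · Homogenization and differentiation of model capabilities
- The guest notes that although AI models are converging on public benchmarks, users can still feel differences in practice: different models have their own strengths in specific areas such as tool use, coding, and reasoning.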
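00:12:28 · The OpenCloud phenomenon and the evolution of product form
- The guest sees OpenCloud not as a stunning technical breakthrough but as a natural spillover of model capability. It demonstrates a possibility and pushes the industry to think about how to turn model capability into products.
00:15:35 · How AI startups survive, and the ‘data flywheel’
- The guest discusses why AI startups (such as Manus and OpenCloud) get acquired. Long-term survival requires a moat; today moats sit mainly at the model layer, though they may also emerge at the product layer. He notes that among AI-native applications, nothing besides code generation has yet produced a successful case with a true ‘data flywheel’.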
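00:20:56 · Improving model learning ability and continued progress in pre-training
- The guest believes model progress has not slowed; if anything, models are getting better at learning. He says pre-training scaling laws have not hit a ceiling and expects significant progress in the coming months, which would unlock more applications, such as the personal assistant people have long dreamed of.
00:26:37 · What drives AI progress: data, compute, and algorithms
- The guest analyzes the key factors behind gains in model capability. Within today’s well-defined pre-training and post-training framework, he sees data and compute as the main drivers, and the two are interrelated. The role of algorithms is to break through bottlenecks, turning ‘cannot do’ into ‘can do’.
00:36:23 · The rapid progress of code generation and its advantages
- The guest points out that code generation is one of the fastest-moving areas of AI. Its advantages are a clear feedback signal and a naturally high-quality data foundation (GitHub), which let models learn and iterate efficiently.
00:38:26 · AI’s impact on programming productivity
- The guest describes how AI is used in programming, estimating that 90% or more of code is generated by models, which greatly simplifies product programming. AI also markedly speeds up understanding and working with complex code, raising experiment throughput by 20 to 50 times.
00:41:13 · AI’s impact on working patterns and hours
- Although AI raises productivity, the guest finds his working hours have actually grown: development is faster, so he can try more ideas. AI has also made work more intense, and in AI, Google is no longer a ‘retirement home’.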
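00:42:53 · AI’s impact and challenges in other fields
- AI has begun to affect basic scientific research (such as mathematics and physics), speeding up derivations and experiments. AI is good at tasks with clear logic and objective evaluation criteria, but struggles in areas that lack clear standards, such as product management.
00:46:20 · AI’s impact on the future of programming careers, with advice
- The guest expects AI to replace parts of programmers’ work gradually, with a small number of top programmers capturing more of the value. He advises programmers to embrace new technology, learn to collaborate efficiently with AI, and build technical strength, organizational understanding, and planning ability.
00:49:50 · Views on Gemini and on AI development in China
- The guest sees the Gemini release more as excellent technical execution than as a paradigm shift, though it put pressure on multimodal teams. He notes the US-China AI gap is narrowing, and that China’s compute disadvantage has spurred technical innovations such as distillation.
01:03:16 · Views on robotics and his personal path
- The guest believes robotics AI is still at an early stage and has not yet achieved general-purpose scaling. He shares his move from physics into AI, stressing the importance of daring to try and fighting for opportunities, and says work in a robotics lab is great fun.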
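01:17:44 · Fighting for a place at Tsinghua: a text message that changed his fate
- The guest recalls entering Tsinghua University from high school via independent admissions based on physics olympiad results. Although the policy changed at the last minute, he texted a teacher in the admissions office to make his case, eventually won the right to sit the exam, and was admitted. He sees this as evidence of Tsinghua’s willingness to give students an equal chance.
01:20:05 · Character and family: rebellious, competitive, and parents who ‘governed by non-interference’
- The guest describes himself as strong-minded: once he commits to something he does his best at it. He is competitive, but mainly against himself. He calls his parents’ approach ‘governing by non-interference’ (无为而治): unable to control him, they chose to let go and let him make his own decisions.
01:21:55 · An ‘accidental’ start in undergraduate research: entering condensed matter theory
- The guest explains how, by a series of accidents, he chose condensed matter theory as his undergraduate research area. Following the tradition of Tsinghua’s basic science class (基科班), he joined the Institute for Advanced Study early and began theoretical research under Professor Wang Zhong (王中), and he considers it a field well suited for undergraduates to get started in.
01:25:08 · Non-Hermitian systems research: a paradigm breakthrough
- The guest gives a detailed account of his important undergraduate work on open quantum systems (non-Hermitian systems). He and his collaborators found that the Bloch wave theory conventionally used for Hermitian systems breaks down, and they systematically built a new theoretical approach to describe how such systems behave. The work later proved to be an important advance in the field.
01:28:33 · A weakness of human nature: why he gave up a successful research direction
- After producing important results as an undergraduate, the guest nonetheless changed direction for his PhD. He attributes this to a ‘weakness of human nature’: he always wants to take on things that are unfamiliar and harder. He felt the core work in the original direction was done and that what remained was less exciting.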
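01:38:34 · Lessons from the PhD: high-energy theory and the trap of ‘serving the old guard’
- For his PhD the guest chose high-energy theory, which is extremely difficult and cannot be verified by experiment. Looking back, he says the experience helped him grow but contributed almost nothing to the world. His main lesson: do things that have objective evaluation criteria and real impact, rather than wasting time ‘serving the old guard’ (伺候老登, currying favor with the field’s authorities).
01:44:27 · Turning to AI: choosing between quantum computing and artificial intelligence
- After his PhD, the guest chose AI over quantum computing. He felt quantum computing was bottlenecked by experimental physics at the time, whereas the research paradigm of AI (propose an idea, then test it with numerical experiments) was closer to the theoretical physics he was good at and enjoyed. To him it resembled the golden age of 18th-century physics, when theory and experiment were not yet separate.
01:54:42 · Joining Anthropic: the physics network
- The guest describes how he joined Anthropic, mainly thanks to personal connections. Quite a few members of Anthropic’s founding team have theoretical physics backgrounds, and a former colleague’s referral got him an interview. He sees this carry-over of cross-field connections as characteristic of the early era of AI companies.
01:55:18 · Choosing Anthropic: embracing the uncertainty of reinforcement learning
- The guest recounts interviewing at and choosing between OpenAI and Anthropic. He was drawn to the large-scale reinforcement learning Anthropic was exploring; although he knew little about the field then, he saw it as a good opportunity full of uncertainty. To prepare, he studied Andrej Karpathy’s courses, among other things.
01:57:11 · Early impressions of Anthropic: a top-down small workshop with very strong execution
- The guest recalls that when he joined, Anthropic was still a small company of 700 to 800 people, and the larger Horizon group he belonged to had only about 10. His first impression was of very strong execution, a distinctive top-down culture, and an open internal atmosphere, making it an excellent place to learn.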
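02:00:10 · Origins of the culture: technical leaders are founders, enabling efficient decisions
- The guest digs into where Anthropic’s execution comes from. The key, he argues, is that its technical leaders are also co-founders (such as Jared Kaplan): they hold decision-making power and are accountable for it, which enables an efficient top-down decision process. This is crucial for a startup that needs to ‘make bets’.
02:07:35 · The collectivist era of AI R&D: individual contribution and system-level achievement
- After working on the Claude 3.7 project, the guest concluded that AI R&D has entered a collectivist era and is no longer a stage for individual heroism. He credits his own achievements to joining an important project at the right time, and stresses that AI progress comes from the whole organization and system working together, not from any one person.
02:16:16 · The paradigm bottleneck: the technology has not peaked, and imagination about applications must lead
- The guest revises his earlier view that ‘pre-training has reached its end’, saying that neither pre-training nor post-training is anywhere near a technical plateau. The current bottleneck, he says, is more our imagination about AI applications: we are confined to known scenarios such as chatbots and coding assistants, and do not know what to teach models next.
02:20:22 · Why he left: personal convictions, cultural change, and the wish to learn
- The guest gives three reasons for leaving Anthropic: first, he disagreed with some of the CEO’s politicized statements; second, rapid expansion diluted the culture and he disliked a growing habit of ‘talking big’; third, he wanted to broaden what he was learning and explore areas Anthropic had not entered, such as multimodality and low-level engineering.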
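02:33:44 · Anthropic’s founding commitment to AI safety and its naive path to realizing it
- The guest discusses Anthropic’s origins as a company founded around AI safety. He considers its original idea, gaining a say over safety by building the strongest model, ‘very naive’, and, drawing an analogy with the multi-party balance around nuclear weapons, suggests AI safety may ultimately need a similar decentralized mechanism.
02:36:18 · The essence of AI: experimentability and self-evolution
- The guest offers a personal view: AI is fundamentally simple, and what makes it simple is that ‘you can run experiments’. Unlike physics and other fields constrained by experimental conditions, any idea in AI can be tested by experiment; the current bottleneck is that ‘there are too many ideas that have to be tried one by one’.
02:38:45 · Looking back at Anthropic: misjudging the business model, and the power of individuals in product innovation
- The guest admits that when he left Anthropic he was pessimistic about its business model, seeing a pure API model as a ‘bad business’ that would end in a price war. He concedes he was ‘overly pessimistic’: Anthropic later steadied itself through clever product innovation (such as Claude Code), in which the ‘individual heroism’ of Boris and others played a key role.
02:42:25 · A new journey at DeepMind: ML Coding and Long Horizon
- The guest introduces his two main research directions at Google DeepMind: ‘ML Coding’ aims to close the loop on AI doing its own research; ‘Long Horizon’ explores how to make models ‘train with finite, but use as infinite’, that is, train with a finite context yet handle arbitrarily long tasks.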
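02:47:48 · A reversal in the competitive landscape: how Gemini clawed one back
- The guest analyzes how Google drew users in with ‘Nano-Bard’ marketing and then kept them with the powerful Gemini 1.5 model, winning back market share in ‘one punch’. He believes this combination moved Google from the back foot to the front foot and made it a heavyweight player in the market.
02:58:30 · An endgame still far off: new forms of AI beyond the chat box
- The guest believes that in AI today ‘nobody’s position is secure’ and the endgame is far away. He finds the all-in ‘Super App’ approach in China ‘baffling’, and bluntly calls the current chatbot interaction form ‘stupid’ because it fails to fully release what models can do. He hopes new product forms will emerge.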
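03:12:10 · World Models and Long Horizon
- The guest discusses two approaches to the ‘long horizon’ problem: fine-tuning model weights and context management. He sees both as ultimately serving long-horizon tasks, and questions the vagueness of the term ‘world model’, noting that different researchers define it differently.
03:14:39 · Leadership and Culture in AI Labs
- The guest talks about the decision-makers at Google and DeepMind, saying Sergey Brin has the final say while Koray Kavukcuoglu is more active at the execution level. He believes a good technical leader needs both the ability to solve problems hands-on and the openness to accommodate other people’s ideas.
03:16:01 · The Importance of Systems Thinking and Reliability
- The guest stresses that in the current large-model era, AI is systems engineering. The most important quality is not being smart but being ‘reliable’ (靠谱): careful, responsible, and able to think about the whole picture, rather than ‘hacking’ metrics to make one’s own project numbers look good.
03:22:32 · Hardware and Architectural Choices: TPU vs. GPU
- The guest compares the design philosophies of TPUs and GPUs. GPUs use NVLink for high-speed interconnect within a small cluster, while TPUs use a 3D torus topology that scales to very large sizes. He believes neither is strictly better for large-scale commercial use; what matters is whether the accompanying compiler and software stack can bring out the hardware’s strengths.
03:25:30 · Comparing US and Chinese AI Product Strategies
- The guest analyzes how US and Chinese approaches to AI products differ. The US market is strong in direct, clear efficiency gains and enterprise software, while China excels at building complex consumer products with indirect business models, such as Douyin. He believes US consumer product managers are less capable than their Chinese counterparts, partly because making money in the US market is ‘too easy’.
03:29:51 · Personal Philosophy and Career Advice
- The guest believes the era of ‘individual heroism’ in AI is over and that this is now a time of ‘collectivism’. He advises young people not to crowd into the hottest area, language models, but to explore newer ‘blue oceans’ such as multimodality and robotics. He also says he will not stay at one company for long and will keep looking for challenges that ‘torture’ him.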
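
The 03:22:32 chapter mentions a 3D torus topology. As a rough illustration only (it is not taken from the interview and does not describe any real TPU pod layout), the sketch below shows what ‘torus’ means for connectivity: each node at coordinate (x, y, z) links to six neighbors, and the links wrap around at the edges of the grid. The function name `torus_neighbors` and the 4×4×4 grid size are hypothetical choices made for this example.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six wraparound neighbors of `node` in a 3D torus of size `dims`.

    Hypothetical helper for illustration; not an API of any real system.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # The modulo wraps the coordinate, so edge nodes link to the opposite face.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    # A corner node still has six neighbors because every axis wraps around.
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

Because every axis wraps, a corner node has the same number of links as an interior node, which is the property that lets the same pattern repeat as the grid grows.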

Key quotes (43)

00:02:27 — 姚顺宇:

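I think the most important trait in this industry is being reliable. That means being meticulous and taking responsibility for what you do. That is the most important trait.
- The guest stresses that the most important qualities for AI practitioners are reliability and responsibility, which matter all the more in a fast-changing field.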
00:07:03 — 姚顺宇:

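For me, AI really has entered a stage where people have started to worry less about whether AI can do a given thing, and more about whether that thing is well defined.
- The guest points to a fundamental shift in AI’s stage of development, from technical feasibility to problem definition and choice of application.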
00:07:57 — 姚顺宇:

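I think the harder thing for everyone now is figuring out what to do.
- The guest sums up the core challenge facing AI today: clarifying goals and application scenarios, rather than chasing technical breakthroughs alone.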
00:22:15 — 姚顺宇:

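I think the biggest benefit for it, setting aside how much money was spent, is that it gained a very good product team in Asia.
- The guest analyzes the strategic value of Meta’s acquisition of Manus, arguing that the main aim was to acquire talent and a team rather than the product itself.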
00:25:13 — 姚顺宇:

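I think it is models achieving ‘train with finite context, use as infinite context’. In other words, you train the model with a finite context length, but when using it you can have a very, very long, even nearly infinite, context length.
- The guest describes a key technical breakthrough in how models handle context, one that would greatly expand what models can be used for.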
00:27:51 — 姚顺宇:

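I think models are getting better and better at learning. It used to take a lot of thought to get a model to learn to do something. Now it may not take as much. The most important thing is to define the problem clearly and then work out how to build the right data.
- The guest stresses that models’ learning ability has improved markedly, so the crux of AI development has shifted to problem definition and data construction rather than intricate model tuning.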
00:35:04 — 姚顺宇:

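People are like this: until you have run into the end, you do not really know how long the road is. What I can see is that we have not run into it yet. I do not know what day we will.
- The guest uses a vivid metaphor for how unpredictable the path of AI development is, stressing that the field is still in a phase of rapid exploration.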
00:36:40 — 姚顺宇:

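Coding, really, ever since Claude 3.5 (new), or what some people outside call Claude 3.6, has been developing at high speed the whole time.
- The guest notes that code generation is an AI application that keeps advancing rapidly, and gives a specific starting point.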
00:37:12 — 姚顺宇:

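The advantage of coding is its reward signal: its feedback signal is very well defined.
- The guest explains the distinctive advantage of code generation as an AI application: its clear feedback mechanism helps models learn efficiently.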
00:38:07 — 姚顺宇:

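Coding data has a very natural foundation, and that foundation is GitHub.
- The guest points out that code generation has the built-in advantage of large, high-quality datasets, an important reason for its rapid progress.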
00:45:51 — 姚顺宇 (Shunyu Yao):

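Personally, I think being a product manager is something I currently cannot figure out how to train AI to do.
- The guest identifies an area AI currently struggles to replace: tasks that lack clear standards and objective evaluation.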
00:47:02 — 姚顺宇 (Shunyu Yao):

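AI is a very centralized technology. It will make a small number of people stronger, but it will cause most people to lose their unique value.
- The guest offers a sharp observation about the social stratification and career disruption AI may bring.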
00:47:45 — 姚顺宇 (Shunyu Yao):

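I think for programmers, the most important thing is how to collaborate effectively with AI.
- The guest stresses that in the AI era programmers need to change roles and treat collaboration with AI as a core skill.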
01:01:59 — 姚顺宇 (Shunyu Yao):

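Personally, I think its (Doubao’s) speech generation may be one of the best in the world; to put it politely, possibly the best in the world.
- The guest gives very high praise to the speech generation of Doubao, a Chinese AI model.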
01:10:03 — 姚顺宇 (Shunyu Yao):

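Personally, I think robotics models are still at a pre-GPT-1 moment; they have not reached the GPT-1 moment yet.
- The guest’s assessment of where robotics AI stands: it has not yet reached the breakthrough level that GPT-1 represented for language models.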
01:13:21 — 姚顺宇 (Shunyu Yao):

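As a person, my personality is that I always love doing things I am not very good at.
- The guest shares his tendency to take on the unknown and try new things.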
01:16:08 — 姚顺宇 (Shunyu Yao):

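The most important life lesson I took from that (fighting for a chance at admission) is that you have to be bold.
- The guest sums up how fighting for an opportunity changed his fate, stressing the importance of courage.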
01:18:30 — guest:

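I feel this university is willing to give everyone a chance, to give everyone an equal chance.
- He speaks highly of Tsinghua University’s spirit and makes clear that this is an episode in his life he remains grateful for.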
01:19:06 — guest:

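Isn’t it true that if you were not the best, you were just lame? And obviously I was not the best, so I was just lame.
- He judges his physics olympiad performance by an extreme standard and in a self-mocking tone, which reflects his character.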
01:20:56 — guest:

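When you cannot understand what someone else is doing, the best thing is not to tell them what to do. I think my parents understood that very well.
- A pithy summary of his parents’ approach to raising him, elevated into a piece of general wisdom.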
01:28:43 — guest:

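That is the weakness of human nature: I feel I always love to challenge myself with things I do not know how to do.
- A candid self-analysis of what drives him to keep crossing fields and taking on new areas.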
01:40:57 — guest:

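The big lesson is to do things that have fairly objective evaluation criteria.
- Sums up his central reflection on his PhD in high-energy theory, a lesson that directly shaped his later move into AI.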
01:46:41 — guest:

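Everything in this world is a black box… Science does not really have an understanding that runs all the way from microscopic behavior up to its macroscopic manifestation, either.
- From a physicist’s perspective, he gives a broader and deeper analogy for the ‘black box’ problem: understanding of any complex system is an effective theory at a particular scale, not an ultimate truth.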
01:50:42 — guest:

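Why waste your time serving the old guard (老登)?
- In very blunt and pointed language, he expresses his distaste for academic environments that lack objective standards and depend on the subjective judgment of authorities.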
01:59:48 — guest:

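My impression of the company is that its execution is very strong… It is actually a fairly top-down company.
- A precise summary of what sets Anthropic’s organizational culture apart.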
02:01:26 — guest:

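I think one very strong thing about this company is its execution; its execution is extremely strong. Once you give it a signal that makes it think this is something reasonable that the company should do, it will pounce on it.
- A vivid description of how quickly Anthropic reacts to market and technical opportunities.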
02:03:21 — guest:

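Anthropic has this condition in place: its technical leader, the person in charge, is actually a co-founder of the company.
- Identifies the organizational root of Anthropic’s efficient top-down decision-making.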
02:19:11 — guest:

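For my personal contribution to any model, my account is always that I do not think I was that important to it. It is more that I was lucky to have the chance to join an important project at that time and do some things.
- Expresses the view that in the current large-model era individual contributions are relatively small, and platform and timing matter more.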
02:19:32 — guest:

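It does not depend on whether you personally do it or not; if you do not, someone else will do it just the same.
- Stresses the inevitability and unstoppable momentum of AI progress, playing down individual heroism.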
02:24:32 — guest:

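Idea is cheap. Ideas are cheap: many ideas are actually obvious and everyone knows them. The hard part is how to implement them, how to turn them into small, achievable steps and actually build them.
- A sharp point that in AI, where engineering is complex, execution matters far more than talking about ideas.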
02:29:39 — guest:

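I think fundamentally it is still that this organization did this thing, or that the world needed it.
- Sums up, from a higher vantage point, what drives major AI breakthroughs: a combination of organizational capability and the needs of the times.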
02:34:38 — 姚顺宇 (Shunyu Yao):

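From my personal point of view, I think this idea is very naive… What is more likely is that everyone has very good frontier models, and you have no way of stopping that, or stopping anything, from happening.
- A fundamental challenge to the strategy of leading companies trying to set AI safety rules through technical leadership.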
02:36:41 — 姚顺宇 (Shunyu Yao):

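I think what makes it fundamentally simple is that you can run experiments. The difference from something fundamentally hard, like physics, is that there, without experimental data at a given energy scale, you simply cannot understand the theory at that energy scale.
- A counterintuitive view that AI is fundamentally ‘simple’, its core being unlimited experimentability, which explains why the field iterates so quickly.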
02:39:22 — 姚顺宇 (Shunyu Yao):

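This business is a good business for only one company, and that is Google, because in the end this business comes down to a price war.
- Pinpoints the fragility of a pure API business model and predicts it will turn into a price war that only full-stack giants can survive.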
02:43:33 — 姚顺宇 (Shunyu Yao):

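Train with finite, but use as infinite. I think trying to keep making the training length longer and longer and longer may not be a very realistic approach.
- Captures the core philosophy for the long-context problem: rather than blindly pursuing ever longer training contexts, look for more efficient ways to use context at inference time.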
02:59:41 — 姚顺宇 (Shunyu Yao):

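I think this is stupid: the model clearly has so many capabilities, yet the way it is used is a chatbot. It really does not make sense.
- A pointed criticism of today’s dominant chatbot interaction form, which he says greatly limits what models can do, and a call for fundamental product innovation.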
03:12:42 — 姚顺宇 (Shunyu Yao):

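Ten thousand people have ten thousand world models… First, I do not know what counts as a world model; second, when people talk about the world models they are building, they may be talking about different things.
- Points out that the popular term ‘world model’ is currently vaguely defined and lacks consensus.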
03:19:06 — 姚顺宇 (Shunyu Yao):

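In this era, a researcher who cannot think about the whole picture is not a good researcher. This is very different from doing research in academia.
- Clearly defines the core difference between industry AI researchers and academics: whether they think systemically and globally and feel responsible to the company.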
03:31:02 — 姚顺宇 (Shunyu Yao):

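The most important trait in this industry is being reliable. That means being meticulous and taking responsibility for what you do; that is the most important trait. As for how much brainpower those things take, I think it is all work an undergraduate could do.
- Goes against the mainstream by stressing that in AI, ‘reliability’ and ‘a sense of responsibility’ matter more than ‘being smart’.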
03:32:24 — 姚顺宇 (Shunyu Yao):

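This is a collectivist thing.
- A precise characterization of the current stage of large-model R&D, echoing his earlier remark that ‘the era of individual heroism is over’.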
03:35:00 — 姚顺宇 (Shunyu Yao):

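Working purely on language models is no longer a blue ocean; I think the last train has already left.
- A clear judgment and piece of advice on careers in AI.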
03:41:19 — 姚顺宇 (Shunyu Yao):

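In the short term some people will definitely hate you, but in the long term people will appreciate it.
- Expresses a workplace philosophy about the long-term value of communicating directly and standing by your views.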
03:46:20 — 姚顺宇 (Shunyu Yao):

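Does ‘do not trust the old guard (老登)’ count?
- In a playful but pointed way, expresses a critical attitude toward authority and experience.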

Predictions (10)

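00:25:13 (this year) — 姚顺宇: Models will train with a finite context but be able to use a very long, even nearly infinite, context when in use. There is a chance this will be achieved this year.
00:25:32 (after the technique is achieved) — 姚顺宇: Once this is achieved, it will unlock many new applications, such as the personal assistant everyone dreams of.
00:25:58 (this year) — 姚顺宇: This technical breakthrough in context handling will be achieved this year no matter what.
00:28:54 (next four months) — 姚顺宇: Over the next four months, there is still no sign of the pre-training scaling law reaching its limit.
00:46:32 (future) — 姚顺宇 (Shunyu Yao): The day programmers are fully replaced will come, but it will not come in an instant; it will certainly be a gradual process.
00:47:02 (future) — 姚顺宇 (Shunyu Yao): AI is a very centralized technology. It will make a small number of people stronger, but it will cause most people to lose their unique value.
00:47:13 (future) — 姚顺宇 (Shunyu Yao): The end result may be that one in a thousand of today’s people do the work everyone used to do, earning a hundred times today’s salary.
01:07:13 (future) — 姚顺宇 (Shunyu Yao): My personal feeling is that they (the robotics team) will become very important in the future, but for now they have not found their path.
02:37:50 (next 6 to 12 months) — 姚顺宇 (Shunyu Yao): AI will be able to run experiments on its own, forming a complete closed loop from writing code and running experiments to analyzing results and proposing new hypotheses.
03:25:00 (Unspecified) — 姚顺宇 (Shunyu Yao): The vast majority of newly founded AI labs will fail.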

Visual signals (not visible from the transcript alone)

Recording setup: A casual indoor setting, likely an office or a relaxed studio space, with a prominent potted plant and a wooden wall in the background. · Production: Casual and authentic. It appears to be a single-camera setup focused on the guest, with soft, natural lighting. The style is typical of a modern podcast interview rather than a high-budget studio production.

Props: large potted plant in a white container next to the guest; guest’s white T-shirt with a small black label reading ‘WETIDONE’; guest’s distinctive gold-rimmed glasses.

Energy shifts (10)

📈 01:17:49 — Being asked what percentage of his work at Google uses competitor AI coding tools.
- A sudden burst of genuine, hearty laughter. He leans back, his eyes crinkle, and his body language becomes much more open and amused as he jokes about the question potentially getting him fired.
📈 01:42:43 — Recounting his personal story of choosing a high school to get into the competition class.
- The speaker’s energy lifts noticeably. He becomes more expressive, with a broader smile and more animated facial expressions. He leans into the story, visually signaling that this is a topic he enjoys recounting and that is core to his identity.
📉 01:49:20 — Discussing the gradual replacement of programmers by AI.
- His expression becomes more serious and his smile fades. He adopts a more measured, thoughtful posture, with less movement, reflecting the gravity of the topic.
📈 01:55:04 — Recalling his interview experiences at Gemini and Anthropic.
- The speaker’s smile broadens and he becomes more animated, laughing as he recounts the story. The energy is light and nostalgic.
📈 02:14:00 — Discussing the fundamental differences between startups and large companies in the AI space.
- His speech becomes more animated and he begins to use hand gestures to emphasize his points. He smiles broadly when making the provocative claim that AI work doesn’t require much ‘brainpower’ but rather reliability.
📉 02:23:30 — Discussing his departure from Anthropic and the cultural shifts within the company.
- His smiling subsides, and his expression becomes more neutral and thoughtful. The pace of his speech slows slightly, and he appears more introspective.
📈 02:33:45 — Recalling his proactive effort to get into Tsinghua’s special program.
- His smile widens and he becomes more animated, using a small hand gesture to emphasize the urgency he felt at the time. His energy is high and positive.
📉 02:44:21 — Explaining why he didn’t continue with his paradigm-shifting undergraduate research.
- His smile fades, his gaze becomes more distant and contemplative, and his posture becomes slightly more still as he discusses the ‘weakness of human nature’ and his desire to tackle new challenges.
📈 02:49:00 — Being asked to differentiate himself from the other famous ‘Yao Shunyu’ in the AI field.
- He breaks into a genuine, hearty laugh, shaking his head in amusement before explaining their different backgrounds (Physics vs. Computer Science). His demeanor is light and self-aware.
📉 02:57:17 — Reflecting on his PhD experience being less impactful on the world.
- He looks down, his expression turns more serious and introspective, and he speaks in a more measured tone, conveying a sense of disappointment despite his personal growth.

Emphasis gestures (11)

01:18:35 — “Good code has common standards like being concise and having a clear structure.”
- He makes small, precise chopping motions with his right hand, as if delineating separate, clean concepts. · The gesture visually reinforces the idea of structure, clarity, and the separation of distinct, well-defined components in code.
01:55:45 — “The gap between US and Chinese models is getting smaller.”
- He brings his thumb and index finger close together, leaving a small space between them, and then moves them even closer. · A direct and universally understood visual metaphor for a gap shrinking, making the abstract concept of capability difference tangible.
01:57:59 — “He chose the reinforcement learning team at Anthropic because it was more uncert”
- He smiles widely and nods slightly. · The smile visually communicates his attraction to and excitement for tackling unknown, challenging problems, a key insight into his motivations that goes beyond the words themselves.
01:58:26 — “There are two types of distillation: ‘hard distillation’ and ‘smart distillation”
- He holds up two fingers (index and middle) on his right hand to visually separate the two concepts he is about to explain. · A simple enumerating gesture that primes the listener to expect two distinct categories, adding structure to his explanation.
02:09:00 — “The important thing for a startup is to ‘make a bet’.”
- He makes a small, decisive chopping motion with his right hand. · The gesture visually underscores the idea of making a firm, committed, and singular strategic decision.
02:30:00 — “The most important trait for an AI practitioner is being ‘reliable’ (靠谱).”
- He brings his hands together and interlaces his fingers in front of him, holding a stable posture. · This grounded gesture visually reinforces the concepts of reliability, meticulousness, and taking responsibility for one’s work.
02:33:47 — “The need to seize the opportunity immediately (‘现在就得争取’).”
- Makes a small, decisive chopping motion with his right hand. · The gesture physically underscores the urgency and finality of the decision he made in that moment.
02:43:38 — “The hard part is not the idea, but breaking it down into executable steps and ac”
- He uses a subtle chopping motion with his right hand. · The gesture visually represents the act of ‘breaking down’ a large idea into smaller, manageable pieces, reinforcing his point about the importance of execution over abstract concepts.
02:52:20 — “Describing how a small initial perturbation can lead to an exponentially large d”
- Spreads his hands apart quickly and widely, from a close position to a far one. · A clear visual metaphor for exponential growth, making the abstract concept of the butterfly effect more tangible.
02:58:09 — “Comparing achieving external standards to training a model (‘就像训练模型一样’).”
- Makes a circular, repetitive motion with his right index finger. · This gesture illustrates the iterative, mechanical, and somewhat predictable process of optimizing for a known evaluation metric.
00:31:38 — “The importance of being systematic (‘做事系统’) when debugging or analyzing unexpect”
- He uses his right hand to draw a structured, box-like shape in the air while speaking. · The gesture creates a visual metaphor for a system, a framework, or a structured process for problem-solving.

Authenticity tells (10)

01:18:13 — A full-bodied, uninhibited laugh.: Reacting to the host’s question about using competitor AI tools at Google, which he jokes could get him fired.
- The laugh appears completely genuine and spontaneous, not forced. It shows he is comfortable with the interviewer, finds the situation genuinely funny, and is not actually worried, using humor to deflect a tricky question.
01:47:00 — A broad, nostalgic smile and increased animation.: When telling the personal story of how he chose his high school based on an ‘underdog’ strategy to get into the competition class.
- His visible enjoyment in telling this story suggests it’s a formative memory he is proud of. The shift from a professional, analytical demeanor to a more personal, storytelling mode feels authentic and reveals a key part of his personality and motivation.
01:51:40 — Slight hesitation and more deliberate, measured speech.: When asked to name which Chinese companies are engaging in ‘hard’ vs. ‘smart’ distillation of other models.
- He visually and audibly slows down, choosing his words carefully. This signals he is navigating a sensitive topic and is consciously avoiding making direct accusations while still conveying his opinion, which he eventually does after being prompted.
02:01:28 — Smiling while declining to answer.: When asked for specific technical details about Anthropic’s models, which are under NDA, he smiles, shakes his head slightly, and says ‘不能说’ (can’t say).
- This happens multiple times (e.g., 00:13:54). The friendly, non-confrontational refusal signals that he is bound by confidentiality but not being evasive or difficult. It reinforces his position as an insider with valuable knowledge while respecting his legal obligations, which adds to his credibility.
02:24:14 — A brief, thoughtful pause before answering.: Before explaining his view on the AI field being an ‘unstoppable’ force, he pauses for a moment, looking slightly down and to the side.
- This pause indicates he is not giving a rehearsed soundbite but is genuinely formulating a complex thought, lending weight and sincerity to the philosophical point he is about to make.
02:35:43 — He laughs and calls his own impressive competition record ‘挺菜的’ (quite lame).: When asked about his performance in academic competitions, which was actually at a very high level (provincial team).
- This is a form of humblebragging (‘凡尔赛’). The self-deprecating humor, combined with his relaxed smile, shows he is aware of his achievements but chooses to downplay them, a common cultural trait among high-achievers that feels authentic and relatable.
02:45:36 — He laughs and says his personality is ‘爱折磨自己’ (likes to torture himself).: When asked about his tendency to repeatedly switch to more difficult, unfamiliar fields.
- This candid and humorous self-assessment reveals a high degree of self-awareness. He is able to laugh at his own intense drive, making his ambitious nature seem more human and less intimidating.
03:08:35 — He immediately shakes his head, laughs, and says ‘我就是没搞明白啊’ (I just couldn’t figure it out).: When asked to explain the intricacies of building an optical experiment setup.
- This is a moment of genuine intellectual honesty. By openly and cheerfully admitting his limitations in experimental physics, he reinforces his credibility as an expert in theoretical domains. It shows he is not afraid to be vulnerable about what he doesn’t know.
00:22:02 — A long, thoughtful pause, looking up and away from the host before answering.: When asked to explain the strategic rationale behind recent major acquisitions like Meta/Manus, he honestly replies, ‘我不理解’ (I don’t understand).
- The pause and his candid admission of not fully understanding the situation, despite being an expert, project a high degree of intellectual honesty and authenticity. It shows he is thinking through the question rather than giving a prepared answer.
00:26:44 — A quick, subtle shake of the head and a slight smile.: In response to the host’s suggestion that the pace of model improvement is slowing down.
- His immediate, non-verbal disagreement (‘我觉得完全没有’ - I think not at all) precedes his verbal explanation, revealing a genuine, uncoached conviction that progress is not slowing down. It feels like a gut reaction from someone on the front lines.

Facts and texture lost in transcription

The guest’s consistently calm, friendly, and confident demeanor. He smiles frequently, which makes complex and high-stakes topics feel more accessible and less intimidating.
The casual, intimate nature of the interview. The guest is clearly speaking to a host who is physically present just off-camera, creating a natural conversational dynamic that a transcript would miss.
The contrast between the relaxed, almost home-like setting (with the large plant) and the discussion of the hyper-competitive, fast-paced global AI industry.
The overall tone of the interview is one of intellectual curiosity and candid sharing, rather than a formal or confrontational exchange. The guest’s body language is consistently open and relaxed.
The speaker’s consistently cheerful and relaxed demeanor, frequently smiling even when discussing complex or serious topics, which creates a very approachable and engaging tone.
The stark visual contrast between his broad, easy laugh when joking (e.g., about getting fired) and his serious, focused expression when discussing the societal impact of AI.
The subtle but clear shift in body language and expression when he moves from technical analysis to personal anecdotes, becoming more animated and revealing more of his personality.
The warm, professional-but-not-corporate visual setting, which contributes to the interview’s intimate and conversational feel.
The guest’s pervasive and easy-going smile, which conveys a sense of comfort, confidence, and genuine enjoyment throughout the entire conversation, a quality not fully captured by the text alone.
The subtle, constant nodding and affirmative facial expressions he makes while the host is speaking, indicating active and engaged listening.
The contrast in his demeanor when discussing personal history (relaxed, broadly smiling) versus explaining scientific concepts (more focused, leaning in slightly, using precise hand gestures).
The way he often looks slightly down and to the side with a small smile before answering a complex or personal question, a visual cue that he is taking a moment to formulate a thoughtful and precise response.
The consistent visual theme of the guest’s relaxed, smiling demeanor, which creates a stark contrast with the highly technical and competitive nature of the topics being discussed (e.g., scaling laws, corporate strategy).
The subtle but noticeable shift in his body language from open and amused when talking about his past, to more guarded and serious when discussing his reasons for leaving Anthropic.
The visual branding of the podcast, including the ‘ZHANG XIAOJUN’ and ‘PODCAST #140’ text overlay, which frames the conversation.
The way he good-naturedly ‘stonewalls’ questions about trade secrets, using a smile and a simple ‘can’t say’ to navigate NDAs without creating tension.

Entities mentioned

People (20): Andrej Karpathy, Ben Mann, Boris, Dario Amodei, Demis Hassabis, F. Duncan Haldane, Fei-Fei Li, Geoffrey Hinton, Ilya Sutskever, Jared Kaplan, Koray Kavukcuoglu, Sam McCandlish, Sergey Brin, Tom Brown, Wolfgang Pauli, 吴泳辉, 姚顺宇, 张首晟, 杨振宁, 王中

Companies/institutions (25): Anthropic, Apple, ByteDance, Cursor, DeepMind, DeepSeek, Dexterity, Google, Google DeepMind, Isomorphic Labs, Meta, Midjourney, Manus, OpenAI, OpenCloud, Sakana AI, SpaceX, Tencent, Windsurf, Zhipu AI, xAI, Berkeley (伯克利), Stanford University (斯坦福大学), Gezhi High School (格致中学), Tsinghua University (清华大学)

Papers/methods/datasets (44): 3D Torus, AlphaFold, AlphaTensor, Claude Code, Cloud Code, Distillation, GPT-1, Gemini, Gemini 1.5, Long Horizon, ML Coding, Multi-agent training, NVLink, Nano-Bard, Policy Gradient, Post-training, Pre-training, RL, Reinforcement Learning (RL), SFT, Scaling Law, Scaling Laws, Sparse Attention, Transformer, VLA, World Models, Condensed matter theory (凝聚态理论), Bloch wave (布洛赫波), String Theory (弦论), Reinforcement learning (强化学习), Topological Phenomena (拓扑现象), Emergent Abilities (智能涌现), Physics olympiad (物理竞赛), Supervised learning (监督学习), Independent admissions (自主招生), Schrödinger’s cat (薛定谔的猫), Butterfly Effect (蝴蝶效应), Renormalization Group (重整化群), Quantum physics (量子物理), Quantum Entanglement (量子纠缠), Quantum Computing (量子计算), Non-Hermitian System (非厄米系统), Pre-training (预训练), High-Energy Theory (高能理论)

Takeaways

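AI development has shifted from technical feasibility to how problems are defined and applied, making human insight more critical.
Models are converging on benchmarks, but real user experience and specific capabilities (such as tool use, coding, and reasoning) still differ, and that is where labs compete.
Context handling is headed for a major breakthrough: ‘finite training, infinite use’ of context looks achievable and would underpin new applications such as personal assistants.
AI startups face severe challenges: to survive they need a strong model moat or a small enough niche; otherwise they are easily absorbed by the giants.
Models’ learning ability has improved markedly, pre-training scaling laws have not peaked, and rapid progress should continue over the coming months.
Code generation is one of the fastest-moving areas in AI, thanks to a clear feedback signal and high-quality data sources such as GitHub.
Within the current framework, the core drivers of AI progress are compute and data; algorithms play the key role when a bottleneck has to be broken.
AI greatly raises programming productivity, especially for experiments and idea validation, where gains can reach 20 to 50 times.
Widespread AI use has increased work intensity and hours, because higher efficiency fuels the urge to try more ideas.
AI currently excels at logical, objective tasks, but still struggles to replace humans in areas without clear standards and objective evaluation, such as product management.
AI’s effect on programming careers is gradual: a few top programmers will command more resources and value, while most will need to learn to collaborate efficiently with AI.
China’s disadvantage in compute has, paradoxically, spurred technical innovations such as distillation.
Robotics AI is still early and has not found a general capability that scales horizontally the way LLMs do, but its future potential is large.
In his personal growth, daring to take on the unfamiliar and boldly fighting for opportunities have been important drivers.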
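Actively fighting for opportunities is crucial: even when the rules look unfavorable, taking action can create a turning point.
Constantly challenging the unknown and the difficult is a powerful driver of growth, but it can mean giving up existing achievements; it is a ‘weakness of human nature’ and also a choice.
When choosing a career direction, favor fields with objective evaluation criteria and real-world impact, and avoid getting trapped in the purely subjective and abstract.
The logic behind moving from physics to AI lies in the similarity of research paradigms, combining theoretical ideas with numerical experiments, rather than in the transfer of specific knowledge or skills.
Early in a career, especially in an emerging field, personal networks and referrals within a community play a key role that should not be underestimated.
Anthropic’s success owes a great deal to its distinctive top-down culture, which is underpinned by a structure in which technical leaders are also co-founders with final decision-making power.
That structure lets Anthropic, as a startup, make high-risk technical bets (‘make bets’) and execute them quickly, which is its core advantage over large companies like Google or the later-stage OpenAI.
Large-model R&D has left the era of ‘individual heroism’: today’s breakthroughs are mostly products of large-scale systems engineering and collective effort, where individuals play a relatively limited role and platform and timing are crucial.
The current bottleneck in AI may not be the technology itself (neither pre-training nor post-training has peaked) but our lack of imagination about new applications, which keeps the value of technical progress from being fully realized.
Researchers choosing a platform should be clear about their goals: for direct, visible impact on the final product, a startup is better; for broader learning opportunities and research freedom, a large company has the edge.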
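The ultimate solution to AI safety may not be led by a single giant but may be a multi-party balance akin to nuclear deterrence.
The core driver of AI progress is its ‘experimentability’: any idea can be tested quickly, so the bottleneck for innovation is not a lack of ideas but the speed of testing them.
A business model that relies purely on selling API access is fragile and prone to price wars; only companies with full-stack capability (from chips to applications) hold an advantage in it.
Today’s dominant chatbot interface is far from AI’s final form and greatly limits what models can do; a paradigm shift in products and interaction is a major opportunity.
The competitive landscape in AI is far from settled: rapid iteration in technology and products means any company’s lead can be challenged quickly, and no moat is absolutely secure.
In the current large-model era, AI R&D is systems engineering, and ‘reliability’ and ‘systems thinking’ matter more than individual talent.
The era of ‘individual heroism’ in AI is over; this is now a ‘collectivist’ stage that requires close collaboration.
Good technical leaders need both the ability to solve hard problems hands-on and the openness to accommodate differing opinions.
US and Chinese AI ecosystems differ markedly in product thinking: the US is strong in directly monetized enterprise software, while China excels at building complex consumer ecosystems and indirect business models.
For newcomers, the window of opportunity in pure language models is closing, while multimodality, robotics, and AI for Science may be the new ‘blue oceans’.
In AI, a field with objective standards, keeping a critical mindset and daring to state self-consistent views directly is worth more in the long run than always ‘toning it down’.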