Towards intelligent robots

Event: Academic Talk

Abstract

This talk, ‘Towards intelligent robots,’ explores the foundational challenges in robotics through the lens of biological evolution and child development. Speaker Jitendra Malik categorizes the core problems as locomotion, navigation, and manipulation, showcasing recent advancements in each. He highlights the critical role of multi-modal sensing, robustness, and adaptation in achieving intelligent robot behavior. A central theme is the importance of mastering low-level ‘atomic skills’ as the building blocks for complex tasks, suggesting that these embodied capabilities are essential for future robot dexterity, akin to how human children learn.

Speakers

Jitendra Malik — UC Berkeley/Meta

Segments (19)

00:11 — Jitendra Malik: Phylogeny of Intelligence
- Discusses the evolutionary origins of intelligence, highlighting its emergence with movement, perception for navigation, manipulation, and later, language.
02:27 — Jitendra Malik: Central problems of robotics
- Introduces locomotion, navigation, and manipulation as the three fundamental challenges in robotics, emphasizing the connection between perception and action.
02:57 — Jitendra Malik: Humanoid Locomotion as Next Token Prediction
- Presents a NeurIPS paper demonstrating robust humanoid robot locomotion on diverse and challenging terrains, suggesting significant progress in this area.
03:51 — Jitendra Malik: GOAT: GO to Any Thing
- Introduces a navigation system that allows a robot to autonomously navigate unknown indoor environments to find specified objects or locations, building a semantic map on the fly.
07:22 — Jitendra Malik: Learning Visuotactile Bimanual Dexterous Skills
- Showcases a system enabling multi-fingered robot hands to learn and execute complex dexterous manipulation tasks like slippery handovers, stacking, wine pouring, and steak serving.
09:13 — Jitendra Malik: So how close to ‘solved’ are we?
- Provides a status update on robotics problems: locomotion (good progress), navigation (nearly done), and manipulation (long way to go), highlighting data and efficiency as major challenges.
10:48 — Jitendra Malik: Towards an Embodied AI, inspired by child development
- Advocates for an embodied AI approach inspired by child development, quoting Alan Turing and outlining six lessons from embodied cognition for AI design.
12:47 — Jitendra Malik: Pattern Recognition vs Control Theory
- Compares the focus of visual recognition (generalization over aspects) with motor control (robustness to disturbances and adaptation to varying physical conditions) in robotics.
14:19 — Jitendra Malik: Rapid Motor Adaptation for Legged Robots
- Demonstrates a method for quadruped robots to rapidly adapt their motor policy to unknown extrinsic conditions and terrains by estimating a latent encoding of those conditions online from the recent history of states and actions.
17:41 — Jitendra Malik: Legged Locomotion in Challenging Terrains using Egocentric Vision
- Illustrates how egocentric vision enables precise locomotion for legged robots on complex and uneven terrains without relying on explicit mapping.
20:49 — Jitendra Malik: RotateIt: General In-Hand Object Rotation with Vision and Touch
- Presents a multi-modal approach combining vision and tactile sensing for robust and generalized in-hand object rotation, showing improved performance with integrated sensing.
27:06 — Jitendra Malik: Twisting Lids Off with Two Hands
- Demonstrates bimanual manipulation for twisting lids off bottles, highlighting generalization across different objects using deep reinforcement learning and sim-to-real transfer.
27:59 — Jitendra Malik: Atomic Visual Actions (Gu et al., 2017)
- Introduces the concept of ‘atomic actions’ from computer vision, categorizing fundamental human actions that can be composed for understanding complex activities.
29:58 — Jitendra Malik: Common Atomic Manual Actions
- Applies the concept of atomic actions to robot manipulation, listing fundamental manual actions with and without tools that robots should be able to perform.
31:49 — Jitendra Malik: Atomic Skills
- Discusses various training methods for atomic skills, including reinforcement learning, supervised learning with tele-operation, and visual imitation.
32:54 — Jitendra Malik: Humanoid Control as Next Token Prediction
- Presents a transformer-based approach for humanoid locomotion, trained on diverse data like motion capture and internet videos, enabling robust walking in complex environments.
35:17 — Jitendra Malik: VideoMimic
- Showcases visual imitation learning for contextual humanoid control, where robots learn complex behaviors by mimicking human actions from video demonstrations in various environments.
36:50 — Jitendra Malik: Imitation Learning for Robot Manipulation
- Explores reconstructing 3D hand-object interactions from video to facilitate imitation learning for robot manipulation, aiming to replace laborious tele-operation.
38:55 — Jitendra Malik: What next?
- Concludes by emphasizing the need to focus on low-level, embodied skills for robot dexterity, which can then be composed for complex tasks, drawing parallels to human learning.
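Two segments (02:57 and 32:54) frame humanoid control as next token prediction, with a transformer trained on mixed sources such as motion capture and internet videos. The sketch below shows only the data side of that framing, under assumptions the summary does not state: a trajectory is flattened into interleaved observation and action tokens, every prefix is trained to predict the token that follows, and sources without action labels contribute observations only, with each missing action replaced by a placeholder that is never used as a prediction target. The string tokens, MASK, and both function names are hypothetical; a real system would replace the strings with learned token embeddings and train a causal transformer on the resulting pairs.

```python
from typing import List, Optional, Sequence, Tuple

MASK = "<mask>"  # hypothetical placeholder for a missing modality (e.g. no action labels in video)


def interleave(observations: Sequence[str], actions: Sequence[Optional[str]]) -> List[str]:
    """Flatten one trajectory into o_1, a_1, o_2, a_2, ...; a missing action becomes MASK."""
    if len(observations) != len(actions):
        raise ValueError("expected exactly one action slot per observation")
    tokens: List[str] = []
    for obs, act in zip(observations, actions):
        tokens.append(obs)
        tokens.append(MASK if act is None else act)
    return tokens


def next_token_examples(tokens: Sequence[str]) -> List[Tuple[Tuple[str, ...], str]]:
    """Causal (prefix, next-token) training pairs; MASK positions are never used as targets."""
    examples = []
    for i in range(1, len(tokens)):
        if tokens[i] != MASK:
            examples.append((tuple(tokens[:i]), tokens[i]))
    return examples


# A robot rollout has both modalities; a trajectory recovered from video lacks actions.
robot = interleave(["o1", "o2"], ["a1", "a2"])
video = interleave(["v1", "v2"], [None, None])
print(len(next_token_examples(robot)))  # 3 targets: a1, o2, a2
print(next_token_examples(video))       # only v2 is a target; its prefix still contains MASK
```

Keeping the placeholder inside the prefix, rather than dropping it, preserves the fixed observation/action alternation, so action-free data can share one sequence layout with fully labeled robot data.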
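The GOAT segment (03:51) describes a robot that builds a semantic map on the fly and later navigates to a requested object. The summary does not describe GOAT's actual memory or planner, so the following is only a generic sketch of that explore-remember-navigate loop under simplifying assumptions: the world is a 2-D grid, detections arrive already reduced to (cell, label) pairs, object cells count as reachable goal cells, and planning is plain breadth-first search. SemanticMap, observe, and plan are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


class SemanticMap:
    """Hypothetical 2-D grid memory: traversable cells plus where each object label was seen."""

    def __init__(self) -> None:
        self.free: Set[Cell] = set()
        self.objects: Dict[str, List[Cell]] = {}

    def observe(self, cell: Cell, label: Optional[str] = None) -> None:
        """Called while exploring: mark the cell traversable and remember any detected object."""
        self.free.add(cell)
        if label is not None:
            self.objects.setdefault(label, []).append(cell)

    def plan(self, start: Cell, goal_label: str) -> Optional[List[Cell]]:
        """BFS to the remembered goal_label instance with the fewest grid steps (None if unseen or unreachable)."""
        goals = set(self.objects.get(goal_label, []))
        if not goals:
            return None  # never observed: a real agent would keep exploring instead
        parent: Dict[Cell, Optional[Cell]] = {start: None}
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            if cell in goals:
                path: List[Cell] = []
                node: Optional[Cell] = cell
                while node is not None:  # walk parent links back to the start
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            x, y = cell
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt in self.free and nxt not in parent:
                    parent[nxt] = cell
                    queue.append(nxt)
        return None


# Explore a short corridor, spotting a chair at its end, then plan back to it on request.
m = SemanticMap()
for x in range(4):
    m.observe((x, 0))
m.observe((3, 1), label="chair")
print(m.plan((0, 0), "chair"))  # [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1)]
print(m.plan((0, 0), "sink"))   # None
```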
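The Rapid Motor Adaptation segment (14:19) describes a policy that copes with unknown conditions by estimating them online from recent history. In the RMA paper, that estimate comes from a learned adaptation module that regresses a latent ‘extrinsics’ vector from the recent state-action history and passes it to a base policy trained in simulation. The toy below keeps only the structure of that loop and swaps in deliberately simple, hypothetical pieces: a 1-D velocity-tracking system whose single hidden extrinsic is an action gain k, a closed-form least-squares estimator over a sliding window in place of the learned module, and a hand-written controller in place of the learned base policy.

```python
from collections import deque

DT = 0.05  # control period in seconds (assumed)


def step(v: float, a: float, k: float) -> float:
    """Toy 1-D dynamics: the hidden 'extrinsic' k scales how much an action changes velocity."""
    return v + k * a * DT


class OnlineEstimator:
    """Least-squares estimate of k from the last `window` (v, a, v_next) transitions."""

    def __init__(self, window: int = 20, prior: float = 1.0) -> None:
        self.history = deque(maxlen=window)  # old transitions fall out automatically
        self.prior = prior

    def update(self, v: float, a: float, v_next: float) -> None:
        self.history.append((v, a, v_next))

    def estimate(self) -> float:
        # Model: v_next - v = k * u with u = a * DT, so k = sum(dv * u) / sum(u * u).
        num = sum((vn - v) * (a * DT) for v, a, vn in self.history)
        den = sum((a * DT) ** 2 for _, a, _ in self.history)
        return num / den if den > 1e-12 else self.prior  # fall back when history is uninformative


def policy(v: float, v_target: float, k_hat: float, gain: float = 0.5) -> float:
    """Controller conditioned on the estimate: closes `gain` of the velocity error per step if k == k_hat."""
    return gain * (v_target - v) / (k_hat * DT)


def run(steps: int = 200, switch_at: int = 110, k_before: float = 1.0,
        k_after: float = 0.4, period: int = 25) -> list:
    est = OnlineEstimator(window=20, prior=k_before)
    v, log = 0.0, []
    for t in range(steps):
        k_true = k_before if t < switch_at else k_after  # e.g. a payload or terrain change mid-run
        v_target = 1.0 if (t // period) % 2 == 0 else -1.0  # square wave keeps actions informative
        k_hat = est.estimate()
        a = policy(v, v_target, k_hat)
        v_next = step(v, a, k_true)
        est.update(v, a, v_next)
        log.append((t, k_true, k_hat, v_next))
        v = v_next
    return log


log = run()
print(round(log[109][2], 3), round(log[-1][2], 3))  # prints 1.0 0.4: estimate before and long after the change
```

After k drops at step 110, the estimate stays near 1.0 until the next target change produces a large, informative action, then moves close to 0.4 almost immediately. The square-wave target exists for that reason: when actions are tiny, the history carries little information about k, which mirrors why adaptation depends on the robot actually exciting the dynamics it is trying to identify.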

Key Takeaways

Intelligence in robots, as in biological organisms, rests on mastering low-level sensorimotor skills for locomotion, navigation, and manipulation before higher-level cognitive abilities such as language.
While locomotion and navigation have seen significant progress, dexterous manipulation remains a major challenge, requiring more efficient learning techniques and vast amounts of data.
Multi-modal sensing (vision, touch, proprioception) is crucial for robust and adaptive robot control, especially in complex manipulation tasks, as different modalities provide complementary information.
The development of ‘atomic skills’ – fundamental, composable sensorimotor control loops – is key to building general-purpose robots, drawing inspiration from how children learn basic interactions with the world.
Advances in 3D reconstruction from video and visual imitation learning offer promising avenues to collect large-scale, diverse datasets for training robot skills, potentially overcoming the data bottleneck in robotics.
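The takeaways describe atomic skills as fundamental, composable sensorimotor control loops. The summary does not specify an interface, so the sketch below assumes a minimal one for illustration: each skill repeatedly senses and acts until its own termination condition holds (or a step budget runs out), and a task is a sequence of skills that stops at the first failure. World, AtomicSkill, run_task, and the scripted reach, grasp, and lift rules are hypothetical; in the talk, such skills would be trained by reinforcement learning, supervised learning from tele-operation, or visual imitation rather than hand-written.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class World:
    """Toy state standing in for what the robot senses (hypothetical)."""
    hand_x: float = 0.0
    hand_z: float = 0.0
    object_x: float = 1.0
    object_held: bool = False


@dataclass
class AtomicSkill:
    """A closed sensorimotor loop: act on the sensed state until `done` holds or steps run out."""
    name: str
    act: Callable[[World], None]
    done: Callable[[World], bool]
    max_steps: int = 50

    def run(self, w: World) -> bool:
        for _ in range(self.max_steps):
            if self.done(w):
                return True
            self.act(w)
        return self.done(w)


def reach_act(w: World) -> None:
    # Move the hand toward the object, at most 0.1 per step.
    w.hand_x += max(-0.1, min(0.1, w.object_x - w.hand_x))


def grasp_act(w: World) -> None:
    # Closing the gripper only secures the object if the hand is already at it.
    w.object_held = abs(w.hand_x - w.object_x) < 1e-6


def lift_act(w: World) -> None:
    w.hand_z += 0.1


reach = AtomicSkill("reach", reach_act, lambda w: abs(w.hand_x - w.object_x) < 1e-6)
grasp = AtomicSkill("grasp", grasp_act, lambda w: w.object_held)
lift = AtomicSkill("lift", lift_act, lambda w: w.hand_z >= 0.5 - 1e-9)


def run_task(skills: List[AtomicSkill], w: World) -> List[str]:
    """Compose skills by running them in order; stop at the first failure."""
    completed = []
    for skill in skills:
        if not skill.run(w):
            break
        completed.append(skill.name)
    return completed


print(run_task([reach, grasp, lift], World()))  # ['reach', 'grasp', 'lift']
print(run_task([grasp, lift], World()))         # [] because grasp fails without a prior reach
```

Ordering matters here: each skill's termination condition is the next skill's precondition, which is what lets independently built loops be chained into a longer task.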

Methods / Models / Datasets Mentioned

Humanoid Locomotion as Next Token Prediction
GOAT (Go to Any Thing)
Learning Visuotactile Bimanual Dexterous Skills
Rapid Motor Adaptation for Legged Robots
Legged Locomotion in Challenging Terrains using Egocentric Vision
RotateIt: General In-Hand Object Rotation with Vision and Touch
Visuotactile Policy Training
Digit 360
Twisting Lids Off with Two Hands
Atomic Visual Actions (Gu et al., 2017)
Common Atomic Manual Actions
VideoMimic
Reconstructing Hand-Object Interactions in the Wild (ICCV 2021)
Reconstructing Hand-Held Objects in 3D (arXiv 2404.06507)
HaMeR (Berkeley)
MCC & DINO (Meta)
Reinforcement Learning
Supervised Learning
Tele-operation
Visual Imitation
Transformer
RNN
MLP

Topics

Robotics · Intelligent Robots · Locomotion · Navigation · Manipulation · Embodied AI · Child Development · Multi-modal Sensing · Dexterous Manipulation · Reinforcement Learning · Imitation Learning · Sim-to-real · Atomic Skills · Humanoid Robots · Quadruped Robots · Tactile Sensing · Vision · Control Theory · Data Efficiency
