Towards intelligent robots

Event: Academic Talk

Abstract

This talk, ‘Towards intelligent robots,’ explores the foundational challenges in robotics through the lens of biological evolution and child development. Speaker Jitendra Malik categorizes the core problems as locomotion, navigation, and manipulation, showcasing recent advancements in each. He highlights the critical role of multi-modal sensing, robustness, and adaptation in achieving intelligent robot behavior. A central theme is the importance of mastering low-level ‘atomic skills’ as the building blocks for complex tasks, suggesting that these embodied capabilities are essential for future robot dexterity, akin to how human children learn.

Speakers

  • Jitendra Malik — UC Berkeley/Meta

Talks (19)

  • 00:11 Jitendra Malik: Phylogeny of Intelligence
    • Discusses the evolutionary origins of intelligence, highlighting its emergence with movement, perception for navigation, manipulation, and later, language.
  • 02:27 Jitendra Malik: Central problems of robotics
    • Introduces locomotion, navigation, and manipulation as the three fundamental challenges in robotics, emphasizing the connection between perception and action.
  • 02:57 Jitendra Malik: Humanoid Locomotion as Next Token Prediction
    • Presents a NeurIPS paper demonstrating robust humanoid robot locomotion on diverse and challenging terrains, suggesting significant progress in this area.
  • 03:51 Jitendra Malik: GOAT: GO to Any Thing
    • Introduces a navigation system that allows a robot to autonomously navigate unknown indoor environments to find specified objects or locations, building a semantic map on the fly.
  • 07:22 Jitendra Malik: Learning Visuotactile Bimanual Dexterous Skills
    • Showcases a system enabling multi-fingered robot hands to learn and execute complex dexterous manipulation tasks like slippery handovers, stacking, wine pouring, and steak serving.
  • 09:13 Jitendra Malik: So how close to ‘solved’ are we?
    • Provides a status update on robotics problems: locomotion (good progress), navigation (nearly done), and manipulation (long way to go), highlighting data and efficiency as major challenges.
  • 10:48 Jitendra Malik: Towards an Embodied AI, inspired by child development
    • Advocates for an embodied AI approach inspired by child development, quoting Alan Turing and outlining six lessons from embodied cognition for AI design.
  • 12:47 Jitendra Malik: Pattern Recognition vs Control Theory
    • Compares the focus of visual recognition (generalization over aspects) with motor control (robustness to disturbances and adaptation to varying physical conditions) in robotics.
  • 14:19 Jitendra Malik: Rapid Motor Adaptation for Legged Robots
    • Demonstrates a method for quadruped robots to rapidly adapt their motor policy to unknown extrinsic conditions and terrains by estimating environmental parameters online from past history.
  • 17:41 Jitendra Malik: Legged Locomotion in Challenging Terrains using Egocentric Vision
    • Illustrates how egocentric vision enables precise locomotion for legged robots on complex and uneven terrains without relying on explicit mapping.
  • 20:49 Jitendra Malik: RotateIt: General In-Hand Object Rotation with Vision and Touch
    • Presents a multi-modal approach combining vision and tactile sensing for robust and generalized in-hand object rotation, showing improved performance with integrated sensing.
  • 27:06 Jitendra Malik: Twisting Lids Off with Two Hands
    • Demonstrates bimanual manipulation for twisting lids off bottles, highlighting generalization across different objects using deep reinforcement learning and sim-to-real transfer.
  • 27:59 Jitendra Malik: Atomic Visual Actions (Gu et al., 2017)
    • Introduces the concept of ‘atomic actions’ from computer vision, categorizing fundamental human actions that can be composed for understanding complex activities.
  • 29:58 Jitendra Malik: Common Atomic Manual Actions
    • Applies the concept of atomic actions to robot manipulation, listing fundamental manual actions with and without tools that robots should be able to perform.
  • 31:49 Jitendra Malik: Atomic Skills
    • Discusses various training methods for atomic skills, including reinforcement learning, supervised learning with tele-operation, and visual imitation.
  • 32:54 Jitendra Malik: Humanoid Control as Next Token Prediction
    • Presents a transformer-based approach for humanoid locomotion, trained on diverse data like motion capture and internet videos, enabling robust walking in complex environments.
  • 35:17 Jitendra Malik: VideoMimic
    • Showcases visual imitation learning for contextual humanoid control, where robots learn complex behaviors by mimicking human actions from video demonstrations in various environments.
  • 36:50 Jitendra Malik: Imitation Learning for Robot Manipulation
    • Explores reconstructing 3D hand-object interactions from video to facilitate imitation learning for robot manipulation, aiming to replace laborious tele-operation.
  • 38:55 Jitendra Malik: What next?
    • Concludes by emphasizing the need to focus on low-level, embodied skills for robot dexterity, which can then be composed for complex tasks, drawing parallels to human learning.
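Illustration for the 14:19 segment (Rapid Motor Adaptation). The summary says the robot estimates environmental parameters online from past history and adapts its motor policy accordingly. The sketch below is a toy stand-in for that idea, not the method from the talk: it assumes a 1D body with unknown mass, and it swaps the learned adaptation module for a plain least-squares fit over a sliding window of recent (velocity, command, next velocity) samples. All names (estimate_inverse_mass, run_episode, the gain and window values) are hypothetical.

```python
from collections import deque


def estimate_inverse_mass(history, dt):
    """Least-squares estimate of 1/m from recent (v, u, v_next) samples.

    Assumes the toy dynamics v_next = v + dt * u / m (an illustrative
    assumption, not a model taken from the talk).
    """
    num = sum(((v_next - v) / dt) * u for v, u, v_next in history)
    den = sum(u * u for _, u, _ in history)
    return num / den if den > 1e-12 else None


def run_episode(true_mass, target_v=1.0, steps=200, dt=0.01,
                gain=5.0, window=20, initial_mass_guess=1.0):
    """Track a target velocity while adapting to an unknown mass online."""
    history = deque(maxlen=window)  # sliding window of past experience
    v = 0.0
    mass_hat = initial_mass_guess
    for _ in range(steps):
        # Base controller: proportional law scaled by the current mass estimate.
        u = mass_hat * gain * (target_v - v)
        # The environment uses the true mass, which the controller never sees.
        v_next = v + dt * u / true_mass
        history.append((v, u, v_next))
        # Adaptation step: re-estimate the hidden parameter from history.
        inv_m = estimate_inverse_mass(history, dt)
        if inv_m is not None and inv_m > 0:
            mass_hat = 1.0 / inv_m
        v = v_next
    return v, mass_hat


if __name__ == "__main__":
    for mass in (0.5, 3.0):
        final_v, mass_hat = run_episode(true_mass=mass)
        print(f"true mass={mass}, estimated mass={mass_hat:.4f}, final v={final_v:.4f}")
```

The same controller code handles both a light and a heavy body because the hidden parameter is inferred from what recently happened rather than measured directly. In the toy, the dynamics are noise-free, so the estimate locks in quickly; a real system would face noise and many more unknowns.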

Key Takeaways

  • Intelligence in robots, as in biology, rests on mastering low-level sensorimotor skills for locomotion, navigation, and manipulation before higher-level cognitive abilities can develop.
  • While locomotion and navigation have seen significant progress, dexterous manipulation remains a major challenge, requiring more efficient learning techniques and vast amounts of data.
  • Multi-modal sensing (vision, touch, proprioception) is crucial for robust and adaptive robot control, especially in complex manipulation tasks, as different modalities provide complementary information.
  • The development of ‘atomic skills’ – fundamental, composable sensorimotor control loops – is key to building general-purpose robots, drawing inspiration from how children learn basic interactions with the world.
  • Advances in 3D reconstruction from video and visual imitation learning offer promising avenues to collect large-scale, diverse datasets for training robot skills, potentially overcoming the data bottleneck in robotics.
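One way to picture the "atomic skills" takeaway above is as small closed-loop controllers, each with its own termination test, that are chained to perform a longer task. The sketch below is a minimal illustration under that assumption; the AtomicSkill class, the move_toward helper, and the reach/grasp/lift toy task are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


@dataclass
class AtomicSkill:
    """A small closed-loop controller with its own termination test."""
    name: str
    step: Callable[[State], State]   # one control-loop iteration
    done: Callable[[State], bool]    # has the skill achieved its goal?
    max_steps: int = 100

    def run(self, state: State) -> State:
        for _ in range(self.max_steps):
            if self.done(state):
                return state
            state = self.step(state)
        if not self.done(state):
            raise RuntimeError(f"skill '{self.name}' did not finish")
        return state


def move_toward(key: str, target: float, step_size: float
                ) -> Tuple[Callable[[State], State], Callable[[State], bool]]:
    """Build (step, done) functions that drive state[key] toward target."""
    def step(state: State) -> State:
        delta = max(-step_size, min(step_size, target - state[key]))
        return {**state, key: state[key] + delta}

    def done(state: State) -> bool:
        return abs(state[key] - target) < 1e-6

    return step, done


def compose(skills: List[AtomicSkill], state: State) -> State:
    """Run skills in sequence; each starts where the previous one ended."""
    for skill in skills:
        state = skill.run(state)
    return state


# Toy "pick up" task assembled from three atomic skills.
reach = AtomicSkill("reach", *move_toward("hand_x", 1.0, 0.1))
grasp = AtomicSkill("grasp", *move_toward("grip", 1.0, 0.25))
lift = AtomicSkill("lift", *move_toward("height", 0.5, 0.05))

if __name__ == "__main__":
    start = {"hand_x": 0.0, "grip": 0.0, "height": 0.0}
    print(compose([reach, grasp, lift], start))
```

Each skill here is trivially simple, but the structure is the point: the composite task only needs to know which skills to call and in what order, while each skill handles its own feedback loop.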

Methods / Models / Datasets Mentioned

  • Humanoid Locomotion as Next Token Prediction
  • GOAT (Go to Any Thing)
  • Learning Visuotactile Bimanual Dexterous Skills
  • Rapid Motor Adaptation for Legged Robots
  • Legged Locomotion in Challenging Terrains using Egocentric Vision
  • RotateIt: General In-Hand Object Rotation with Vision and Touch
  • Visuotactile Policy Training
  • Digit 360
  • Twisting Lids Off with Two Hands
  • Atomic Visual Actions (Gu et al., 2017)
  • Common Atomic Manual Actions
  • VideoMimic
  • Reconstructing Hand-Object Interactions in the Wild (ICCV 2021)
  • Reconstructing Hand-Held Objects in 3D (arXiv:2404.06507)
  • HaMeR (Berkeley)
  • MCC & DINO (Meta)
  • Reinforcement Learning
  • Supervised Learning
  • Tele-operation
  • Visual Imitation
  • Transformer
  • RNN
  • MLP

Topics

Robotics · Intelligent Robots · Locomotion · Navigation · Manipulation · Embodied AI · Child Development · Multi-modal Sensing · Dexterous Manipulation · Reinforcement Learning · Imitation Learning · Sim-to-real · Atomic Skills · Humanoid Robots · Quadruped Robots · Tactile Sensing · Vision · Control Theory · Data Efficiency

