Crows vs Robots

Event: CVPR 2025 Workshop · Duration: 27 min

Abstract

This presentation contrasts the intelligence and efficiency of biological systems, such as crows, with modern AI models, particularly large language models (LLMs). It highlights the energy demands of current AI and its tendency to hallucinate, and argues for more efficient, consequence-aware learning in robotics. The speaker presents an approach in which robots learn complex physical tasks, such as folding and throwing paper airplanes or designing kirigami grippers, through iterative physical experimentation guided by differentiable predictive models. The talk then turns to generalization: video generative models predict how a human would perform a task in a new environment, and the robot imitates the generated behavior. This lets the robot handle new scenarios without extensive human supervision.

Speakers

Carl Vondrick — Columbia University

Chapters (4)

00:38 — Carl Vondrick: Crows vs Robots
- The speaker introduces the concept of intelligence in animals, specifically crows, and contrasts their energy efficiency and ability to learn from physical consequences with the high power consumption and hallucination tendencies of large language models (LLMs).
05:00 — Carl Vondrick: Paper Airplane Scientist: Learning to Fold and Throw
- A robotic system is presented that learns to optimally fold and throw paper airplanes through iterative physical experimentation and a differentiable predictive model, demonstrating efficient learning in the real world.
10:32 — Carl Vondrick: Kirigami Gripper: Learning by Cutting
- The same learning methodology is applied to designing Kirigami grippers, where the robot learns to make precise cuts in paper to create grippers capable of picking up various objects, including delicate ones like strawberries, demonstrating adaptability and force control.
12:47 — Carl Vondrick: Generalization and Behavior via Video Generation
- The discussion shifts to generalization in robotics and the gap between scarce physical-world robot data and abundant visual data. A new approach, ‘Dreamitate’, is introduced: a video generative model predicts how a human would carry out the task, and the robot then imitates the generated actions, showing promising generalization to novel tasks and environments.
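The fold-and-throw and kirigami chapters both describe the same loop. The robot runs a physical experiment, fits a differentiable model that predicts the outcome, and uses that model's gradient to choose the next attempt. The following is only a minimal sketch of such a loop, and every part of it is an assumption. A made-up `real_world_throw` function stands in for the robot and the room. The two parameters (launch angle and a fold depth) are invented. A quadratic least-squares surrogate replaces whatever learned model the speaker actually uses.

```python
import numpy as np

# Hypothetical stand-in for the physical world: maps throw parameters
# (launch angle in rad, fold depth as a fraction) to a noisy flight distance.
# The learner never sees this formula; it only observes (params, distance) pairs.
def real_world_throw(x, rng):
    angle, fold = x
    distance = 10.0 - 8.0 * (angle - 0.6) ** 2 - 5.0 * (fold - 0.3) ** 2
    return distance + rng.normal(scale=0.02)

def features(x):
    """Quadratic features: the surrogate predicts distance as w . features(x)."""
    a, f = x
    return np.array([1.0, a, f, a * a, f * f, a * f])

def surrogate_grad(x, w):
    """Gradient of the quadratic surrogate w . features(x) with respect to x."""
    a, f = x
    d_a = w[1] + 2.0 * w[3] * a + w[5] * f
    d_f = w[2] + 2.0 * w[4] * f + w[5] * a
    return np.array([d_a, d_f])

# Assumed feasible ranges for (angle, fold depth).
LOW = np.array([0.0, 0.0])
HIGH = np.array([1.2, 0.6])

def experiment_loop(rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    # A handful of random initial experiments to seed the predictive model.
    X = [rng.uniform(LOW, HIGH) for _ in range(8)]
    y = [real_world_throw(x, rng) for x in X]
    for _ in range(rounds):
        # 1. Fit the differentiable predictive model to all data so far.
        Phi = np.stack([features(x) for x in X])
        w, *_ = np.linalg.lstsq(Phi, np.array(y), rcond=None)
        # 2. Optimize the action through the model by gradient ascent,
        #    starting from the best throw observed so far.
        x = X[int(np.argmax(y))].copy()
        for _ in range(200):
            x = np.clip(x + 0.02 * surrogate_grad(x, w), LOW, HIGH)
        # 3. Run the proposed experiment in the world and record the outcome.
        X.append(x)
        y.append(real_world_throw(x, rng))
    best = int(np.argmax(y))
    return X[best], y[best]
```

Calling `experiment_loop()` returns the best parameters tried and their measured distance. The surrogate is differentiable in the action, so each new proposal comes from following the model's gradient rather than from blind random search. In a sketch like this, that is what keeps the number of real throws small.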

Key Takeaways

Biological intelligence, exemplified by crows, demonstrates remarkable efficiency and causal understanding in real-world environments, a stark contrast to the high energy demands and hallucination tendencies of current LLMs.
Robots can learn to perform complex physical tasks, like designing paper airplanes or grippers, through iterative experimentation and differentiable predictive models, effectively building causal models of their environment.
Leveraging video generative models allows robots to ‘dream’ human demonstrations and then imitate these actions, enabling significant generalization to novel tasks and objects with minimal real-world human demonstrations.
This approach addresses the challenge of data scarcity in physical robotics by using readily available visual data from the internet, offering a path towards more robust and adaptable robotic systems.
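The "dream, then imitate" idea in the last two takeaways can be sketched as a data flow. First generate a video of the task, then recover the motion from it, express that motion in the robot's coordinates, and replay it. This toy version is not the Dreamitate implementation. `dream_video` is a hypothetical, hand-written stand-in for a learned video generative model; it just draws a bright pixel moving in a straight line. Tracking that pixel is an assumed way to extract the motion. The calibration matrix `A` and offset `b` are made-up values.

```python
import numpy as np

# Hypothetical stand-in for a video generative model: given where the tool
# starts and where the task needs it to end up (in pixels), "dream" a clip
# of the tool moving there. A real system would sample frames from a learned
# video model conditioned on an image of the scene.
def dream_video(start_px, goal_px, n_frames=10, size=64):
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        pos = (1 - t) * np.array(start_px) + t * np.array(goal_px)
        r, c = np.round(pos).astype(int)
        frame = np.zeros((size, size))
        frame[r, c] = 1.0  # a single bright pixel marks the tool
        frames.append(frame)
    return frames

def track_tool(frames):
    """Recover the tool's (row, col) pixel position in each generated frame."""
    return np.array(
        [np.unravel_index(np.argmax(f), f.shape) for f in frames], dtype=float
    )

# Assumed camera-to-robot calibration: pixel (row, col) -> table-plane (x, y) in meters.
A = np.array([[0.0, 0.01], [-0.01, 0.0]])
b = np.array([0.2, 0.5])

def pixels_to_robot(track_px):
    return track_px @ A.T + b

def imitate(start_px, goal_px):
    frames = dream_video(start_px, goal_px)
    return pixels_to_robot(track_tool(frames))  # waypoints for the robot to follow
```

For example, `imitate((10, 5), (50, 40))` returns 10 table-plane waypoints running from (0.25, 0.4) m to (0.6, 0.0) m. In a real system the hard parts are the generator and the tracker. The point of the structure is that the robot only needs to follow the extracted trajectory, so new behavior can come from the video model rather than from new robot demonstrations.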

Methods / Models / Datasets Mentioned

DP-ResNet
DP-CLIP
GR00T
FPV
DP-VLA
Transformer

Topics

Animal Intelligence · Robotics · Machine Learning · Generalization · Efficiency · Causal Models · Paper Airplanes · Kirigami Grippers · Video Generative Models · Behavior Cloning
