Humanoid Policy ~ Human Policy
Event: Second Egocentric Vision (EgoVis) Workshop · Duration: 27 min
Abstract
The presentation introduces ‘Humanoid Policy ~ Human Policy,’ an approach to training humanoid robot policies. It addresses data scarcity in robotics by combining limited robot teleoperation data with extensive egocentric human demonstration data. The core idea is to transform both robot and human observations and actions into a unified human-centric state-action space, so that a single policy can learn from both data sources. The method, enabled by VR hardware for data collection and by models such as transformers and DINO-V2, shows improved generalization for humanoid robots across manipulation tasks, object types, and environmental backgrounds, including cross-embodiment scenarios. The speaker also gives historical context and a future outlook, emphasizing the potential of full-body teleoperation and Sim2Real transfer for complex real-world applications.
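As a rough illustration of the unified state-action idea described above, the sketch below stores human and robot samples in one shared record type and draws mixed training batches from both. It is not the speaker's implementation: the `UnifiedSample` schema, the `from_human` and `from_robot` helpers, and the `robot_fraction` mixing parameter are all hypothetical names chosen for this example, and the sketch assumes robot poses have already been converted to the same human-centric frame and layout as the human data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedSample:
    """One training example in a shared, human-centric format (hypothetical schema)."""
    state: tuple   # e.g. flattened head and wrist poses
    action: tuple  # e.g. flattened target wrist and finger positions
    source: str    # "human" or "robot"; kept only for bookkeeping


def from_human(state, action):
    # Human egocentric recordings are assumed to be in the target format already.
    return UnifiedSample(tuple(state), tuple(action), "human")


def from_robot(state, action):
    # Assumption: the robot state and command were retargeted upstream
    # into the same frame and vector layout used for human data.
    return UnifiedSample(tuple(state), tuple(action), "robot")


def mixed_batch(human, robot, batch_size, robot_fraction, rng):
    """Sample one batch containing a fixed share of robot examples."""
    n_robot = round(batch_size * robot_fraction)
    n_human = batch_size - n_robot
    batch = rng.choices(robot, k=n_robot) + rng.choices(human, k=n_human)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    human = [from_human([0.0] * 4, [0.1] * 4) for _ in range(5)]
    robot = [from_robot([1.0] * 4, [1.1] * 4) for _ in range(2)]
    batch = mixed_batch(human, robot, 8, 0.25, random.Random(0))
    print([s.source for s in batch])
```

Because every sample has the same `state` and `action` layout, a single policy can consume the batch without branching on `source`; oversampling the small robot set through `robot_fraction` is one simple way to keep it from being swamped by the much larger human set.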
Speakers
- Xiaolong Wang — UCSD, USA
Talks (1)
- 00:00 — Xiaolong Wang: Humanoid Policy ~ Human Policy
- This talk presents a method for training humanoid robot policies by aligning small-scale robot teleoperation data with large-scale, task-oriented egocentric human demonstration data into a unified human-centric state-action space, significantly improving generalization across diverse tasks and environments.
Key Takeaways
- Egocentric vision is a powerful modality for collecting rich, realistic human demonstration data that can be used to train robot policies.
- By aligning robot and human data into a unified human-centric state-action space, policies can leverage large-scale human demonstrations to significantly improve generalization, especially in out-of-distribution settings.
- Modern hardware (e.g., VR headsets like Vision Pro) and advanced models (e.g., transformers, DINO-V2) are crucial enablers for effective egocentric data collection and robust policy learning.
- The concept of learning from human observation for robotics has a long history, but current technological advancements are making it increasingly practical and impactful.
- Future work involves extending these methods to full-body humanoid control and integrating Sim2Real transfer to further enhance robustness and scalability for complex real-world tasks.
Methods / Models / Datasets Mentioned
Open-TeleVision · DINO-V2 · ARKit · Meta WebXR API · Inverse Kinematics · Retargeting · HAT (Transformer) · Temporal Interpolation · Adaptive Motion Optimization (AMO) · DexMV · Dex1B
Topics
Egocentric Vision · Humanoid Robotics · Teleoperation · Imitation Learning · Cross-Embodiment Learning · Human-Robot Interaction · Data Collection · Policy Learning · Generalization · Sim2Real Transfer · Dexterous Manipulation