Humanoid Policy ~ Human Policy

Event: Second Egocentric Vision (EgoVis) Workshop · Duration: 27 min

Abstract

The presentation introduces ‘Humanoid Policy ~ Human Policy,’ an approach to training humanoid robot policies that addresses data scarcity in robotics by combining limited robot teleoperation data with extensive egocentric human demonstration data. The core idea is to map both robot and human observations and actions into a unified human-centric state-action space, so that a single policy can learn from both data sources. Enabled by VR hardware for data collection and by models such as transformers and DINOv2, the method improves the generalization of humanoid robots across manipulation tasks, object types, and environmental backgrounds, including cross-embodiment scenarios. The speaker also gives historical context and an outlook on future work, emphasizing the potential of full-body teleoperation and Sim2Real transfer for complex real-world applications.
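To make the idea of a unified human-centric state-action space more concrete, here is a minimal Python sketch. It is not the speaker's implementation: the field names (`head`, `left_wrist`, `camera`, `left_ee`, and so on), the vector sizes, and the choice of "next-step hand targets" as the action are all assumptions made for illustration. The only point it shows is that human and robot frames can be flattened into records with identical layout, so one policy can consume both.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical field names for each data source, listed in the same semantic
# order: viewpoint, left hand/end-effector, right hand/end-effector, fingertips.
HUMAN_KEYS: Tuple[str, ...] = ("head", "left_wrist", "right_wrist", "fingertips")
ROBOT_KEYS: Tuple[str, ...] = ("camera", "left_ee", "right_ee", "fingertips")

Frame = Dict[str, Sequence[float]]


@dataclass
class UnifiedSample:
    """One timestep in a shared state-action layout (illustrative schema)."""
    source: str          # "human" or "robot"
    state: List[float]   # viewpoint + both hands + fingertips, flattened
    action: List[float]  # next-step hand and fingertip targets, flattened


def _flatten(frame: Frame, keys: Sequence[str]) -> List[float]:
    """Concatenate the selected fields of a frame into one flat vector."""
    out: List[float] = []
    for key in keys:
        out.extend(float(v) for v in frame[key])
    return out


def to_unified(source: str, frame: Frame, next_frame: Frame) -> UnifiedSample:
    """Map a human or robot frame pair into the shared layout."""
    if source == "human":
        keys = HUMAN_KEYS
    elif source == "robot":
        keys = ROBOT_KEYS
    else:
        raise ValueError(f"unknown source: {source!r}")
    state = _flatten(frame, keys)
    # Assumed action definition: the hand targets at the next timestep,
    # skipping the viewpoint entry (first key).
    action = _flatten(next_frame, keys[1:])
    return UnifiedSample(source=source, state=state, action=action)
```

In a real pipeline the robot fields would presumably come from forward kinematics on joint readings and the human fields from headset hand tracking; both steps are outside this sketch.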

Speakers

Xiaolong Wang — UCSD, USA

Talks (1)

00:00 — Xiaolong Wang: Humanoid Policy ~ Human Policy
- This talk presents a method for training humanoid robot policies by aligning small-scale robot teleoperation data with large-scale, task-oriented egocentric human demonstration data into a unified human-centric state-action space, significantly improving generalization across diverse tasks and environments.

Key Takeaways

Egocentric vision is a powerful modality for collecting rich, realistic human demonstration data that can be used to train robot policies.
By aligning robot and human data into a unified human-centric state-action space, policies can leverage large-scale human demonstrations to significantly improve generalization, especially in out-of-distribution settings.
Modern hardware (e.g., VR headsets such as Apple Vision Pro) and advanced models (e.g., transformers, DINOv2) are key enablers of effective egocentric data collection and robust policy learning.
The concept of learning from human observation for robotics has a long history, but current technological advancements are making it increasingly practical and impactful.
Future work involves extending these methods to full-body humanoid control and integrating Sim2Real transfer to further enhance robustness and scalability for complex real-world tasks.
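One simple way to picture learning from a small robot dataset alongside a much larger human dataset is to draw each training batch from both pools at a fixed ratio. The sketch below illustrates only that idea. The function name `sample_mixed_batch` and the `robot_fraction` parameter are hypothetical, and the summary does not state what mixing ratio or sampling scheme the talk actually uses.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


def sample_mixed_batch(
    human: Sequence[Any],
    robot: Sequence[Any],
    batch_size: int,
    robot_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, Any]]:
    """Draw one batch that mixes robot and human samples.

    Sampling is with replacement, so the small robot pool can be revisited
    many times while the large human pool supplies most of the variety.
    """
    if not human or not robot:
        raise ValueError("both datasets must be non-empty")
    if not 0.0 <= robot_fraction <= 1.0:
        raise ValueError("robot_fraction must be in [0, 1]")
    rng = rng or random.Random()

    n_robot = round(batch_size * robot_fraction)
    n_human = batch_size - n_robot

    batch = [("robot", rng.choice(robot)) for _ in range(n_robot)]
    batch += [("human", rng.choice(human)) for _ in range(n_human)]
    rng.shuffle(batch)  # avoid a fixed robot-then-human ordering
    return batch
```

Tagging each sample with its source keeps the option open to weight or normalize the two data sources differently during training, which is a design choice rather than something the summary specifies.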

Methods / Models / Datasets Mentioned

Open-TeleVision
DINOv2
ARKit
Meta WebXR API
Inverse Kinematics
Retargeting
HAT (Human Action Transformer)
Temporal Interpolation
Adaptive Motion Optimization (AMO)
DexMV
Dex1B

Topics

Egocentric Vision · Humanoid Robotics · Teleoperation · Imitation Learning · Cross-Embodiment Learning · Human-Robot Interaction · Data Collection · Policy Learning · Generalization · Sim2Real Transfer · Dexterous Manipulation
