Computer Vision Foundation Workshop

Event: CVPR 2024 Workshop · Duration: 44 min

Abstract

This workshop features two presentations on computer vision research. The first talk introduces an approach for real-time 3D full-body pose estimation from head-mounted XR devices, using physics simulation and synthetic data to drive avatars. The second examines the role of people in 3D scene understanding, showing how they can act as ‘scene probes’ for inferring scene properties, and discusses advances in telepresence technology with Project Starline.

Speakers

  • Ramya — Carnegie Mellon University, Meta Reality Labs
  • Steve Seitz — University of Washington

Talks (2)

  • 00:02:50 Ramya: Real-Time Simulated Avatar from Head-Mounted Sensors
    • This talk presents a method for estimating 3D full-body pose from head-mounted XR device sensor information in real-time by leveraging physics simulation and synthetic data generation.
  • 00:19:44 Steve Seitz: Where Is Everyone?
    • This talk explores the importance of people in scene understanding and reconstruction, demonstrating how human presence and motion can be leveraged as ‘scene probes’ to infer geometric and photometric properties, and discusses Project Starline for realistic telepresence.

Key Takeaways

  • Estimating 3D full-body pose from head-mounted XR devices is challenging due to extreme viewpoints and limited visual information, but physics simulation and synthetic data can help overcome this.
  • Human presence and motion in a scene provide valuable cues for understanding and reconstructing 3D environments, acting as ‘scene probes’ to infer geometric and photometric properties.
  • Achieving perceived eye contact in digital communication is counter-intuitive; looking slightly below the camera lens, rather than directly into it, can create a more natural sense of eye contact.
  • Project Starline aims to create highly realistic telepresence experiences, making remote interactions feel like being in the same room, with technology evolving towards smaller, desk-sized units leveraging AI and regular cameras.
  • The presented avatar simulation system runs in real-time (30 FPS) on a desktop, with potential for optimization to run directly on XR headsets.
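
To make the 30 FPS figure in the last takeaway concrete, the sketch below computes the per-frame time budget that a real-time pipeline has to fit within. The function names and the stage timings are hypothetical and chosen only for illustration; they are not measurements or code from the talk.

```python
from typing import Sequence


def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_real_time(stage_times_ms: Sequence[float], fps: float = 30.0) -> bool:
    """Return True if the summed per-frame stage times fit within the frame budget."""
    return sum(stage_times_ms) <= frame_budget_ms(fps)


# Hypothetical per-frame stage timings in milliseconds (assumed values,
# not figures reported by the speakers).
example_stages_ms = [8.0, 15.0, 6.0]

if __name__ == "__main__":
    print(f"budget at 30 FPS: {frame_budget_ms(30.0):.1f} ms")
    print("fits in budget:", meets_real_time(example_stages_ms))
```

At 30 FPS the budget is about 33.3 ms per frame, so any on-headset port would need its sensing, inference, and simulation steps to sum to less than that on the device's own hardware.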

Methods / Models / Datasets Mentioned

  • Isaac Gym
  • Meta Reality Labs
  • Quest 2
  • SLAM
  • Photo Tourism
  • Google Earth
  • Project Starline

Topics

3D pose estimation · XR headsets · Real-time avatar simulation · Physics simulation · Synthetic data generation · Egocentric human motion · 3D scene reconstruction · People as scene probes · Eye contact · Telepresence

