Second Egocentric Vision (EgoVis) Workshop

Event: CVPR 2025 (Highlight) · Duration: 25 min

Speakers

00:04 — Ashutosh: FIction: 4D Future Interaction Prediction from Video
- This work introduces a multimodal architecture that predicts future interaction locations and body poses in 4D (3D space plus time) over significantly longer horizons than prior work, outperforming baselines on a curated Ego-Exo4D dataset.
06:15 — Zhenyi Liu: EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
- This project introduces EgoPressure, the first high-quality egocentric dataset of touch contact and pressure with 3D hand poses, together with an optimization method for hand pose annotation and PressureFormer, a transformer-based model for 3D hand pressure estimation.
14:06 — Brent Yi: Estimating Body And Hand Motion in an Ego-sensed World
- This work presents EgoAllo, a system that uses egocentric sensor data (device SLAM pose and visual hand observations) to estimate complete allocentric human body pose, height, and hand parameters by combining a head-conditioned diffusion model with guidance from an off-the-shelf hand estimator.
19:29 — Rosario Forte: Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory
- This work introduces Egocentric Streaming Object Memory (ESOM), an online episodic memory system for visual query localization in egocentric video. It operates without storing the entire video stream, is memory-efficient and scalable, and outperforms baselines on the Ego4D dataset.
