Second Egocentric Vision (EgoVis) Workshop

Event: EgoVis Distinguished Papers Award · Duration: 23 min

Speakers

Dima Damen — University of Bristol and Google DeepMind
Takehiko Ohkawa — Meta
Yale Song — Meta
Yifei Huang — UTokyo
Kumar Ashutosh — Meta
Toby Perrett — N/A
Bolin Lai — N/A
Nandho — Meta

Talks (9)

00:00:00 — Dima Damen: EgoVis 2023/2024 Distinguished Paper Awards Introduction
- Introduction to the EgoVis Distinguished Paper Awards, outlining the selection criteria, process, and board members.
00:04:09 — Takehiko Ohkawa: AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
- A paper on 3D hand pose estimation from egocentric views that uses a multi-camera setup and volumetric triangulation to produce accurate annotations; the work has since led to follow-up workshops and studies.
00:07:15 — Yale Song: Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities
- This paper addresses the challenge of annotating long-form, unscripted procedural activities in Ego4D by developing a bottom-up, iterative taxonomy refinement approach for dense, goal-oriented annotations.
00:11:20 — Yifei Huang: EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
- EgoExoLearn introduces a dataset and methodology for bridging egocentric and exocentric views of procedural activities, enabling machines to learn tasks by observing demonstrations and transferring that knowledge to their own viewpoint.
00:13:40 — Kumar Ashutosh: HierVL: Learning Hierarchical Video-Language Embeddings
- HierVL proposes learning hierarchical video-language embeddings that go beyond matching individual clips to their narrations, also capturing the overall intent of an activity and improving downstream performance on both short-term and long-term video understanding.
00:15:12 — Toby Perrett: It’s Just Another Day: Unique Video Captioning by Discriminative Prompting
- This paper explores discriminative prompting for unique video captioning. The talk stresses that models should be able to express uncertainty and acquire additional information when needed, and highlights the approach's potential for large-scale search over egocentric memories.
00:17:19 — Bolin Lai: LEGO: Learning EGocentric Action Frame Generation via Visual Instruction Tuning
- LEGO focuses on generating egocentric action frames through visual instruction tuning, with the aim of putting state-of-the-art generative models to practical use in assisting people with their daily lives.
00:19:48 — Nandho: Video ReCap: Recursive Captioning of Hour-Long Videos
- Video ReCap tackles captioning of hour-long egocentric videos with a recursive approach, an idea that was novel at the time and has since become more common in general video understanding tasks such as question answering and summarization.
00:22:25 — Dima Damen: EgoVis 2023/2024 Distinguished Paper Awards Conclusion
- Concluding remarks for the EgoVis Distinguished Paper Awards, thanking the board members and participants and acknowledging the high quality of the submissions.