Second Egocentric Vision (EgoVis) Workshop

Event: EgoVis Distinguished Paper Awards · Duration: 23 min

Speakers

  • Dima Damen — University of Bristol and Google DeepMind
  • Takehiko Ohkawa — Meta
  • Yale Song — Meta
  • Yifei Huang — UTokyo
  • Kumar Ashutosh — Meta
  • Toby Perrett — N/A
  • Bolin Lai — N/A
  • Nandho — Meta

Talks (9)

  • 00:00:00 — Dima Damen: EgoVis 2023/2024 Distinguished Paper Awards Introduction
    • Introduction to the EgoVis Distinguished Paper Awards, outlining the selection criteria, process, and board members.
  • 00:04:09 — Takehiko Ohkawa: AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
    • A paper on 3D hand pose estimation from egocentric views, using a multi-camera setup and volumetric triangulation for accurate annotation, which has led to follow-up workshops and studies.
  • 00:07:15 — Yale Song: Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities
    • This paper addresses the challenge of annotating long-form, unscripted procedural activities in Ego4D by developing a bottom-up, iterative taxonomy refinement approach for dense, goal-oriented annotations.
  • 00:11:20 — Yifei Huang: EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
    • EgoExoLearn introduces a dataset and methodology for bridging egocentric and exocentric views of procedural activities, enabling machines to learn tasks by observing demonstrations and transferring that knowledge to their own viewpoint.
  • 00:13:40 — Kumar Ashutosh: HierVL: Learning Hierarchical Video-Language Embeddings
    • HierVL proposes learning hierarchical video-language embeddings to go beyond instantaneous matching, capturing the overall intent of activities and improving downstream performance in both short-term and long-term understanding.
  • 00:15:12 — Toby Perrett: It’s Just Another Day: Unique Video Captioning by Discriminative Prompting
    • This paper explores discriminative prompting for unique video captioning. It emphasizes that models should be able to express uncertainty and acquire additional information when needed, and it notes the approach's potential for large-scale search in egocentric memories.
  • 00:17:19 — Bolin Lai: LEGO: Learning EGocentric Action Frame Generation via Visual Instruction Tuning
    • LEGO focuses on generating egocentric action frames through visual instruction tuning, aiming to put state-of-the-art generative models to practical use in assisting people in their daily lives.
  • 00:19:48 — Nandho: VideoRecap: Recursive Captioning of Hour-Long Videos
    • VideoRecap tackles the challenge of recursively captioning hour-long egocentric videos, a novel idea at the time that has since become more common in general video understanding tasks like question answering and summarization.
  • 00:22:25 — Dima Damen: EgoVis 2023/2024 Distinguished Paper Awards Conclusion
    • Concluding remarks for the EgoVis Distinguished Paper Awards, thanking the board members and participants and acknowledging the high quality of submissions.
