Second Egocentric Vision (EgoVis) Workshop, EPIC-KITCHENS & HD-EPIC Challenges

Event: CVPR Nashville · Duration: 23 min

Speakers

  • Michael Wray — University of Bristol
  • Agnese Taluzzi — Politecnico di Milano & EssilorLuxottica
  • Davide Gesualdi — Politecnico di Milano & EssilorLuxottica
  • Sicheng Yang — Imperial College London
  • Yukai Huang — Imperial College London

Talks (3)

  • 00:00:03 Michael Wray: Introduction to HD-EPIC VQA Challenge
    • An overview of the HD-EPIC dataset, its annotations (recipes, fine-grained actions, moving objects, digital twin), and the VQA benchmark results.
  • 01:27:50 Agnese Taluzzi & Davide Gesualdi: From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge
    • A graph-based neurosymbolic AI approach leveraging SceneNet (video-grounded) and KnowledgeNet (external knowledge) to improve multimodal LLM reasoning for egocentric VQA.
  • 01:42:00 Sicheng Yang & Yukai Huang: Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge
    • A solution for HD-EPIC VQA using pre-processing, fine-tuning of an open-source MLLM (Qwen2.5-VL-7B), a Temporal Chain-of-Thought (T-CoT) method, and post-processing with prompt ensembling.
