Second Egocentric Vision (EgoVis) Workshop, EPIC-KITCHENS & HD-EPIC Challenges

Event: CVPR Nashville · Duration: 23 min

Speakers

00:00:03 — Michael Wray: Introduction to HD-EPIC VQA Challenge
- An overview of the HD-EPIC dataset, its annotations (recipes, fine-grained actions, moving objects, digital twin), and the VQA benchmark results.
01:27:50 — Agnese Taluzzi & Davide Gesualdi: From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge
- A graph-based neurosymbolic AI approach leveraging SceneNet (video-grounded) and KnowledgeNet (external knowledge) to improve multimodal LLM reasoning for egocentric VQA.
01:42:00 — Sicheng Yang & Yukai Huang: Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge
- A solution for HD-EPIC VQA using pre-processing, fine-tuning of an open-source MLLM (Qwen2.5-VL-7B), a Temporal Chain-of-Thought (T-CoT) method, and post-processing with prompt ensembling.
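
The prompt-ensembling post-processing step mentioned in the last talk can be illustrated with a minimal sketch. The listing does not say how the speakers combine the answers, so the majority vote below is an assumption, not their method. The names `ensemble_answer`, `TEMPLATES`, and the `ask` callback (a stand-in for a call to the fine-tuned MLLM that returns the index of the chosen option) are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical prompt variants; the talk's actual prompts are not given in this listing.
TEMPLATES = [
    "Q: {question}\n{choices}\nAnswer:",
    "Watch the video carefully. {question}\nOptions:\n{choices}",
    "{question}\n{choices}\nPick one option.",
]


def ensemble_answer(
    ask: Callable[[str], int],
    question: str,
    choices: Sequence[str],
    templates: Sequence[str] = TEMPLATES,
) -> Optional[int]:
    """Ask the same multiple-choice question under several prompts and majority-vote.

    `ask` is an assumed stand-in for the model call: it takes a prompt string
    and returns the index of the option the model selected.
    Returns the winning option index, or None if no prompt gave a valid index.
    """
    listed = "\n".join(f"{i}. {c}" for i, c in enumerate(choices))
    votes: Counter = Counter()
    for template in templates:
        idx = ask(template.format(question=question, choices=listed))
        if 0 <= idx < len(choices):  # ignore out-of-range answers
            votes[idx] += 1
    if not votes:
        return None
    top = max(votes.values())
    # Break ties deterministically by preferring the lowest option index.
    return min(i for i, n in votes.items() if n == top)
```

Voting over several phrasings is meant to reduce sensitivity to any single prompt. A weighted vote, or averaging per-option scores, would be equally plausible aggregation choices.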
