Second Egocentric Vision (EgoVis) Workshop, EPIC-KITCHENS & HD-EPIC Challenges
Event: CVPR Nashville · Duration: 23 min
Speakers
- Michael Wray — University of Bristol
- Agnese Taluzzi — Politecnico di Milano & EssilorLuxottica
- Davide Gesualdi — Politecnico di Milano & EssilorLuxottica
- Sicheng Yang — Imperial College London
- Yukai Huang — Imperial College London
Talks (3)
- 00:00:03 — Michael Wray: Introduction to HD-EPIC VQA Challenge
  - An overview of the HD-EPIC dataset, its annotations (recipes, fine-grained actions, moving objects, digital twin), and the VQA benchmark results.
- 01:27:50 — Agnese Taluzzi & Davide Gesualdi: From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge
  - A graph-based neurosymbolic AI approach leveraging SceneNet (video-grounded) and KnowledgeNet (external knowledge) to improve multimodal LLM reasoning for egocentric VQA.
- 01:42:00 — Sicheng Yang & Yukai Huang: Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge
  - A solution for HD-EPIC VQA using pre-processing, fine-tuning of an open-source MLLM (Qwen2.5-VL-7B), a Temporal Chain-of-Thought (T-CoT) method, and post-processing with prompt ensembling.