4th Multimodal Algorithmic Reasoning Workshop
Event: CVPR 2025 Workshop on Multimodal Algorithmic Reasoning · Duration: 4 min
Abstract
The 4th Multimodal Algorithmic Reasoning Workshop at CVPR 2025 aims to explore reasoning approaches that solve multimodal commonsense problems by deriving algorithms. It is motivated by the current limitations of large vision-and-language models (VLMs) on tasks requiring algorithmic reasoning, as demonstrated on the SMART-101 and SMART-840 datasets. The workshop will analyze AI progress from three perspectives: large multimodal foundation models, novel machine learning approaches, and cognitive models of intelligence. The goal is to identify what has been achieved so far and pinpoint the components still missing for AI to reach genuine algorithmic reasoning capabilities.
Speakers
- Anoop Cherian — Mitsubishi Electric Research Labs
- Kuan-Chuan Peng — Mitsubishi Electric Research Labs
- Suhas Lohit — Mitsubishi Electric Research Labs
- Tim K. Marks — Mitsubishi Electric Research Labs
- Honglu Zhou — Salesforce AI Research
- Le Xue — Salesforce AI Research
- Kevin Smith — Massachusetts Institute of Technology
- Joshua B. Tenenbaum — Massachusetts Institute of Technology
- Cordelia Schmid — Inria / Google
- Heng Ji — UIUC
- Brenden Lake — NYU
- Rishabh Agarwal — Meta / McGill University
Talks (2)
- 00:00 — Anoop Cherian: Introduction to the 4th Multimodal Algorithmic Reasoning Workshop
  - Introduces the workshop and its goals, gives an overview of multimodal algorithmic reasoning challenges, and outlines the program structure.
- 03:11 — Dr. Cordelia Schmid: Multi-stage reasoning for video understanding and scene generation
  - Introduces Dr. Cordelia Schmid, a pioneer in computer vision, who delivers the opening keynote on multi-stage reasoning for video understanding and scene generation.
Key Takeaways
- Multimodal Algorithmic Reasoning focuses on enabling AI agents to derive algorithms for solving multimodal commonsense problems.
- Current large vision-and-language models (VLMs) still struggle with these problems, often performing no better than random guessing, and they exhibit dataset bias.
- The workshop aims to analyze AI progress from multiple perspectives, including foundation models, novel ML, and cognitive models, to identify pathways for advancing AI.
- The program includes four keynotes, two spotlight paper sessions, and a poster session presenting 19 accepted papers.
Methods / Models / Datasets Mentioned
Neural Algorithmic Reasoning · SMART-101 · SMART-840 · GPT-4o · GPT-4v · Gemini-Pro 1.5 · Claude-3 Sonnet · InternLM-XComposer2 · InternVL-Chat · LLaVA-NEXT (34B) · Are Deep Neural Networks SMARter than Second Graders? · Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads · Multi-stage reasoning
Topics
Multimodal Algorithmic Reasoning · Neural Algorithmic Reasoning · Vision-and-language mathematical reasoning · Foundation models · Visual program synthesis · Causal chains for video understanding · Multimodal physical reasoning for robotics · Cognitive models of intelligence · AI progress analysis · Dataset bias in VLMs