4th Multimodal Algorithmic Reasoning Workshop

Event: CVPR 2025 Workshop on Multimodal Algorithmic Reasoning · Duration: 4 min

Abstract

The 4th Multimodal Algorithmic Reasoning Workshop at CVPR 2025 aims to explore reasoning approaches for solving multimodal commonsense problems by deriving algorithms. It highlights the current limitations of large vision-and-language models (VLMs) on tasks requiring algorithmic reasoning, as demonstrated by the SMART-101 and SMART-840 datasets. The workshop will analyze AI progress from the perspectives of large multimodal foundation models, novel machine learning approaches, and cognitive models of intelligence. The goal is to identify current achievements and pinpoint missing components necessary to advance AI towards genuine algorithmic reasoning capabilities.

Speakers

Anoop Cherian — Mitsubishi Electric Research Labs
Kuan-Chuan Peng — Mitsubishi Electric Research Labs
Suhas Lohit — Mitsubishi Electric Research Labs
Tim K. Marks — Mitsubishi Electric Research Labs
Honglu Zhou — Salesforce AI Research
Le Xue — Salesforce AI Research
Kevin Smith — Massachusetts Institute of Technology
Joshua B. Tenenbaum — Massachusetts Institute of Technology
Cordelia Schmid — Inria / Google
Heng Ji — UIUC
Brenden Lake — NYU
Rishabh Agarwal — Meta / McGill University

Talks (2)

00:00 — Anoop Cherian: Introduction to the 4th Multimodal Algorithmic Reasoning Workshop
- Introduction to the workshop, its goals, and an overview of multimodal algorithmic reasoning challenges and the program structure.
03:11 — Dr. Cordelia Schmid: Multi-stage reasoning for video understanding and scene generation
- Introduction of Dr. Cordelia Schmid, a pioneer in computer vision, who will deliver the opening keynote on multi-stage reasoning for video understanding and scene generation.

Key Takeaways

Multimodal Algorithmic Reasoning focuses on enabling AI agents to derive algorithms for solving multimodal commonsense problems.
Current large vision-and-language models (VLMs) still struggle with these problems: they often perform no better than random guessing and exhibit dataset bias.
The workshop aims to analyze AI progress from multiple perspectives, including large multimodal foundation models, novel machine learning approaches, and cognitive models of intelligence, to identify pathways for advancing AI.
The program includes four keynotes, two spotlight paper sessions, and a poster session presenting 19 accepted papers.

Methods / Models / Datasets Mentioned

Neural Algorithmic Reasoning
SMART-101
SMART-840
GPT-4o
GPT-4V
Gemini-Pro 1.5
Claude-3 Sonnet
InternLM-XComposer2
InternVL-Chat
LLaVA-NeXT (34B)
Are Deep Neural Networks SMARter than Second Graders?
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
Multi-stage reasoning

Topics

Multimodal Algorithmic Reasoning · Neural Algorithmic Reasoning · Vision-and-language mathematical reasoning · Foundation models · Visual program synthesis · Causal chains for video understanding · Multimodal physical reasoning for robotics · Cognitive models of intelligence · AI progress analysis · Dataset bias in VLMs

Notes

Open for commentary — connections to other work, critiques, follow-up reading.