Concept Learning Across Domains and Modalities
Event: CVPR 2025 Workshop Session · Duration: 45 min
Abstract
The presentation covers concept learning and its growing relevance with the advent of large language models. It introduces the Neuro-Symbolic Concept Learner (NS-CL) as a framework for jointly learning concepts and semantic parsing, and demonstrates its data efficiency and combinatorial generalization. The talk then extends this idea to dynamic scenes, 3D environments, human motion understanding, and robotic manipulation. Finally, it proposes Logic-Enhanced Foundation models (LEFT), a unified framework that combines foundation models with differentiable first-order logic to achieve domain-independent reasoning and strong generalization across modalities and tasks.
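As a rough illustration of what a learned "concept" can look like in a neuro-symbolic pipeline, the sketch below scores objects against concept embeddings with cosine similarity and squashes the result into a soft truth value. It is not taken from the talk: the vectors, the `offset` and `temperature` values, and the helper names `concept_prob` and `filter_scene` are all assumptions chosen for readability. In a real system the embeddings would be learned jointly with the semantic parser rather than written by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def concept_prob(obj_vec, concept_vec, offset=0.2, temperature=0.1):
    """Soft truth value in (0, 1) that an object expresses a concept.

    offset and temperature are illustrative assumptions, not values
    reported in the talk.
    """
    logit = (cosine(obj_vec, concept_vec) - offset) / temperature
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical concept embeddings; hand-written here, learned in practice.
concepts = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}

# Hypothetical per-object features, e.g. from a visual encoder.
objects = [[0.9, 0.1], [0.1, 0.9]]

def filter_scene(objects, name):
    """Return one soft score per object for the named concept."""
    return [concept_prob(o, concepts[name]) for o in objects]

red_scores = filter_scene(objects, "red")
blue_scores = filter_scene(objects, "blue")
```

Because every step is a smooth function of the embeddings, a question-answering loss can in principle be backpropagated into both the object features and the concept vectors, which is one reason this style of representation pairs well with symbolic programs.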
Speakers
- Jiajun Wu — Stanford University
- Anoosha Cherian
- Suraj Lohit
- Kevin Smith
Talks (1)
- 00:00:00 — Jiajun Wu: Concept Learning Across Domains and Modalities
- This talk explores the development of neuro-symbolic concept learners and Logic-Enhanced Foundation models (LEFT) to enable robust concept learning and reasoning across diverse visual and linguistic domains, emphasizing data efficiency, generalization, and interpretability.
Key Takeaways
- Neuro-symbolic approaches, which combine neural networks with symbolic reasoning, offer superior data efficiency, interpretability, and combinatorial generalization compared to purely end-to-end models.
- The LEFT framework provides a unified and flexible solution for concept learning and reasoning across diverse domains (2D, 3D, temporal, robotics) by integrating foundation models with differentiable first-order logic.
- LEFT demonstrates strong zero-shot and compositional generalization to novel tasks and achieves high data efficiency, outperforming prior methods on several complex reasoning benchmarks.
- The framework’s modular design allows for the integration of both differentiable neural network modules and non-differentiable off-the-shelf tools, offering adaptability to different domain-specific grounding requirements.
- The ability to learn domain-independent logical forms from natural language queries, combined with domain-specific grounding, is crucial for building robust and generalizable AI systems.
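The split between a domain-independent logical form and domain-specific grounding can be sketched as a small interpreter. The example below is a minimal illustration under stated assumptions, not the LEFT implementation: the tuple-based program format, the `execute` function, the min/max relaxation of "and"/"exists", and all scores are hypothetical. It runs the same logical form against two groundings, one of which uses a hard 0/1 rule to stand in for a non-differentiable off-the-shelf tool.

```python
from typing import Callable, Dict, List

Scores = List[float]                      # one soft truth value per entity
Grounding = Dict[str, Callable[[], Scores]]

def execute(form, grounding: Grounding):
    """Evaluate a nested-tuple logical form against a grounding.

    Returns a list of per-entity scores, or a single float for "exists".
    min/max are one possible relaxation of and/exists (an assumption).
    """
    op = form[0]
    if op == "concept":                   # look up a domain-specific module
        return grounding[form[1]]()
    if op == "and":
        left = execute(form[1], grounding)
        right = execute(form[2], grounding)
        return [min(a, b) for a, b in zip(left, right)]
    if op == "not":
        return [1.0 - s for s in execute(form[1], grounding)]
    if op == "exists":
        return max(execute(form[1], grounding))
    raise ValueError(f"unknown operator: {op}")

# Domain-independent form for "Is there a red cube?"
query = ("exists", ("and", ("concept", "red"), ("concept", "cube")))

# Hypothetical 2D grounding: soft scores as a neural module might produce.
image_grounding: Grounding = {
    "red": lambda: [0.9, 0.2, 0.1],
    "cube": lambda: [0.8, 0.7, 0.1],
}

def rule_based_cube() -> Scores:
    """Stand-in for a non-differentiable tool that returns hard decisions."""
    return [0.0, 1.0]

# Hypothetical 3D grounding mixing a soft module with the hard tool.
pointcloud_grounding: Grounding = {
    "red": lambda: [0.3, 0.6],
    "cube": rule_based_cube,
}

answer_2d = execute(query, image_grounding)
answer_3d = execute(query, pointcloud_grounding)
```

Only the grounding dictionaries change between domains; the query and the interpreter are reused unchanged, which is the property the takeaways above attribute to the framework.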
Methods / Models / Datasets Mentioned
CLEVR · NS-VQA · MAC · IEP · FiLM · SAN · ViperGPT · Visual Programming · OpenAI Codex · NS-CL · CLEVRER · NS-3D · BABEL-QA Dataset · NS-Pose · ProgramPort · BUTD-DETR · MVT · SAT · TransRefer · LEFT (Logic-Enhanced Foundation models) · Faster R-CNN · PointNet++ · 2s-AGCN · Dense CLIP · Flamingo · MotionCLIP
Topics
Concept Learning · Neuro-Symbolic AI · Multimodal Learning · Visual Question Answering (VQA) · Scene Understanding · Data Efficiency · Combinatorial Generalization · 3D Vision · Dynamic Scenes · Human Motion Understanding · Robotic Manipulation · First-Order Logic · Foundation Models · Differentiable Reasoning
Notes
Open for commentary — connections to other work, critiques, follow-up reading.