Disentanglement and Compositionality in Artificial Intelligence
Event: CVPR 2024 Tutorial · Duration: 162 min
Abstract
This tutorial explores two fundamental concepts in Artificial Intelligence, disentanglement and compositionality, with a focus on Computer Vision. It examines how AI systems can approach human-like understanding and creation by decomposing complex visual information into independent, interpretable factors and recombining those factors to generate new concepts. The tutorial covers recent advances in disentangled representation learning, model-based visual concept learning, equivariant representations, and compositional understanding for Artificial General Intelligence (AGI), and highlights their applications and future directions.
Speakers
- Xin Jin — Eastern Institute of Technology, Ningbo, China
- Tao Yang — Xi’an Jiaotong University
- Yue Song — National University of Singapore
- Xingyi Yang — National University of Singapore
- Wenjun (Kevin) Zeng — Xi’an Jiaotong University
- Nicu Sebe — University of Trento
- Xinchao Wang — National University of Singapore
- Shuicheng Yan — National University of Singapore
Talks (4)
- 00:34:00 — Xin Jin: Disentanglement and Compositionality in Computer Vision
- This talk introduces disentanglement and compositionality as key concepts for AI to achieve human-like understanding and creation in computer vision, emphasizing their importance for building robust and interpretable AGI systems.
- 01:04:35 — Wenjun (Kevin) Zeng, Tao Yang: Disentangled Model Based Visual Concept Learning
- This talk covers disentangled model-based visual concept learning. It presents three families of disentangled models (group-theory-based VAEs, models built on pretrained generative models, and transformer-based models) and demonstrates their effectiveness in learning disentangled representations of various visual concepts.
- 01:42:49 — Nicu Sebe, Yue Song: Equivariant and Disentangled Representation Learning
- This talk explores the concept of equivariance in neural networks and its connection to disentangled representation learning, presenting a flow-factorized representation learning framework for unsupervised disentanglement.
- 02:09:00 — Xingyi Yang, Xinchao Wang, Shuicheng Yan: Disentanglement and Composition for AGI
- This talk defines AGI in the context of disentanglement and compositionality, exploring compositional models, strategies, and data, and how these principles can be leveraged to build more robust, generalizable, and human-like AI systems.
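Equivariance, the property at the center of the third talk, means that transforming the input and then applying a map f gives the same result as applying f and then transforming its output: f(g·x) = g·f(x). As a minimal illustration (a toy example, not code from the talk), the sketch below checks this condition for a 1-D circular convolution under cyclic shifts and contrasts it with a position-dependent map that breaks it. The helper names `cyclic_shift`, `circular_conv`, `position_weighted`, and `is_equivariant` are hypothetical.

```python
from typing import Callable, List, Sequence


def cyclic_shift(x: Sequence[float], k: int) -> List[float]:
    """Group action g_k of the cyclic group Z_n: rotate x right by k positions."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]


def circular_conv(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """1-D circular convolution: y[i] = sum_j w[j] * x[(i - j) mod n]."""
    n = len(x)
    return [sum(w[j] * x[(i - j) % n] for j in range(len(w))) for i in range(n)]


def position_weighted(x: Sequence[float]) -> List[float]:
    """A map that is NOT shift-equivariant: scales each entry by its index + 1."""
    return [(i + 1) * v for i, v in enumerate(x)]


def is_equivariant(
    f: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    k: int,
    tol: float = 1e-9,
) -> bool:
    """Check f(g_k . x) == g_k . f(x) for one input x and one shift k."""
    lhs = f(cyclic_shift(x, k))
    rhs = cyclic_shift(f(x), k)
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))


x = [1.0, 2.0, 0.0, -1.0, 3.0]
w = [0.5, -1.0, 2.0]
print(is_equivariant(lambda v: circular_conv(v, w), x, 2))  # True
print(is_equivariant(position_weighted, x, 2))  # False
```

A single passing check does not prove equivariance; it only fails to refute it for that input and shift. Group-theoretic treatments of disentanglement impose the same kind of condition on an encoder, roughly asking that each factor of a product group act on its own block of latent dimensions.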
Key Takeaways
- Disentangled representation learning is a fundamental task, crucial for Artificial General Intelligence (AGI), and offers benefits across numerous applications.
- Compositionality and hierarchy are essential foundations for achieving diversity and scalability in AI, which makes them particularly relevant in the era of large models.
- Current research faces bottlenecks and opportunities in demystifying disentanglement within generative models, understanding the relationship between disentangled generation and composition, and exploring Lie groups for disentanglement.
- Practical applications include domain generalization through content disentanglement and compositionality, and compositional video generation by disentangling factors from single images.
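Beta-VAE, which appears in the methods list of this entry, is a common baseline for the disentangled representation learning described in the first takeaway. It reweights the KL term of the VAE objective by a factor β, typically β > 1, and maximizes E_q(z|x)[log p(x|z)] − β·KL(q(z|x) ‖ p(z)). The sketch below is a minimal illustration of that loss, assuming a diagonal-Gaussian encoder, a standard-normal prior, and squared error standing in for −log p(x|z). The function names and the default β = 4.0 are illustrative assumptions, not values from the tutorial.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def beta_vae_loss(
    x: Sequence[float],
    x_recon: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    beta: float = 4.0,  # illustrative value; beta = 1 gives the standard VAE objective
) -> float:
    """Loss to minimize: reconstruction error + beta * KL to the prior.

    Assumption: squared error stands in for -log p(x|z), i.e. a Gaussian
    decoder with fixed variance, up to additive and multiplicative constants.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, log_var)


# Perfect reconstruction, so only the weighted KL term remains: 4 * 0.5 = 2.0
print(beta_vae_loss([1.0, 2.0], [1.0, 2.0], mu=[1.0, 0.0], log_var=[0.0, 0.0]))
```

Raising β strengthens the pull of the approximate posterior toward the factorized prior, which is the lever Beta-VAE uses to encourage disentangled latents, at the cost of reconstruction quality.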
Methods / Models / Datasets Mentioned
ChatGPT · GPT-4 · Sora · Imagen · ControlNet · StyleDrop · AIGC · AlphaGo · Perceptron · Hopfield neural network · Backpropagation (BP) algorithm · Deep Learning · Autoencoders · Variational Autoencoders (VAEs) · Beta-VAE · DIP-VAE · FactorVAE · Beta-TCVAE · JointVAE · RF-VAE · InfoGAN · IB-GAN · InfoGAN-CR · PS-SC GAN · DNA-GAN · MAP-JVR · DRNET · DR-GAN · DisDiff · FineGAN · Capsule Nets · Beta-Capsnet · CausalVAE · CL-Dis · Group Theory Based Disentangled VAEs · Pretrained Generative Model Based Disentangled Model · Transformer Based Disentangled Model · LD (Latent Diffusion) · GS (Generative Score) · DisCo (Disentangled Contrastive Learning) · VQ-VAE · CLIP · LLM · VLM · DALL-E 2 · Stable Diffusion · Energy-based models · Contrastive Energy Model · Score Denoising · Diffusion Model · Product of Experts · Classifier-Free Guidance · LLaVA (NeurIPS 2023) · MiniGPT-4 · Unified-IO V1 (ICLR 2023) · Unified-IO V2 (CVPR 2024) · Neural Module Networks · Equivariant Neural Networks · Flow Factorized Representation Learning · Sparse Transformation Analysis · MNIST · Shapes3D · Isaac3D · Falcor3D · Compositional LLM · LLM Agent · Retrieval Augmented Generation (RAG) · Segment Anything (SA) · HuggingGPT
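Several of the listed methods (Energy-based models, Product of Experts, Diffusion Model, Classifier-Free Guidance) share one compositional idea: multiplying densities, p(x) ∝ ∏ p_i(x), corresponds to adding energies or scores. One known way to apply this to diffusion models is to sum a classifier-free-guidance direction per concept: ε̂ = ε(x_t, ∅) + Σ_i w_i·(ε(x_t, c_i) − ε(x_t, ∅)). The sketch below implements only this combination rule, on plain lists of numbers standing in for a model's noise predictions; the function name `compose_guidance` and the example weights are hypothetical, and this is not presented as the tutorial's own code.

```python
from typing import List, Sequence


def compose_guidance(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-concept noise predictions into one guided prediction.

    eps_uncond: prediction with an empty condition, eps(x_t, null).
    eps_conds:  one prediction per concept c_i, eps(x_t, c_i).
    weights:    guidance scale w_i for each concept.
    Each concept adds w_i * (eps(x_t, c_i) - eps(x_t, null)); up to noise-level
    scaling this corresponds to sampling from p(x) * prod_i p(c_i | x) ** w_i,
    i.e. a product-of-experts composition of the concepts.
    """
    if len(eps_conds) != len(weights):
        raise ValueError("need one weight per condition")
    out = list(eps_uncond)
    for eps_c, w in zip(eps_conds, weights):
        for i, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            out[i] += w * (c - u)
    return out


# Two toy "concepts" with different guidance scales.
print(compose_guidance([0.0, 1.0], [[1.0, 1.0], [0.0, 3.0]], [2.0, 0.5]))  # [2.0, 2.0]
```

Because each concept contributes an additive term, concepts can be added, dropped, or reweighted at sampling time without retraining, which is what makes this rule a convenient handle on compositional generation.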
Topics
Disentanglement · Compositionality · Artificial General Intelligence (AGI) · Computer Vision · Representation Learning · Generative Models · Equivariance · Human-like AI · Interpretability · Generalization
Notes
Open for commentary — connections to other work, critiques, follow-up reading.