Disentanglement and Compositionality in Artificial Intelligence

Event: CVPR 2024 Tutorial · Duration: 162 min

Abstract

This tutorial explores the fundamental concepts of disentanglement and compositionality in Artificial Intelligence, particularly within Computer Vision. It examines how AI systems can approach human-like understanding and creation by breaking complex visual information down into independent, interpretable factors and then composing those factors to generate new concepts. The tutorial covers recent advances in disentangled representation learning, model-based visual concept learning, equivariant representations, and compositional understanding for Artificial General Intelligence (AGI), highlighting their applications and future directions.
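As a concrete reference point for "independent, interpretable factors", Beta-VAE (one of the methods listed for this tutorial) trains a VAE whose KL term is weighted by β > 1, pushing the posterior toward a factorized N(0, I) prior. The sketch below only computes that objective with NumPy; the squared-error reconstruction term and the default beta=4.0 are illustrative assumptions, not values taken from the talks.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims, one value per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Beta-VAE objective: reconstruction error + beta * KL, averaged over the batch.

    Assumption: squared error stands in for a fixed-variance Gaussian decoder;
    a Bernoulli likelihood is also common for binary images.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # per-sample reconstruction term
    kl = gaussian_kl(mu, logvar)                 # per-sample KL to the factorized prior
    return float(np.mean(recon + beta * kl))
```

With β = 1 this reduces to the standard VAE loss; larger β trades reconstruction quality for a more factorized latent code.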

Speakers

Xin Jin — Eastern Institute of Technology, Ningbo, China
Tao Yang — Xi'an Jiaotong University
Yue Song — National University of Singapore
Xingyi Yang — National University of Singapore
Wenjun (Kevin) Zeng — Xi'an Jiaotong University
Nicu Sebe — University of Trento
Xinchao Wang — National University of Singapore
Shuicheng Yan — National University of Singapore

Talks (4)

00:34:00 — Xin Jin: Disentanglement and Compositionality in Computer Vision
- This talk introduces disentanglement and compositionality as key concepts for AI to achieve human-like understanding and creation in computer vision, emphasizing their importance for building robust and interpretable AGI systems.
01:04:35 — Wenjun (Kevin) Zeng, Tao Yang: Disentangled Model Based Visual Concept Learning
- This talk delves into disentangled model-based visual concept learning, presenting group theory-based VAEs, pretrained generative models, and transformer-based models, and demonstrating their effectiveness in learning disentangled representations for various visual concepts.
01:42:49 — Nicu Sebe, Yue Song: Equivariant and Disentangled Representation Learning
- This talk explores the concept of equivariance in neural networks and its connection to disentangled representation learning, presenting a flow-factorized representation learning framework for unsupervised disentanglement.
02:09:00 — Xingyi Yang, Xinchao Wang, Shuicheng Yan: Disentanglement and Composition for AGI
- This talk defines AGI in the context of disentanglement and compositionality, exploring compositional models, strategies, and data, and how these principles can be leveraged to build more robust, generalizable, and human-like AI systems.
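The equivariance talk builds on a standard definition: a map f is equivariant to a group G if f(g·x) = g·f(x) for every g in G. The sketch below is a generic numerical check of that property, not the speakers' flow-factorized framework; the cyclic-shift group, the circular convolution, and the helper names are illustrative choices.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """1D circular cross-correlation: y[i] = sum_k kernel[k] * x[(i + k) % n]."""
    n = len(x)
    return np.array([sum(kernel[k] * x[(i + k) % n] for k in range(len(kernel)))
                     for i in range(n)])

def shift(x, s):
    """Action of the cyclic group Z_n on a signal: cyclic shift by s positions."""
    return np.roll(x, s)

def is_equivariant(f, act, x, group_elems, atol=1e-9):
    """Check f(g . x) == g . f(x) for every listed group element g, on one input x."""
    return all(np.allclose(f(act(x, g)), act(f(x), g), atol=atol) for g in group_elems)
```

Circular convolution passes this check for every shift, while a position-dependent map such as `v * np.arange(len(v))` fails it; equivariant layers bake the symmetry into the architecture instead of learning it from data.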

Key Takeaways

Disentangled representation learning is a fundamental task, crucial for General AI, and offers benefits across numerous applications.
Compositionality and hierarchy are essential foundations for achieving diversity and scalability in AI, particularly relevant in the era of large models.
Current research faces bottlenecks and opportunities in demystifying disentanglement within generative models, understanding the relationship between disentangled generation and composition, and exploring Lie groups for disentanglement.
Practical applications include domain generalization through content disentanglement and compositionality, and compositional video generation by disentangling factors from single images.
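One way to make "Lie groups for disentanglement" concrete, offered as an assumption about the intended setting rather than a method presented in the talks, is to let each latent factor be moved by its own one-parameter subgroup exp(t·A_i), with generators A_i acting on disjoint coordinate blocks. Transforming one factor then leaves the others fixed, and the generators commute. The 4-D latent, its block layout, and the names `GEN_A`, `GEN_B`, and `act` are hypothetical.

```python
import numpy as np

def expm_taylor(A, terms=40):
    """Matrix exponential via truncated Taylor series; adequate for the small-norm A used here."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k  # term now holds A^k / k!
        result = result + term
    return result

def block_generator(dim, i, j):
    """Skew-symmetric generator of rotations in the (i, j) coordinate plane of R^dim."""
    G = np.zeros((dim, dim))
    G[i, j], G[j, i] = -1.0, 1.0
    return G

# Hypothetical 4-D latent: coordinates (0, 1) encode factor A, coordinates (2, 3) encode factor B.
GEN_A = block_generator(4, 0, 1)
GEN_B = block_generator(4, 2, 3)

def act(z, gen, t):
    """Apply the group element exp(t * gen) to the latent code z."""
    return expm_taylor(t * gen) @ z
```

Because each generator is supported on its own block, moving along `GEN_A` changes only factor A, and exp(sA)·exp(tA) = exp((s + t)A) gives a smooth, invertible traversal of that factor.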

Methods / Models / Datasets Mentioned

ChatGPT
GPT-4
Sora
Imagen
ControlNet
StyleDrop
AIGC
AlphaGo
Perceptron
Hopfield neural network
Backpropagation (BP) algorithm
Deep Learning
Autoencoders
Variational Autoencoders (VAEs)
Beta-VAE
DIP-VAE
FactorVAE
Beta-TCVAE
JointVAE
RF-VAE
InfoGAN
IB-GAN
InfoGAN-CR
PS-SC GAN
DNA-GAN
MAP-JVR
DRNET
DR-GAN
DisDiff
FineGAN
Capsule Nets
Beta-Capsnet
CausalVAE
CL-Dis
Group Theory Based Disentangled VAEs
Pretrained Generative Model Based Disentangled Model
Transformer Based Disentangled Model
LD (Latent Diffusion)
GS (Generative Score)
DisCo (Disentangled Contrastive Learning)
VQ-VAE
CLIP
LLM
VLM
DALL-E 2
Stable Diffusion
Energy-based models
Contrastive Energy Model
Score Denoising
Diffusion Model
Product of Experts
Classifier-Free Guidance
LLaVA (NeurIPS 2023)
MiniGPT-4
Unified-IO V1 (ICLR 2023)
Unified-IO V2 (CVPR 2024)
Neural Module Networks
Equivariant Neural Networks
Flow Factorized Representation Learning
Sparse Transformation Analysis
MNIST
Shapes3D
Isaac3D
Falcor3D
Compositional LLM
LLM Agent
Retrieval Augmented Generation (RAG)
Segment Anything (SA)
HuggingGPT
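Several listed items (energy-based models, Product of Experts, score denoising) share one composition rule: multiplying expert densities is the same as adding their energies, p(x) ∝ exp(-Σ_i E_i(x)), and equivalently adding their scores ∇_x log p_i(x). The minimal 1-D sketch below composes two Gaussian experts on a grid; the experts, the grid, and the helper names are illustrative assumptions, not a method from the tutorial.

```python
import numpy as np

def gaussian_energy(x, mean, var):
    """Energy of a 1-D Gaussian expert, E(x) = (x - mean)^2 / (2 * var), up to a constant."""
    return (x - mean) ** 2 / (2.0 * var)

def compose_experts(x_grid, energies):
    """Product of experts on a uniform grid: p(x) ∝ exp(-sum_i E_i(x)), normalized numerically."""
    total = np.sum(energies, axis=0)
    unnorm = np.exp(-(total - total.min()))  # subtract the min energy for numerical stability
    dx = x_grid[1] - x_grid[0]
    return unnorm / (unnorm.sum() * dx)

x = np.linspace(-10.0, 10.0, 4001)
p = compose_experts(x, [gaussian_energy(x, 0.0, 1.0), gaussian_energy(x, 2.0, 1.0)])
```

For Gaussian experts the product is again Gaussian, with precision equal to the sum of the experts' precisions, so N(0, 1) and N(2, 1) compose to mean 1 and variance 0.5; the grid estimate should reproduce both values closely.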

Topics

Disentanglement · Compositionality · Artificial General Intelligence (AGI) · Computer Vision · Representation Learning · Generative Models · Equivariance · Human-like AI · Interpretability · Generalization

Notes

Open for commentary — connections to other work, critiques, follow-up reading.