Disentanglement and Compositionality in Artificial Intelligence

Event: CVPR 2024 Tutorial · Duration: 162 min

Abstract

This tutorial explores the fundamental concepts of disentanglement and compositionality in Artificial Intelligence, particularly within the domain of Computer Vision. It delves into how AI systems can achieve human-like understanding and creation by breaking down complex visual information into independent, interpretable factors and composing them to generate new concepts. The tutorial covers recent advancements in disentangled representation learning, model-based visual concept learning, equivariant representations, and compositional understanding for Artificial General Intelligence (AGI), highlighting their applications and future directions.

Speakers

  • Xin Jin — Eastern Institute of Technology, Ningbo, China
  • Tao Yang — Xi’an Jiaotong University
  • Yue Song — National University of Singapore
  • Xingyi Yang — National University of Singapore
  • Wenjun (Kevin) Zeng — Xi’an Jiaotong University
  • Nicu Sebe — University of Trento
  • Xinchao Wang — National University of Singapore
  • Shuicheng Yan — National University of Singapore

Talks (4)

  • 00:34:00 Xin Jin: Disentanglement and Compositionality in Computer Vision
    • This talk introduces disentanglement and compositionality as key concepts for AI to achieve human-like understanding and creation in computer vision, emphasizing their importance for building robust and interpretable AGI systems.
  • 01:04:35 Wenjun (Kevin) Zeng, Tao Yang: Disentangled Model Based Visual Concept Learning
    • This talk delves into disentangled model-based visual concept learning, presenting group theory-based VAEs, pretrained generative models, and transformer-based models, and demonstrating their effectiveness in learning disentangled representations for various visual concepts.
  • 01:42:49 Nicu Sebe, Yue Song: Equivariant and Disentangled Representation Learning
    • This talk explores the concept of equivariance in neural networks and its connection to disentangled representation learning, presenting a flow-factorized representation learning framework for unsupervised disentanglement.
  • 02:09:00 Xingyi Yang, Xinchao Wang, Shuicheng Yan: Disentanglement and Composition for AGI
    • This talk defines AGI in the context of disentanglement and compositionality, exploring compositional models, strategies, and data, and how these principles can be leveraged to build more robust, generalizable, and human-like AI systems.
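
As a small illustration of the equivariance idea named in the third talk: a map f is equivariant to a transformation g when f(g·x) = g·f(x). The sketch below is not taken from the tutorial; the helper names (`shift`, `circular_conv`, `is_shift_equivariant`) and the toy signal are assumptions chosen for illustration. It checks that a circular convolution commutes with circular shifts, while a position-dependent weighting does not.

```python
def shift(signal, k):
    """Circularly shift a list right by k positions (the transformation g)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def circular_conv(signal, kernel):
    """Circular cross-correlation of signal with kernel (the map f)."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def is_shift_equivariant(f, signal, k):
    """Check f(g.x) == g.f(x) for a shift g by k positions."""
    return f(shift(signal, k)) == shift(f(signal), k)


def position_weighted(signal):
    """Multiply each entry by its index; this depends on absolute position."""
    return [i * v for i, v in enumerate(signal)]


# Hypothetical toy data; integers keep the equality check exact.
signal = [1, 4, 2, 0, 3, 5]
kernel = [1, -2, 1]


def conv(s):
    return circular_conv(s, kernel)


# Convolution commutes with every circular shift of the input.
conv_is_equivariant = all(
    is_shift_equivariant(conv, signal, k) for k in range(len(signal))
)

# Position-dependent weighting does not commute with a shift by one.
weighting_is_equivariant = is_shift_equivariant(position_weighted, signal, 1)

print(conv_is_equivariant, weighting_is_equivariant)
```

Shift equivariance of convolution is the simplest case; the methods discussed in the talk concern richer transformation families, which this toy check does not attempt to model.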

Key Takeaways

  • Disentangled representation learning is a fundamental task, crucial for General AI, and offers benefits across numerous applications.
  • Compositionality and hierarchy are essential foundations for achieving diversity and scalability in AI, particularly relevant in the era of Large Models.
  • Current research faces bottlenecks and opportunities in demystifying disentanglement within generative models, understanding the relationship between disentangled generation and composition, and exploring Lie groups for disentanglement.
  • Practical applications include domain generalization through content disentanglement and compositionality, and compositional video generation by disentangling factors from single images.

Methods / Models / Datasets Mentioned

  • ChatGPT
  • GPT-4
  • Sora
  • Imagen
  • ControlNet
  • StyleDrop
  • AIGC
  • AlphaGo
  • Perceptron
  • Hopfield neural network
  • Backpropagation (BP) algorithm
  • Deep Learning
  • Autoencoders
  • Variational Autoencoders (VAEs)
  • Beta-VAE
  • DIP-VAE
  • FactorVAE
  • Beta-TCVAE
  • JointVAE
  • RF-VAE
  • InfoGAN
  • IB-GAN
  • InfoGAN-CR
  • PS-SC GAN
  • DNA-GAN
  • MAP-JVR
  • DRNET
  • DR-GAN
  • DisDiff
  • FineGAN
  • Capsule Nets
  • Beta-Capsnet
  • CausalVAE
  • CL-Dis
  • Group Theory Based Disentangled VAEs
  • Pretrained Generative Model Based Disentangled Model
  • Transformer Based Disentangled Model
  • LD (Latent Diffusion)
  • GS (Generative Score)
  • DisCo (Disentangled Contrastive Learning)
  • VQ-VAE
  • CLIP
  • LLM
  • VLM
  • DALL-E 2
  • Stable Diffusion
  • Energy-based models
  • Contrastive Energy Model
  • Score Denoising
  • Diffusion Model
  • Product of Experts
  • Classifier-Free Guidance
  • Neural Module Networks
  • Equivariant Neural Networks
  • Flow Factorized Representation Learning
  • Sparse Transformation Analysis
  • MNIST
  • Shapes3D
  • Isaac3D
  • Falcor3D
  • Compositional LLM
  • LLM Agent
  • Retrieval Augmented Generation (RAG)
  • Segment Anything (SA)
  • HuggingGPT
  • LLaVA (NeurIPS 2023)
  • MiniGPT-4
  • Unified-IO V1 (ICLR 2023)
  • Unified-IO V2 (CVPR 2024)

Topics

Disentanglement · Compositionality · Artificial General Intelligence (AGI) · Computer Vision · Representation Learning · Generative Models · Equivariance · Human-like AI · Interpretability · Generalization


Notes

Open for commentary — connections to other work, critiques, follow-up reading.