3D Generative AI: Efficient, high-def & controllable

Event: CVPR WORKSHOP ON 3D GENERATIVE AI, JUNE 2024 · Duration: 371 min

Abstract

This workshop explores recent advances in 3D generative AI, with a focus on efficient, high-definition, and controllable generation of 3D assets, scenes, and even entire worlds. Speakers present approaches to text-to-3D mesh generation, object insertion into neural 3D scenes, and the creation of common-sense indoor environments using scene graphs and diffusion models. Key themes include leveraging compositional structure, addressing challenges in 3D consistency and localized editing, and developing robust methods for human motion estimation and scene reconstruction from inputs such as monocular videos and egocentric views. The discussions highlight how diffusion models and new architectures such as Diffusion Transformers (DiT) can advance both 3D content creation and perception.
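
Diffusion Transformers, mentioned above, begin by "patchifying" a latent image into a sequence of tokens, much like a Vision Transformer. The plain-Python sketch below shows how the sequence length depends on the patch size p. The 32×32×4 latent (for 256×256 images) and the patch sizes 2, 4 and 8 follow the setup described in the DiT paper; the function name `patchify` is our own and is not an API from any DiT codebase. The linear token projection and positional embeddings are deliberately omitted.

```python
from typing import List

# A latent "image" stored as nested lists indexed [row][col][channel].
Latent = List[List[List[float]]]


def patchify(latent: Latent, p: int) -> List[List[float]]:
    """Split an H x W x C latent into non-overlapping p x p patches.

    Each patch is flattened into one token of length p * p * C, giving
    (H / p) * (W / p) tokens in row-major order. A real DiT would then
    apply a linear projection and add positional embeddings (omitted here).
    """
    h, w, c = len(latent), len(latent[0]), len(latent[0][0])
    if h % p or w % p:
        raise ValueError("latent height and width must be divisible by p")
    tokens = []
    for i in range(0, h, p):          # top row of each patch
        for j in range(0, w, p):      # left column of each patch
            token = [
                latent[i + di][j + dj][k]
                for di in range(p)
                for dj in range(p)
                for k in range(c)
            ]
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # Assumed setup: a 256x256 image encoded by a VAE into a 32x32x4 latent.
    H = W = 32
    C = 4
    latent = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for p in (2, 4, 8):
        tokens = patchify(latent, p)
        # Sequence length T = (32 / p)^2; token width = p * p * 4.
        print(f"p={p}: {len(tokens)} tokens of length {len(tokens[0])}")
```

Halving p quadruples the number of tokens, so the transformer's compute grows by at least that factor (self-attention grows faster still), which is why patch size is one of the main knobs when scaling a DiT.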

Speakers

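  • Adam Kortylewski — University of Freiburg & Max Planck Institute for Informatics
  • Lingjie Liu — University of Pennsylvania
  • Michael Niemeyer — Google
  • Michael Oechsle — Google
  • Christian Theobalt — Max Planck Institute for Informatics (MPI-INF)
  • Alan Yuille — Johns Hopkins University
  • Fangneng Zhan — Max Planck Institute for Informatics (MPI-INF)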
  • Gianluca Corrado — Wayve
  • Siyu Tang — ETH Zürich
  • Saining Xie — New York University
  • Jiajun Wu — Stanford University
  • Katerina Fragkiadaki — Carnegie Mellon University
  • Andrea Vedaldi — University of Oxford
  • Federico Tombari — Google, TUM

Talks (9)

  • 00:00:00 — Adam Kortylewski: Welcome to the 2nd Workshop on Generative Models for Computer Vision
    • Adam Kortylewski welcomes attendees to the 2nd Workshop on Generative Models for Computer Vision, highlighting the success of the previous year’s event and the significant progress in generative models, particularly in 3D synthesis and image generation.
  • 00:00:00 — Gianluca Corrado: Embodied AI in Autonomous Driving
    • This talk discusses Wayve’s approach to autonomous driving using end-to-end embodied AI, highlighting the benefits of computational homogeneity, hardware agnosticism, agile development, and superior performance, particularly in handling long-tail scenarios and generalizing across different vehicle types.
  • 00:54:15 — Siyu Tang: Generative Models for Human Motion Estimation
    • This talk introduces a marker-based representation for human motion, which is then used to train an autoencoder to learn motion priors. These priors are subsequently used to reconstruct human motion from noisy or incomplete observations, demonstrating improved robustness and naturalness compared to previous methods.
  • 01:53:15 — Saining Xie: Diffusion Transformers and Beyond 🚀 and why you should stop worrying and love DiT
    • This talk introduces Diffusion Transformers (DiT) as a new class of diffusion models, emphasizing their simple, scalable architecture and superior performance compared to traditional U-Nets, particularly in image generation tasks and when scaled up for text-to-image synthesis.
  • 02:35:55 — Alan Yuille: Approximate Analysis by Synthesis
    • This talk advocates for “analysis by synthesis” as a framework for computer vision, where understanding an image involves generating it from a 3D model. It emphasizes the importance of 3D compositional generative networks (3D-CGNs) that learn object intrinsics and can generalize to novel data, even under occlusions, outperforming traditional deep networks in out-of-distribution tasks.
  • 03:13:57 — Jiajun Wu: Generating Objects and Scenes and Worlds and what it means for computer vision
    • This talk explores leveraging compositional structures for 3D generation, moving from single objects to complex scenes and entire worlds. It highlights the use of scene graphs to represent object relationships and attributes, enabling controlled generation and manipulation of 3D environments, with a focus on improving realism and consistency.
  • 03:53:51 — Katerina Fragkiadaki: Image and Video Perception with Generative Feedback
    • This talk introduces a generative feedback approach to perception, where discriminative models are adapted at test time using generative models. It demonstrates how this “Diffusion-TTA” method significantly boosts performance in image classification and segmentation tasks, particularly in out-of-distribution and online settings, by leveraging diffusion models to refine predictions and improve consistency.
  • 04:28:59 — Andrea Vedaldi: 3D Generative AI: Efficient, high-def & controllable
    • This talk presents a comprehensive approach to 3D generative AI, focusing on efficiency, high definition, and controllability. It covers “Splatter Image” for fast, feed-forward single-view 3D reconstruction, “Free3D” for consistent multi-view generation, and “IM-3D” for high-quality 3D generation, the latter two built on diffusion models, and emphasizes the importance of 3D-aware representations and efficient training.
  • 05:31:40 — Federico Tombari: Generating 3D assets with Diffusion
    • This talk introduces a novel approach to generating 3D assets using diffusion models, focusing on creating realistic and detailed textures for 3D meshes. It highlights the use of a scene graph representation to condition the diffusion process, enabling the generation of diverse and semantically consistent 3D scenes with controllable object attributes and relationships.
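
Both Jiajun Wu's and Federico Tombari's talks rely on scene graphs: objects as nodes carrying attributes, and pairwise relationships as labeled, directed edges. The Python sketch below shows one minimal way to hold such a graph and flatten it into (subject, relation, object) triplets, one common form in which scene-graph methods consume relationships. The class names, fields, and relation labels are hypothetical and are not taken from CommonScenes or any other system named on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectNode:
    """A scene-graph node: an object category plus free-form attributes."""
    category: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: nodes plus directed, labeled edges."""
    nodes: List[ObjectNode] = field(default_factory=list)
    # Each edge is (subject index, relation label, object index).
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, category: str, **attributes: str) -> int:
        """Add a node and return its index for use in edges."""
        self.nodes.append(ObjectNode(category, dict(attributes)))
        return len(self.nodes) - 1

    def relate(self, subj: int, relation: str, obj: int) -> None:
        """Add a directed relationship, e.g. (lamp, 'on top of', nightstand)."""
        if not (0 <= subj < len(self.nodes) and 0 <= obj < len(self.nodes)):
            raise IndexError("relationship refers to an unknown node")
        self.edges.append((subj, relation, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Flatten edges into (subject, relation, object) category triplets."""
        return [
            (self.nodes[s].category, rel, self.nodes[o].category)
            for s, rel, o in self.edges
        ]


if __name__ == "__main__":
    g = SceneGraph()
    bed = g.add_object("bed", style="modern")
    table = g.add_object("nightstand", material="wood")
    lamp = g.add_object("lamp")
    g.relate(table, "left of", bed)
    g.relate(lamp, "on top of", table)
    for t in g.triplets():
        print(t)
```

Because each object and relationship is an explicit, separate entry, changing one node's attributes or one edge leaves the rest of the graph untouched, which is what makes this representation convenient for controllable, localized generation and editing.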

Methods / Models / Datasets Mentioned

  • LINGO-1
  • LINGO-2
  • GAIA (2023)
  • PRISM-1
  • WayveScenes101
  • LEMO
  • AMASS
  • PROX
  • EgoBody
  • EgoHMR
  • RoHM
  • DiT
  • U-Net
  • GPT-3
  • Chinchilla
  • Imagen
  • LDM
  • PixArt-α
  • Sora
  • LRM
  • GigaGAN
  • SD v1.5
  • DALL-E 2
  • T5
  • CLIP
  • Instruct-NeRF2NeRF
  • PixArt-δ
  • PixArt-Σ
  • SiT
  • DDPM
  • NeRF
  • NeuS
  • VolSDF
  • UNISURF
  • PASCAL3D+
  • OOD-CV
  • ResNet50
  • ConvNeXt
  • ViT-B/16
  • NOVUM
  • DreamFusion
  • TextMesh
  • CommonScenes
  • ATISS
  • MeshLRM
  • Diffusion-TTA
  • DreamScene4D
  • Zero-1-to-3
  • MVDream
  • Consistent4D
  • BPI
  • ZeroNVS
  • Splatter Image
  • Free3D
  • IM-3D
  • VQGAN
  • PCA

Topics

3D Generative AI · Diffusion Models · Text-to-3D Generation · Scene Graphs · Embodied AI · Human Motion Estimation · Neural Distance Fields · Novel View Synthesis · Scalability in 3D Generation · Test-Time Adaptation

