3D/4D Generation and Modeling with Generative Priors

Event: CVPR 2024 · Duration: 91 min

Abstract

This tutorial explores 3D and 4D generation and modeling using generative priors. It covers the evolution of 3D and 4D content creation, from early methods relying on limited 3D data to recent advancements leveraging large-scale 2D diffusion models and large language models. Key topics include optimization-based 3D generation, feed-forward 3D reconstruction, 3D scene synthesis, and the emerging field of 4D generation and reconstruction. The tutorial highlights challenges such as data scarcity, computational cost, and achieving realism and consistency, while showcasing state-of-the-art techniques and future research directions.

Speakers

  • Hsin-Ying Lee — Snap Research
  • Peiye Zhuang — Snap Research
  • Chaoyang Wang — Snap Research

Talks (6)

  • 00:00:00 — Hsin-Ying Lee: Introduction and Logistics
    • Overview of the tutorial’s scope, the mission of 3D/4D generation, and the evolution of 3D and 4D modeling techniques.
  • 00:04:55 — Hsin-Ying Lee: 3D Generation w/o Large-Scale 2D Priors
    • Discussion of 3D generative models trained on 2D and 3D data, focusing on NeRF and StyleGAN architectures, their early integration, and the challenges posed by limited 3D data.
  • 00:20:49 — Peiye Zhuang: Bridging 2D and 3D: From Optimization to Feed-forward
    • Exploration of 3D generation using 2D diffusion models, covering optimization-based methods like SDS, diversity in generation, photo-realism, and feed-forward 3D reconstruction techniques.
  • 00:45:03 — Hsin-Ying Lee: 3D Scene Generation
    • Examination of 3D scene generation methods, including those using 2D data, 3D data, and large language models (LLMs) as priors, with a focus on traversable and compositional scenes.
  • 01:02:06 — Chaoyang Wang: 4D Generation and Reconstruction
    • Discussion of 4D generation and reconstruction, including the challenges of limited 4D data, reconstruction from monocular videos, and learning 4D generators from limited data using score distillation sampling (SDS).
  • 01:30:05 — Hsin-Ying Lee: Closing Remarks
    • Summary of the tutorial’s key topics and future directions in 4D generation and reconstruction.

Key Takeaways

  • 3D and 4D generation have evolved from training directly on 3D data to leveraging large-scale 2D priors (diffusion models), which yields higher-quality but slower results.
  • Optimization-based methods like SDS enable 3D generation from 2D priors, with ongoing research focusing on diversity, photorealism, and efficient feed-forward approaches.
  • 3D scene generation presents unique challenges related to complexity, traversability, and compositionality, with solutions exploring 2D/3D data, LLMs, and hybrid representations.
  • 4D generation and reconstruction is a nascent field, facing significant data scarcity, and is being tackled through reconstruction from monocular videos, 3D generative priors, and learning 4D generators with limited data.
  • Future directions emphasize feed-forward 4D reconstruction, building larger-scale 4D datasets, and developing advanced training algorithms for limited data to achieve faster, more realistic, and controllable 4D content.
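
As a rough illustration of the score distillation sampling (SDS) idea named in the takeaways above, the sketch below optimizes a parameter vector using only a noise predictor, with the gradient w(t) · (ε̂ − ε) · ∂x/∂θ. It is a toy under stated assumptions, not code from the tutorial: the "renderer" is the identity (x = θ), `toy_noise_predictor` is a hypothetical stand-in for a pretrained 2D diffusion model whose prior is a single target point, and the cosine schedule and weighting w(t) = σ² are arbitrary choices made for the example.

```python
import math
import random


def toy_noise_predictor(x_t, alpha, sigma, target):
    """Hypothetical stand-in for a pretrained diffusion model's noise estimate.

    Assumes the prior is a point mass at `target`; for that prior the exact
    noise estimate given x_t is (x_t - alpha * target) / sigma.
    """
    return [(xt - alpha * m) / sigma for xt, m in zip(x_t, target)]


def sds_gradient(theta, target, rng):
    """One-sample SDS gradient estimate for the identity renderer x = theta."""
    t = rng.uniform(0.02, 0.98)            # random diffusion time
    alpha = math.cos(0.5 * math.pi * t)    # assumed schedule: alpha^2 + sigma^2 = 1
    sigma = math.sin(0.5 * math.pi * t)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    # Forward-diffuse the current "render" to noise level t.
    x_t = [alpha * x + sigma * e for x, e in zip(theta, eps)]
    eps_hat = toy_noise_predictor(x_t, alpha, sigma, target)
    w = sigma ** 2                          # assumed weighting w(t)
    # dx/dtheta is the identity here, so the gradient is w(t) * (eps_hat - eps).
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(theta, target, steps=500, lr=0.1, seed=0):
    """Plain gradient descent on theta using SDS gradient estimates."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = sds_gradient(theta, target, rng)
        theta = [x - lr * g for x, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(optimize([0.0, 0.0, 0.0], [1.0, -2.0, 0.5]))
```

In this toy the injected noise cancels exactly, so θ contracts steadily toward the target. With a real diffusion prior and a differentiable 3D renderer in place of the identity map, the estimate is noisy and the update is backpropagated through the renderer into the 3D representation.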

Methods / Models / Datasets Mentioned

  • NeRF
  • StyleGAN
  • StyleGAN2
  • StyleGAN2-ADA
  • StyleGAN3
  • HoloGAN
  • GIRAFFE
  • pi-GAN
  • VolumeGAN
  • GRAM
  • StyleNeRF
  • EG3D
  • 3DAvatarGAN
  • 3DGP
  • PointFlow
  • SDFusion
  • SAGNet
  • MeshGPT
  • DeepCAD
  • Instant3D
  • LRM
  • DreamFusion
  • SDS
  • ProlificDreamer
  • HiFA
  • Wonder3D
  • DMV3D
  • CAT3D
  • SyncDreamer
  • Direct3D
  • SceneWiz3D
  • DiscoScene
  • InfiniteNature-Zero
  • Persistent Nature
  • GSN
  • GAUDI
  • Text2Room
  • Text2NeRF
  • LucidDreamer
  • GraphDreamer
  • GALA3D
  • Human4DiT
  • STAG4D
  • MoSca
  • PhysDreamer
  • PhysGaussian
  • PAC-NeRF
  • GAvatar
  • PhysDiff
  • HVH
  • Sora
  • Objaverse-XL
  • ShapeNet
  • ImageNet
  • COYO
  • Ego-Exo4D
  • Magic3D
  • MAV3D
  • 4Dfy
  • Dream-in-4D
  • 4Real

Topics

3D Generation · 4D Generation · Generative Priors · Neural Radiance Fields (NeRF) · StyleGAN · Diffusion Models · 3D Reconstruction · 3D Scene Generation · Multi-view Synthesis · Large Language Models (LLMs) · Gaussian Splatting · Human Motion Animation · Photorealism

