3D/4D Generation and Modeling with Generative Priors

Event: CVPR 2024 · Duration: 91 min

Abstract

This tutorial explores 3D and 4D generation and modeling using generative priors. It covers the evolution of 3D and 4D content creation, from early methods relying on limited 3D data to recent advancements leveraging large-scale 2D diffusion models and large language models. Key topics include optimization-based 3D generation, feed-forward 3D reconstruction, 3D scene synthesis, and the emerging field of 4D generation and reconstruction. The tutorial highlights challenges such as data scarcity, computational cost, and achieving realism and consistency, while showcasing state-of-the-art techniques and future research directions.

Speakers

Hsin-Ying Lee — Snap Research
Peiye Zhuang — Snap Research
Chaoyang Wang — Snap Research

Talks (6)

00:00:00 — Hsin-Ying Lee: Introduction and Logistics
- Overview of the tutorial’s scope, the mission of 3D/4D generation, and the evolution of 3D and 4D modeling techniques.
00:04:55 — Hsin-Ying Lee: 3D Generation w/o Large-Scale 2D Priors
- Discussion of 3D generative models trained on 2D and 3D data, focusing on early work that combined NeRF with StyleGAN architectures and on the challenges posed by limited 3D data.
00:20:49 — Peiye Zhuang: Bridging 2D and 3D: From Optimization to Feed-forward
- Exploration of 3D generation using 2D diffusion models, covering optimization-based methods such as SDS, generation diversity, photorealism, and feed-forward 3D reconstruction techniques.
00:45:03 — Hsin-Ying Lee: 3D Scene Generation
- Examination of 3D scene generation methods, including those using 2D data, 3D data, and large language models (LLMs) as priors, with a focus on traversable and compositional scenes.
01:02:06 — Chaoyang Wang: 4D Generation and Reconstruction
- Discussion of 4D generation and reconstruction, including the challenges of limited 4D data, reconstruction from monocular videos, and learning 4D generators from limited data using score distillation sampling (SDS).
01:30:05 — Hsin-Ying Lee: Closing Remarks
- Summary of the tutorial’s key topics and future directions in 4D generation and reconstruction.

Key Takeaways

3D and 4D generation have evolved from training directly on 3D data to leveraging large-scale 2D priors (diffusion models), which yield higher-quality results at the cost of slower generation.
Optimization-based methods like SDS enable 3D generation from 2D priors, with ongoing research focusing on diversity, photorealism, and efficient feed-forward approaches.
3D scene generation presents unique challenges related to complexity, traversability, and compositionality, with solutions exploring 2D/3D data, LLMs, and hybrid representations.
4D generation and reconstruction is a nascent field facing significant data scarcity; current approaches include reconstruction from monocular videos, 3D generative priors, and 4D generators learned from limited data.
Future directions emphasize feed-forward 4D reconstruction, building larger-scale 4D datasets, and developing advanced training algorithms for limited data to achieve faster, more realistic, and controllable 4D content.
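
As a rough illustration of the SDS idea mentioned above, the sketch below runs score distillation on a toy problem. Everything in it is an assumption made for illustration and is not taken from the tutorial: the "2D prior" is a Gaussian with a closed-form noise predictor (`toy_noise_predictor`) standing in for a pretrained diffusion model, the renderer (`render`) is a hypothetical identity map, and the weighting w(t) = sigma_t^2 is just one common choice. Each step renders the parameters, adds noise, and updates the parameters with w(t) * (predicted noise - true noise), without backpropagating through the noise predictor.

```python
import math
import random

def toy_noise_predictor(x_t, alpha, sigma, prior_mean, prior_std):
    # Stand-in for a pretrained 2D diffusion model (assumption): the exact
    # noise estimate when clean images follow N(prior_mean, prior_std^2).
    var_t = (alpha * prior_std) ** 2 + sigma ** 2
    return [sigma * (x - alpha * m) / var_t for x, m in zip(x_t, prior_mean)]

def render(theta):
    # Hypothetical differentiable renderer. Identity here, so dx/dtheta = I.
    return list(theta)

def sds_step(theta, prior_mean, prior_std, lr, rng):
    # Sample a noise level and build the noisy rendering x_t.
    t = rng.uniform(0.02, 0.98)
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    x = render(theta)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    x_t = [alpha * xi + sigma * e for xi, e in zip(x, eps)]
    eps_hat = toy_noise_predictor(x_t, alpha, sigma, prior_mean, prior_std)
    # SDS-style update direction: w(t) * (eps_hat - eps) * dx/dtheta.
    # The predictor's Jacobian is skipped; dx/dtheta is the identity here.
    w = sigma ** 2  # assumed weighting
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [th - lr * g for th, g in zip(theta, grad)]

def run_sds(prior_mean=(1.0, -0.5, 0.25), prior_std=0.1,
            steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(prior_mean)
    for _ in range(steps):
        theta = sds_step(theta, prior_mean, prior_std, lr, rng)
    return theta

if __name__ == "__main__":
    print(run_sds())
```

With these toy settings the parameters should drift toward `prior_mean`, the mode of the assumed prior. In a real pipeline the renderer would be a NeRF or Gaussian-splatting model and the predictor a text-conditioned diffusion network, which is where the slow optimization noted above comes from.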

Methods / Models / Datasets Mentioned

NeRF
StyleGAN
StyleGAN2
StyleGAN2-ADA
StyleGAN3
HoloGAN
GIRAFFE
pi-GAN
VolumeGAN
GRAM
StyleNeRF
EG3D
3DAvatarGAN
3DGP
PointFlow
SDFusion
SAGNet
MeshGPT
DeepCAD
Instant3D
LRM
DreamFusion
SDS
ProlificDreamer
HiFA
Wonder3D
DMV3D
CAT3D
SyncDreamer
Direct3D
SceneWiz3D
DiscoScene
InfiniteNature-Zero
Persistent Nature
GSN
GAUDI
Text2Room
Text2NeRF
LucidDreamer
GraphDreamer
GALA3D
Human4DiT
STAG4D
MoSca
PhysDreamer
PhysGaussian
PAC-NeRF
GAvatar
PhysDiff
HVH
Sora
Objaverse-XL
ShapeNet
ImageNet
COYO
Ego-Exo4D
Magic3D
MAV3D
4Dfy
Dream-in-4D
4Real

Topics

3D Generation · 4D Generation · Generative Priors · Neural Radiance Fields (NeRF) · StyleGAN · Diffusion Models · 3D Reconstruction · 3D Scene Generation · Multi-view Synthesis · Large Language Models (LLMs) · Gaussian Splatting · Human Motion Animation · Photorealism
