Repurposing generative models for 3D data: Towards a generative model-powered neural simulator
Event: CVPR 2025 Workshop on Autonomous Driving · Duration: 18 min
Abstract
The presentation shows how generative models, in particular the NVIDIA Cosmos world foundation model, can be used to create realistic 3D LiDAR data. It stresses the importance of LiDAR for safety-critical autonomous vehicles and addresses a key limitation of current generative models, which focus primarily on 2D images and video. The speaker details how a video generative model was adapted to LiDAR data, covering custom tokenization, motion compensation, and numerical precision. The end goal is joint modeling of LiDAR and RGB data, so that complex, multi-modal driving scenes can be generated for advanced neural simulation.
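Motion compensation is needed because a spinning LiDAR takes tens of milliseconds to complete one sweep while the vehicle keeps moving, so points captured late in a sweep are measured from a different pose than points captured early. The sketch below de-skews one sweep under a loudly simplified assumption: per-point timestamps are available and the ego moves with constant velocity and constant yaw rate during the sweep. The function `motion_compensate` and its parameters are hypothetical, and this is a generic de-skewing technique, not necessarily the sensor model used in the talk.

```python
import math
from typing import List, Sequence, Tuple

def motion_compensate(points: Sequence[Tuple[float, float, float, float]],
                      vx: float, vy: float,
                      yaw_rate: float) -> List[Tuple[float, float, float]]:
    """De-skew one LiDAR sweep into the sensor frame at sweep start (t = 0).

    Each point is (x, y, z, t): coordinates in the sensor frame at its own
    capture time t (seconds after sweep start). Hypothetical simplification:
    the ego moves at constant velocity (vx, vy), expressed in the t = 0 frame,
    and turns at a constant yaw_rate (rad/s); z is left unchanged (planar motion).
    """
    out = []
    for x, y, z, t in points:
        yaw = yaw_rate * t                  # sensor heading at capture time
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotate into the t = 0 orientation, then add the distance travelled.
        out.append((c * x - s * y + vx * t,
                    s * x + c * y + vy * t,
                    z))
    return out

# A static wall 10 m ahead, scanned while driving forward at 20 m/s:
# the late return (t = 0.05 s) is measured 1 m closer but maps back to x = 10.
sweep = [(10.0, 0.0, 0.5, 0.0), (9.0, 0.0, 0.5, 0.05)]
print(motion_compensate(sweep, vx=20.0, vy=0.0, yaw_rate=0.0))
```

Without this step the same wall would appear as two surfaces 1 m apart, which is exactly the kind of artifact a tokenizer would otherwise have to learn to reproduce.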
Speakers
- Laura Leal-Taixé — NVIDIA
Talks (1)
- 00:04 — Laura Leal-Taixé: Repurposing generative models for 3D data: Towards a generative model-powered neural simulator
- This talk explores how to repurpose existing generative models, specifically NVIDIA’s Cosmos world foundation model, to generate 3D LiDAR data for autonomous driving simulations, addressing challenges in LiDAR representation, tokenization, and joint modeling with RGB images.
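A common way to make LiDAR look like an image to a video model is to project each sweep into a range map: one row per laser beam, one column per azimuth bin, each pixel holding the measured distance. The sketch below shows that spherical projection, plus one plausible reading of the "4x row repeat" mentioned in the takeaways (each row duplicated so the very short, wide map survives spatial downsampling). All names, the field of view, and the grid sizes are hypothetical illustrations, not the talk's actual configuration.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]

def to_range_map(points: Sequence[Tuple[float, float, float]], height: int,
                 width: int, fov_up_deg: float, fov_down_deg: float) -> Grid:
    """Spherically project (x, y, z) points into a height x width range map.

    Empty pixels hold 0.0; when several points fall into one pixel the
    nearest return is kept. Row 0 is the top of the vertical field of view.
    """
    fov_up = math.radians(fov_up_deg)
    fov = fov_up - math.radians(fov_down_deg)
    grid = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        azimuth = math.atan2(y, x)     # 0 = straight ahead (+x), +pi/2 = left (+y)
        elevation = math.asin(z / r)
        # Column 0 is directly behind; straight ahead lands at width // 2.
        col = math.floor(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = math.floor((fov_up - elevation) / fov * height)
        if not 0 <= row < height:
            continue                   # outside the vertical field of view
        if grid[row][col] == 0.0 or r < grid[row][col]:
            grid[row][col] = r
    return grid

def repeat_rows(grid: Grid, factor: int = 4) -> Grid:
    """Duplicate every row `factor` times (hypothetical '4x row repeat')."""
    return [list(row) for row in grid for _ in range(factor)]

def unrepeat_rows(grid: Grid, factor: int = 4) -> Grid:
    """Collapse each group of `factor` rows back to one row by averaging.

    Exact up to float rounding when the rows in a group are identical;
    averaging also smooths small disagreements in decoded output.
    """
    out = []
    for i in range(0, len(grid), factor):
        block = grid[i:i + factor]
        out.append([sum(col) / len(block) for col in zip(*block)])
    return out

# Hypothetical 4-beam, 8-column sensor with a +/-2 degree vertical FOV.
rmap = to_range_map([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0)], 4, 8, 2.0, -2.0)
tall = repeat_rows(rmap)    # 16 x 8: each beam now spans 4 rows
print(rmap[2][4], len(tall))
```

To see why repetition can help, take a hypothetical 64-beam sensor rendered at 2048 columns: a tokenizer that compresses each spatial axis by 8 (an assumed figure) would leave only 8 token rows, whereas repeating every row 4× first yields 256 rows and therefore 32 token rows.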
Key Takeaways
- Generative models, initially designed for 2D images and videos, can be effectively repurposed to generate 3D LiDAR data, which is crucial for autonomous vehicle simulation.
- Customizing the tokenizer for LiDAR range maps, including techniques like 4x row repeat and accurate sensor modeling for motion compensation, significantly improves reconstruction quality.
- Numerical precision is vital for accurate LiDAR depth, so floating-point formats must be chosen carefully (e.g., FP32 for tokenizer fine-tuning).
- LiDAR generation can be conditioned on HD-maps and bounding boxes to control scene layout and agent behavior, and on text captions to vary weather and background.
- A single generative model can jointly produce consistent LiDAR and RGB data, enabling multi-modal scene generation with good alignment between the two modalities.
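The precision point can be made concrete. bfloat16 (BF16) keeps float32's exponent range but only 8 significant bits, so adjacent BF16 values are 0.25 m apart between 32 m and 64 m and 0.5 m apart between 64 m and 128 m, whereas FP32 resolves the same ranges to a few micrometres. The sketch below emulates BF16 rounding with the standard `struct` module to show the resulting depth error; the helper names are hypothetical, and it illustrates representation error only, not the training setup described in the talk.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bfloat16(x: float) -> float:
    """Round to bfloat16, which is the upper 16 bits of a float32 pattern.

    Uses round-to-nearest-even on the float32 bits. NaN and infinity are
    not handled specially in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit, then truncating, rounds half to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Depth error introduced by storing range values in each format.
for depth_m in (5.3, 50.3, 100.2, 200.7):
    bf16_err = abs(to_bfloat16(depth_m) - depth_m)
    fp32_err = abs(to_float32(depth_m) - depth_m)
    print(f"{depth_m:6.1f} m  bf16 err {bf16_err:.4f} m  fp32 err {fp32_err:.2e} m")
```

For example, 50.3 m rounds to 50.25 m in BF16, a 5 cm error, and 100.2 m rounds to 100.0 m, a 20 cm error; errors of that size would be clearly visible in reconstructed point clouds.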
Methods / Models / Datasets Mentioned
NeRF · 3DGS · NVIDIA Cosmos world foundation model · DiT-based diffusion model · T5 text encoder · Patchify · Transformer Block · Unpatch · FP32 · BF16
Topics
Generative Models · 3D Data Generation · LiDAR Data · Autonomous Driving · Neural Simulation · Diffusion Models · LiDAR Tokenization · HD-Maps · Weather Control · Joint RGB-LiDAR Modeling