Repurposing generative models for 3D data: Towards a generative model-powered neural simulator

Event: CVPR 2025 Workshop on Autonomous Driving · Duration: 18 min

Abstract

The talk shows how generative models, in particular the NVIDIA Cosmos world foundation model, can be repurposed to generate realistic 3D LiDAR data. It highlights the importance of LiDAR for safety-critical autonomous vehicles and addresses a limitation of current generative models, which primarily target 2D images and videos. The speaker details how a video generative model is adapted to LiDAR data, covering custom tokenization, motion compensation, and numerical precision. The end goal is joint modeling of LiDAR and RGB data, so that complex, multi-modal driving scenes can be generated for neural simulation.
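Adapting a video model to LiDAR relies on an image-like view of the point cloud, the range map. As an illustration only, here is a minimal sketch of the standard spherical projection from 3D points to a range map. The function name, the field-of-view parameters, and the "keep the nearest return" rule are assumptions made for this example, not details from the talk, and the sketch ignores motion compensation entirely.

```python
import math

def points_to_range_map(points, n_rows, n_cols, fov_up_deg, fov_down_deg):
    """Project (x, y, z) points in the sensor frame onto a range map.

    Rows index elevation (top row = highest beam), columns index azimuth.
    Empty pixels hold 0.0. All parameter names are hypothetical.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    range_map = [[0.0] * n_cols for _ in range(n_rows)]

    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)        # in [-pi, pi]
        elevation = math.asin(z / r)      # in [-pi/2, pi/2]
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the vertical field of view

        # Map azimuth to a column and elevation to a row.
        col = int(0.5 * (1.0 - azimuth / math.pi) * n_cols) % n_cols
        row = min(int((fov_up - elevation) / fov * n_rows), n_rows - 1)

        # Assumption: when several points hit one pixel, keep the nearest.
        current = range_map[row][col]
        if current == 0.0 or r < current:
            range_map[row][col] = r
    return range_map

if __name__ == "__main__":
    demo = points_to_range_map([(10.0, 0.0, 0.0)], 4, 8, 10.0, -10.0)
    for row in demo:
        print(row)
```

A real sensor model would use per-beam elevation angles and per-point timestamps rather than a uniform grid, which is where the motion compensation mentioned in the abstract comes in.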

Speakers

  • Laura Leal-Taixé — NVIDIA

Talks (1)

  • 00:04 Laura Leal-Taixé: Repurposing generative models for 3D data: Towards a generative model-powered neural simulator
    • This talk explores how to repurpose existing generative models, specifically NVIDIA’s Cosmos world foundation model, to generate 3D LiDAR data for autonomous driving simulations, addressing challenges in LiDAR representation, tokenization, and joint modeling with RGB images.

Key Takeaways

  • Generative models initially designed for 2D images and videos can be repurposed to generate 3D LiDAR data, which is crucial for autonomous vehicle simulation.
  • Customizing the tokenizer for LiDAR range maps, including techniques like 4x row repeat and accurate sensor modeling for motion compensation, significantly improves reconstruction quality.
  • Model precision is vital for depth estimation in LiDAR, necessitating careful selection of floating-point formats (e.g., FP32 for tokenizer fine-tuning).
  • LiDAR generation can be conditioned using HD-maps and bounding boxes to control scene layout and agent behavior, as well as text captions for weather and background variations.
  • A single generative model can jointly produce consistent LiDAR and RGB data, enabling multi-modal scene generation with good alignment between the two modalities.
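The "4x row repeat" in the tokenizer takeaway can be pictured as follows. This is a minimal sketch that assumes the technique means duplicating each range-map row four times before tokenization, presumably so that a short, wide LiDAR image keeps enough vertical resolution after the tokenizer's spatial downsampling. The function names and the averaging step used to undo the repeat are hypothetical, not taken from the talk.

```python
def repeat_rows(range_map, factor=4):
    """Return a taller range map in which every row appears `factor` times."""
    return [list(row) for row in range_map for _ in range(factor)]

def collapse_rows(tall_map, factor=4):
    """Undo repeat_rows by averaging each group of `factor` consecutive rows.

    Averaging is an assumption for this sketch; after a lossy tokenizer
    round trip the repeated rows would no longer be identical.
    """
    if len(tall_map) % factor != 0:
        raise ValueError("row count must be a multiple of the repeat factor")
    collapsed = []
    for start in range(0, len(tall_map), factor):
        group = tall_map[start:start + factor]
        collapsed.append([sum(column) / factor for column in zip(*group)])
    return collapsed

if __name__ == "__main__":
    range_map = [[10.0, 12.5, 0.0],
                 [42.0, 7.5, 3.0]]
    tall = repeat_rows(range_map)
    print(len(range_map), "rows ->", len(tall), "rows")
    print(collapse_rows(tall) == range_map)
```

With NumPy arrays the same repeat is usually written as `np.repeat(range_map, 4, axis=0)`.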

Methods / Models / Datasets Mentioned

  • NeRF
  • 3DGS
  • NVIDIA Cosmos world foundation model
  • DiT-based diffusion model
  • T5 text encoder
  • Patchify
  • Transformer Block
  • Unpatchify
  • FP32
  • BF16
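To see why the choice between FP32 and BF16 matters for range values, the sketch below rounds a number to each format using only the standard library. It assumes ranges are stored directly in meters, which the talk summary does not confirm (a model may normalize ranges first), and the helper names are hypothetical. BF16 keeps 8 significant bits, so between 64 m and 128 m adjacent representable values are 0.5 m apart.

```python
import math
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Round to the nearest BF16 value (round-to-nearest-even).

    BF16 is the upper 16 bits of the FP32 bit pattern, so we round the
    FP32 bits and clear the lower 16. NaN handling is omitted for brevity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_step(x: float) -> float:
    """Spacing between adjacent BF16 values around a normal, non-zero x."""
    _, exponent = math.frexp(abs(x))   # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 8)       # 8 significant bits in BF16

if __name__ == "__main__":
    for range_m in (1.3, 25.7, 100.3):
        print(f"range={range_m:7.2f} m  "
              f"fp32 error={abs(to_fp32(range_m) - range_m):.2e} m  "
              f"bf16 error={abs(to_bf16(range_m) - range_m):.2e} m  "
              f"bf16 step={bf16_step(range_m)} m")
```

Under that assumption, a return at 100.3 m lands on 100.5 m in BF16, an error far larger than typical LiDAR measurement noise, while the FP32 error stays at the micrometer scale.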

Topics

Generative Models · 3D Data Generation · LiDAR Data · Autonomous Driving · Neural Simulation · Diffusion Models · LiDAR Tokenization · HD-Maps · Weather Control · Joint RGB-LiDAR Modeling

