Generating The Invisible: Capturing and Generating Edge-cases in Autonomous Driving
Event: CVPR 2024 Workshop · Duration: 556 min
Abstract
This segment features multiple talks from the CVPR 2024 workshop on ‘Generating The Invisible: Capturing and Generating Edge-cases in Autonomous Driving.’ Felix Heide from Torc Robotics and Princeton U. discusses their end-to-end differentiable AV stack, emphasizing the role of generative AI and neural rendering in scalable simulation, edge-case handling, and explainable multi-object tracking for autonomous trucks. Siva Manivasagam from Waabi presents Waabi World, a high-fidelity, closed-loop simulator designed for safe self-driving development, detailing its capabilities in digital twin creation, sensor simulation, and robust evaluation. Haowei Sun from the University of Michigan introduces Dense Deep Reinforcement Learning (D2RL) within a Naturalistic and Adversarial Driving Environment (NADE) for efficient safety validation of autonomous vehicles by focusing on rare, safety-critical events. Hanfeng Wu from ETH Zurich presents a method for dynamic LIDAR re-simulation using compositional neural fields to improve geometry reconstruction and reduce the reality-simulation domain gap.
This segment features a series of lightning talks from the CVPR 2024 Workshop on Data-Driven Autonomous Driving Simulation. Presentations cover diverse topics including holistic urban 3D scene understanding via Gaussian Splatting, multi-level neural scene graphs for dynamic urban environments, and Lidar-enhanced neural radiance fields for street scenes. Further talks delve into transformer-based generative models for multi-agent traffic simulation and synthesizing simulation environments with generative models. The segment concludes with a keynote on machine learning for realistic and efficient driving simulation, showcasing Waymo’s advancements in sensor and traffic simulation.
This segment delves into advanced traffic simulation techniques, focusing on the “Scene Diffuser” model, which leverages diffusion models for both scene initialization and rollout. It introduces the “Scene Tensor” concept, which allows tasks such as behavior prediction and scene generation to be posed as in-painting. A key innovation is the integration of generalized hard constraints into the diffusion process to enhance realism, together with amortized diffusion to make closed-loop motion generation more efficient. The talk also explores the critical challenge of “Agent Tail Realism” and how inference-time constraints, including reaction and dynamic constraints, can be applied to generate and test rare, high-risk scenarios, even using LLMs for prompt-based scene control.
This segment introduces the concept of valid human agent models for autonomous driving (AD) simulations, emphasizing the need to accurately represent human behavior and cognitive processes. The speaker highlights why realistic human models are both challenging to build and important for reliable safety testing of AD systems. The presentation delves into various behavioral phenomena and cognitive mechanisms that influence human driving, advocating for their integration into simulation environments.
This segment features a series of lightning talks on data-driven autonomous driving simulation. Topics range from the importance of modeling human behavior and cognitive mechanisms in AD testing to the development of neural rendering techniques for generating realistic and safety-critical scenarios. Speakers also present advancements in reinforcement learning for autonomous driving, vision-language models for complex scene understanding, and novel 3D scene reconstruction methods using Gaussian splatting. The segment concludes with a discussion on editable scene simulation using LLM-agents and perceiving 3D scenes from single-glance images through neural field distillation.
This segment features two talks on the future of embodied AI, with a strong focus on autonomous driving. Jamie Shotton from Wayve discusses the limitations of traditional AV stacks and introduces Wayve’s end-to-end AI approach, leveraging advanced simulation techniques like PRISM-1 and multimodal foundation models like GAIA and LINGO-2 to achieve generalization, safety, and human-like understanding in complex driving scenarios. Kashyap Chitta then introduces his work on synthesizing simulation environments with generative models, highlighting the importance of graphics simulators in autonomous driving research.
This segment introduces SLEDGE, a novel generative model-based simulator for autonomous driving. It highlights the limitations of traditional graphics and log replay simulators, such as high computational cost and limited scenario diversity. SLEDGE addresses these issues by synthesizing realistic and diverse driving scenes, supporting simulations of arbitrary duration and route, and offering a compact representation. The talk details the technical approach, including raster-to-vector autoencoders and latent diffusion transformers, and demonstrates SLEDGE’s capabilities in HD-map generation, agent inpainting, and autoregressive map and agent generation, showcasing its potential for rigorous testing and development of autonomous driving algorithms.
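As a rough illustration of the D2RL idea summarized above (concentrating learning on rare, safety-critical events), the sketch below keeps only transitions from safety-critical states when building training samples, so the learning signal is not diluted by the many uneventful steps of naturalistic driving. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Step` record, the per-step `criticality` score, the threshold, and the terminal crash reward are all hypothetical, and returns are computed over the full episode before non-critical steps are dropped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    state: str          # hypothetical state label, e.g. "cut-in"
    criticality: float  # hypothetical criticality score in [0, 1]
    reward: float       # hypothetical reward: 1.0 only when the episode ends in a crash


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, iterating backwards over the episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def dense_samples(episode: List[Step], threshold: float,
                  gamma: float = 1.0) -> List[Tuple[str, float]]:
    """Return (state, return) training pairs from safety-critical steps only.

    Non-critical steps still contribute to the returns, but they are dropped
    from the training batch, so every sample comes from a state whose
    criticality is at least `threshold`.
    """
    rets = discounted_returns([s.reward for s in episode], gamma)
    return [(s.state, g) for s, g in zip(episode, rets)
            if s.criticality >= threshold]


if __name__ == "__main__":
    episode = [
        Step("cruise", 0.0, 0.0),
        Step("cruise", 0.1, 0.0),
        Step("cut-in", 0.9, 0.0),
        Step("hard-brake", 0.8, 1.0),  # episode ends in a crash
    ]
    print(dense_samples(episode, threshold=0.5))
    # [('cut-in', 1.0), ('hard-brake', 1.0)]
```

With `gamma=1.0` and a single terminal reward, the kept steps carry the episode outcome as their return, which is what a chain edited down to only the critical states would also give. With discounting the two differ, and a more faithful version would have to decide how the removed steps count toward the discount.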
Speakers
- Felix Heide — Torc Robotics & Princeton U.
- Siva Manivasagam — Waabi, Head of Sensor Simulation
- Haowei Sun — University of Michigan
- Hanfeng Wu — ETH Zurich
- Yiyi Liao — Zhejiang University
- Hongyu Zhou — Zhejiang University
- Jiahao Shao — Zhejiang University
- Lu Xu — Huawei Noah’s Ark Lab
- Dongfeng Bai — University of Tübingen
- Weichao Qiu — Tübingen AI Center
- Bingbing Liu — Tübingen AI Center
- Yue Wang — Zhejiang University
- Andreas Geiger — University of Tübingen
- Tobias Fischer — ETH Zürich, Meta Reality Labs
- Lorenzo Porzi — Meta Reality Labs
- Samuel Rota Bulò — Meta Reality Labs
- Marc Pollefeys — ETH Zürich
- Peter Kontschieder — Meta Reality Labs
- Shanlin Sun — UCI, NEC Laboratories America
- Bingbing Zhuang — NEC Laboratories America
- Ziyu Jiang — UC San Diego
- Buyu Liu — UC San Diego
- Xiaohui Xie — UCI
- Manmohan Chandraker — UC San Diego, NEC Laboratories America
- Tiebiao Zhao — Nvidia
- Yu Wang — Pegasus
- Fan Yi — Nvidia
- Guangzhi Cao — ZDrive.ai
- Kashyap Chitta — University of Tübingen
- Dragomir (Drago) Anguelov — VP, Head of Research, Waymo
- Prof. Gustav Markkula — Chair in Applied Behaviour Modelling, Institute for Transport Studies, University of Leeds
- Adam Tonderski — Zenseact, Chalmers University, Lund University, WASP
- Carl Lindström — Zenseact, Chalmers University, Lund University, WASP
- Georg Hess — Zenseact, Chalmers University, Lund University, WASP
- William Ljungbergh — Zenseact, Chalmers University, Lund University, WASP
- Lennart Svensson — Zenseact, Chalmers University, Lund University, WASP
- Christoffer Petersson — Zenseact, Chalmers University, Lund University, WASP
- Joakim Johnander — Zenseact, Chalmers University, Lund University, WASP
- Holger Caesar — Zenseact, Chalmers University, Lund University, WASP
- Kalle Åström — Zenseact, Chalmers University, Lund University, WASP
- Michael Felsberg — Zenseact, Chalmers University, Lund University, WASP
- Moritz Harmel — Zoox
- Anubhav Paras — Zoox
- Andreas Pasternak — Zoox
- Nicholas Roy — Zoox
- Gary Linscott — Zoox
- Katrin Renz — Eberhard Karls Universität Tübingen
- Chonghao Sima — Eberhard Karls Universität Tübingen
- Hang Zhao — Tsinghua University
- Xiaoyu Tian — Tsinghua University
- Junru Gu — Tsinghua University
- Yicheng Liu — Tsinghua University
- Xiaoyu Zhou — Peking University
- Zhiwei Lin — Peking University
- Xiaojun Shan — Peking University
- Yongtao Wang — Peking University
- Deqing Sun — Google Research
- Ming-Hsuan Yang — University of California, Merced
- Yuxi Wei — Shanghai Jiao Tong University
- Zi Wang — Shanghai Jiao Tong University
- Yifan Lu — Shanghai Jiao Tong University
- Chenxin Xu — Shanghai Jiao Tong University
- Changxing Liu — Shanghai Jiao Tong University
- Hao Zhao — Shanghai Jiao Tong University
- Siheng Chen — Shanghai Jiao Tong University
- Yanfeng Wang — Shanghai Jiao Tong University
- Letian Wang — University of Toronto
- Seung Wook Kim — University of Toronto
- Jiawei Yang — University of Toronto
- Cunjun Yu — University of Toronto
- Boris Ivanovic — University of Toronto
- Steven Waslander — University of Toronto
- Sanja Fidler — University of Toronto
- Marco Pavone — University of Toronto
- Peter Karkus — University of Toronto
- Jamie Shotton — Chief Scientist, Wayve
Talks (24)
- 00:00:00 — Felix Heide: Generating The Invisible: Capturing and Generating Edge-cases in Autonomous Driving
- This segment introduces the workshop agenda and the first speaker, Felix Heide, who will discuss generative AI for autonomous driving.
- 01:19:28 — Yiyi Liao: HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
- This talk presents HUGS, a method for holistic urban 3D scene understanding using Gaussian Splatting, which extends 3D Gaussians with multi-modal information and utilizes a unicycle model for robust object motion optimization, enabling real-time rendering and 3D semantic reconstruction from single RGB video inputs.
- 01:19:28 — Tobias Fischer: Multi-Level Neural Scene Graphs for Dynamic Urban Environments
- This presentation introduces a scalable, multi-level neural scene graph representation for dynamic urban environments, enabling fast training and rendering via composite ray sampling and offering powerful scene editing capabilities, while also proposing a benchmark for radiance field reconstruction from heterogeneous vehicle captures.
- 01:19:28 — Shanlin Sun: LidarRF: Delving into Lidar for Neural Radiance Field on Street Scenes
- This talk introduces LidarRF, a novel approach that leverages Lidar data to enhance Neural Radiance Fields for photorealistic street scene simulation, incorporating Lidar encoding, robust depth supervision with curriculum learning, and augmented view supervision to improve reconstruction quality and address challenges in sparse data regions.
- 01:19:28 — Tiebiao Zhao: Multiverse Transformer: Advancing Closed-Loop Multi-Agent Simulation with Generative Model
- This presentation introduces the Multiverse Transformer, a transformer-based generative model for closed-loop multi-agent traffic simulation, which generates diverse parallel universes of driving scenarios, utilizes a receding prediction horizon for multi-modal diversity, and achieved top performance in the Waymo Open Sim Agents Challenge.
- 01:19:28 — Kashyap Chitta: Synthesizing Simulation Environments with Generative Models
- This talk presents SLEDGE, a generative model for synthesizing simulation environments, focusing on HD-Map generation and agent inpainting, demonstrating state-of-the-art results in creating diverse and realistic traffic scenarios for autonomous driving simulation.
- 01:19:28 — Drago Anguelov: ML for Realistic and Efficient Driving Simulation
- This talk discusses Waymo’s experience in autonomous driving, highlighting the challenges of complex, high-dimensional inputs and real-time latency requirements, and presents their machine learning approaches for realistic and efficient driving simulation, including advancements in sensor simulation using 3D Gaussian Splatting and diffusion models for traffic simulation.
- 02:38:57 — Dragomir Anguelov: Machine Learning for Realistic and Efficient Simulation
- This segment introduces the Scene Diffuser model for traffic simulation, detailing its use of diffusion models for scene initialization and rollout, incorporating hard constraints, and leveraging amortized diffusion for efficient closed-loop motion generation. It also explores controllability through inference-time constraints and LLMs, addressing the challenge of generating rare agent behaviors.
- 02:55:00 — Felix Heide: Generating the Invisible: Generative AI for Scalable Autonomous Driving
- This talk introduces Torc’s end-to-end differentiable AV stack, highlighting how generative AI and neural rendering are used for scalable simulation, sensor calibration, edge case handling, and explainable multi-object tracking in autonomous trucking.
- 03:58:26 — Prof. Gustav Markkula: Valid human agents in simulated AD testing: Behavioural phenomena and cognitive mechanisms
- This talk discusses the importance of valid human agent models in autonomous driving simulations, focusing on behavioral phenomena and cognitive mechanisms to ensure realistic and reliable testing.
- 03:59:59 — Siva Manivasagam: Generative AI for Developing and Deploying Self-driving Systems Safely
- This presentation details Waabi World, a high-fidelity, closed-loop, end-to-end simulator leveraging generative AI for safe and scalable self-driving development, focusing on digital twin creation, sensor simulation, and robust evaluation metrics.
- 05:18:00 — Gustav Markkula: Valid human agents in simulated AD testing: Behavioural phenomena and cognitive mechanisms
- Discusses the importance of modeling human behavior in AD testing, focusing on behavioral phenomena and cognitive mechanisms, and argues that high-level metrics alone are not always sufficient.
- 05:23:55 — Adam Tonderski: NeuRAD: Neural Rendering for Autonomous Driving
- Presents NeuRAD, a neural rendering method for autonomous driving, detailing its architecture, requirements, and state-of-the-art performance in generating realistic sensor data for AD scenarios.
- 05:28:00 — William Ljungbergh: Neural Rendering for Safety-critical Autonomous Driving Simulation
- Explains how NeuRAD is used in the closed-loop NeuroNCAP simulation engine to evaluate AD systems in safety-critical scenarios, highlighting the poor performance of current end-to-end (E2E) planners.
- 05:33:00 — Moritz Harmel: Scaling Is All You Need: Autonomous Driving with JAX-Accelerated Reinforcement Learning
- Discusses using JAX-accelerated reinforcement learning in a realistic simulator to train autonomous driving policies, demonstrating improved safety and progress metrics through large-scale training.
- 05:38:00 — Katrin Renz: DriveLM: Driving with Graph Visual Question Answering
- Introduces DriveLM, which frames driving as graph-structured visual question answering with a vision-language model, emphasizing its potential for generalization and explainability in complex scenarios.
- 05:43:00 — Hang Zhao: DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
- Presents DriveVLM, a vision-language model for autonomous driving, and proposes a Dual System architecture to combine its reasoning capabilities with traditional pipelines for robust performance in long-tail scenarios.
- 05:48:00 — Xiaoyu Zhou: DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
- Introduces DrivingGaussian, a 3DGS framework for reconstructing and rendering complex dynamic driving scenes from multi-sensor data, achieving photorealistic quality and enabling corner case simulation.
- 05:53:00 — Yuxi Wei: Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
- Presents ChatSim, a language-controlled photorealistic driving scene simulation system that uses collaborative LLM-agents and advanced rendering techniques for easy and flexible scene editing.
- 05:58:00 — Letian Wang: DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features
- Introduces DistillNeRF, a method that predicts a 3D scene representation from a single glance of multi-view camera images by distilling per-scene optimized NeRFs and foundation model features, achieving 3D consistency and generalization.
- 06:39:24 — Jamie Shotton: The Road to Embodied AI
- Jamie Shotton discusses the challenges and opportunities of embodied AI, particularly in the context of autonomous driving, highlighting Wayve’s end-to-end AI approach, simulation, and multimodal foundation models.
- 07:56:53 — Kashyap Chitta: SLEDGE: Synthesizing Simulation Environments for Driving Agents with Generative Models
- This talk introduces SLEDGE, a generative model-based simulator for autonomous driving that synthesizes diverse and realistic driving scenarios, offering advantages over traditional graphics and log replay simulators.
- 09:59:59 — Haowei Sun: Dense Reinforcement Learning for Safety Validation of Autonomous Vehicles
- This talk introduces Dense Deep Reinforcement Learning (D2RL) within a Naturalistic and Adversarial Driving Environment (NADE) to efficiently validate autonomous vehicles by focusing on rare, safety-critical events through Markov process editing.
- 15:29:59 — Hanfeng Wu: Dynamic LIDAR Re-simulation using Compositional Neural Fields
- This presentation introduces a method for dynamic LIDAR re-simulation using compositional neural fields to enhance geometry reconstruction, lower the domain gap between real and simulated data, and enable rich scene editing capabilities for dynamic driving scenarios.
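The HUGS entry above mentions a unicycle model for optimizing object motion. As a hedged illustration, the textbook unicycle model below propagates a vehicle pose (x, y, heading) from a forward speed v and yaw rate omega, using the exact integral of constant (v, omega) over one time step. Because the model has no lateral velocity term, a trajectory parameterized this way cannot slide sideways, which is plausibly why such a prior helps regularize noisy per-frame box estimates. The function names and the per-step parameterization are assumptions for illustration, not HUGS' exact formulation.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading theta in radians)


def unicycle_step(pose: Pose, v: float, omega: float, dt: float) -> Pose:
    """Advance a pose by dt under constant forward speed v and yaw rate omega."""
    x, y, theta = pose
    theta_next = theta + omega * dt
    if abs(omega) < 1e-9:
        # Straight-line limit as omega -> 0 (avoids dividing by zero).
        return (x + v * dt * math.cos(theta),
                y + v * dt * math.sin(theta),
                theta_next)
    r = v / omega  # signed turning radius
    # Exact integration of dx/dt = v cos(theta), dy/dt = v sin(theta).
    return (x + r * (math.sin(theta_next) - math.sin(theta)),
            y - r * (math.cos(theta_next) - math.cos(theta)),
            theta_next)


def rollout(start: Pose, controls: List[Tuple[float, float]], dt: float) -> List[Pose]:
    """Integrate a sequence of (v, omega) controls; returns len(controls) + 1 poses."""
    poses = [start]
    for v, omega in controls:
        poses.append(unicycle_step(poses[-1], v, omega, dt))
    return poses


if __name__ == "__main__":
    # Quarter circle of radius 1: v = omega = pi/2 for one second.
    print(unicycle_step((0.0, 0.0, 0.0), math.pi / 2, math.pi / 2, 1.0))
    # approximately (1.0, 1.0, 1.5708)
```

In an optimization setting, the free variables would be the per-step (v, omega) controls rather than independent poses, so every candidate trajectory stays kinematically feasible by construction.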
Key Takeaways
- Generative AI and neural rendering are crucial for creating scalable and realistic simulations to address the vast number of edge cases in autonomous driving.
- End-to-end differentiable AV stacks, from raw sensor data to planning, allow for comprehensive optimization and improved robustness in complex driving scenarios.
- Advanced simulation platforms like Waabi World aim to provide high-fidelity, closed-loop testing environments that are diverse, fast, controllable, and realistic, reducing reliance on extensive real-world mileage.
- Novel approaches like D2RL and compositional neural fields are being developed to enhance the efficiency of safety validation, improve sensor simulation realism, and bridge the reality-simulation domain gap for autonomous systems.
- Advanced neural scene representations, including neural fields and 3D Gaussian Splatting, are crucial for achieving real-time, high-fidelity urban scene understanding and simulation, addressing limitations of previous NeRF-based approaches.
- Integrating Lidar data through techniques like Lidar encoding and robust depth supervision significantly enhances the quality of 3D reconstruction and novel-view rendering, particularly for complex street scenes.
- Generative models, such as the Multiverse Transformer and diffusion models, are proving effective in creating diverse, realistic, and controllable multi-agent traffic simulations, which are essential for scalable system validation in autonomous driving.
- Waymo’s extensive experience in autonomous driving highlights the importance of machine learning for both realistic sensor simulation and efficient traffic simulation, leveraging large datasets and advanced models to improve safety and performance.
- Diffusion models offer a unified and flexible framework for both initializing and rolling out traffic scenes, treating various tasks as in-painting problems on a ‘Scene Tensor’.
- Integrating generalized hard constraints directly into the diffusion process during inference significantly improves the realism of generated traffic scenarios by preventing physically impossible or unnatural behaviors.
- Amortized diffusion provides a more efficient approach for closed-loop motion generation, substantially closing the performance gap with open-loop models while reducing computational cost.
- Controllability, including the generation of rare and challenging ‘agent tail realism’ scenarios, can be achieved by specifying inference-time constraints, with potential for natural language interaction via LLMs.
- Accurate human agent models are crucial for valid and reliable testing of autonomous driving systems in simulation.
- Understanding and integrating complex human behavioral phenomena and cognitive mechanisms is key to developing effective human models for AD simulations.
- The development of data-driven autonomous driving simulations requires robust human models to ensure that testing scenarios reflect real-world interactions and challenges.
- Modeling human behavioral phenomena and cognitive mechanisms is crucial for validating autonomous driving systems, as high-level performance metrics alone may not capture critical aspects of human-like interaction.
- Neural rendering techniques like NeuRAD offer a promising avenue for creating photorealistic and controllable simulations of safety-critical driving scenarios, enabling robust evaluation of AD systems in closed-loop environments.
- Scaling reinforcement learning with JAX-accelerated simulators and large datasets can significantly improve the safety and performance of autonomous driving policies, demonstrating the potential for RL to surpass human driving capabilities.
- Vision-language models (VLMs) such as DriveLM and DriveVLM, together with LLM-agent systems such as ChatSim, are emerging as powerful tools for autonomous driving, offering capabilities for holistic scene understanding, reasoning, and language-controlled simulation editing, addressing challenges in generalization and explainability.
- Embodied AI, particularly in autonomous driving, is a rapidly advancing field with the potential to transform human-technology interactions, moving beyond traditional AI tasks.
- End-to-end AI systems, leveraging simulation and multimodal foundation models, offer a promising path to overcome the limitations of traditional, modular AV stacks by providing computational homogeneity, generalization through data, scalability, and superior safety.
- Generative models and neural rendering are crucial for creating diverse, dynamic, and controllable simulation environments necessary for training and validating autonomous systems, especially for handling complex edge cases and enabling counterfactual testing.
- Integrating language with vision and action through models like LINGO-2 allows for more explainable, intelligent, and trustworthy autonomous systems, enabling human-like understanding and interaction with the physical world.
- Traditional graphics and log replay simulators for autonomous driving have limitations in terms of scenario diversity, computational cost, and reproducibility, hindering comprehensive testing and development.
- Generative models, specifically latent diffusion transformers, can synthesize realistic and diverse driving scenes, offering a compact representation and enabling arbitrary duration and routes for simulations.
- SLEDGE, a generative model-based simulator, leverages raster-to-vector autoencoders and transformer decoders to generate vector-based scene elements, providing enhanced controllability and enabling tasks like HD-map generation, agent inpainting, and spatial outpainting.
- Long-duration simulations enabled by generative models are crucial for exposing failures in state-of-the-art planning algorithms, demonstrating the need for more diverse and controllable simulation environments beyond what traditional methods offer.
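The takeaways on the ‘Scene Tensor’, in-painting, and inference-time hard constraints can be made concrete with a toy example. The sketch treats a scene as an agents × time grid of (x, y) positions plus a boolean mask marking the given entries, so behavior prediction and conditional scene generation differ only in the mask. During sampling it re-imposes the given entries after every denoising step (replacement-based in-painting) and then clamps generated points to a per-step speed limit as a simple hard constraint. Everything here is an assumption for illustration: `toy_denoise` is a smoothing stand-in for a learned diffusion denoiser, and the mask helpers, the speed-limit clamp, and the parameter values are hypothetical. This is not Waymo's Scene Diffuser.

```python
import math
import random
from typing import List, Set, Tuple

Point = Tuple[float, float]
Scene = List[List[Point]]  # scene[agent][time] -> (x, y): a tiny "scene tensor"
Mask = List[List[bool]]    # True where the entry is given (conditioned on)


def prediction_mask(num_agents: int, num_steps: int, t_now: int) -> Mask:
    """Behavior prediction as in-painting: steps up to t_now are given, the future is generated."""
    return [[t <= t_now for t in range(num_steps)] for _ in range(num_agents)]


def agent_mask(num_agents: int, num_steps: int, known_agents: Set[int]) -> Mask:
    """Conditional scene generation: listed agents are fully given, the others are generated."""
    return [[a in known_agents] * num_steps for a in range(num_agents)]


def toy_denoise(scene: Scene, strength: float = 0.5) -> Scene:
    """Hypothetical stand-in for a learned denoiser: pulls each point toward its temporal neighbors."""
    out = []
    for traj in scene:
        new_traj = []
        for t, (x, y) in enumerate(traj):
            px, py = traj[max(t - 1, 0)]
            nx, ny = traj[min(t + 1, len(traj) - 1)]
            mx, my = (px + nx) / 2, (py + ny) / 2
            new_traj.append((x + strength * (mx - x), y + strength * (my - y)))
        out.append(new_traj)
    return out


def enforce_speed_limit(scene: Scene, mask: Mask, max_step: float) -> Scene:
    """Hypothetical hard constraint: move each generated point to within max_step of its predecessor."""
    out = [list(traj) for traj in scene]
    for traj, known in zip(out, mask):
        for t in range(1, len(traj)):
            if known[t]:
                continue  # never alter conditioned entries
            (px, py), (x, y) = traj[t - 1], traj[t]
            d = math.hypot(x - px, y - py)
            if d > max_step:
                s = max_step / d
                traj[t] = (px + s * (x - px), py + s * (y - py))
    return out


def inpaint(given: Scene, mask: Mask, steps: int = 20,
            max_step: float = 1.0, seed: int = 0) -> Scene:
    """Sample the masked-out entries; entries of `given` where mask is False are ignored."""
    rng = random.Random(seed)
    scene = [[(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in traj] for traj in given]
    for _ in range(steps):
        scene = toy_denoise(scene)
        # Replacement-based in-painting: re-impose the given entries after each step.
        scene = [[g if k else s for s, g, k in zip(s_traj, g_traj, k_traj)]
                 for s_traj, g_traj, k_traj in zip(scene, given, mask)]
        # Apply the hard constraint at every step, not only at the end.
        scene = enforce_speed_limit(scene, mask, max_step)
    return scene


if __name__ == "__main__":
    # Two agents over 6 steps; positions up to t = 2 are given. Future values in
    # `given` are placeholders only, because the mask marks them as unknown.
    given = [[(float(t), 0.0) for t in range(6)], [(0.0, float(t)) for t in range(6)]]
    mask = prediction_mask(num_agents=2, num_steps=6, t_now=2)
    for traj in inpaint(given, mask):
        print([(round(x, 2), round(y, 2)) for x, y in traj])
```

A real system would replace `toy_denoise` with a trained denoiser conditioned on the noise level, and constraints other than a speed limit (for example, collision avoidance) would need their own projection or guidance step.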
Methods / Models / Datasets Mentioned
2D encoder · 3D Asset Management · 3D Gaussian Splatting (3DGS) · 3D decoder · 3D perception · Accelerated driving simulator · Action-sensitive theory of mind · AdaLN · Algolux · AutoBots · BERT · BLIP-2 Q-Former · Background Rendering · Bayesian perceptual filtering · Behavioral Cloning · BlockNeRF · CARLA · CLIP · COMPASS · ChatSim · Closed-loop simulation · Collision severity score · Composite Gaussian Splatting (3DGS) · Ctrl-Sim · D2RL · DETR · DINOv2 · Depth estimation · DiT · Diffusion Models · DistillNeRF · Distributed RL · DriveLM-Data · DriveVLM · DriveVLM-Dual · DrivingGaussian · Dual System for Autonomous Driving · DyNFL · Dynamic Gaussian Graph · Euro NCAP scenarios · Evidence accumulation · Foreground Rendering · Foundation Models · GAIA · GET3D · Gato · GaussianPro · Ghost Gym · Global rendering · Graph Visual Question Answering · HDMapGen · Hybrid mechanistic/ML modeling · HyperNeRF · ISP · Image Diff. · Incremental static 3D Gaussians · Instant-NGP · JAX · Knowledge distillation · LIDARSim · LINGO-1 · LINGO-2 · LLM-Agents · LLMs · LLaMA · LM-Nav · LoFTR + MAGSAC · LoRA finetuning · Lidar prior · Lidar ray drop · LidarRF · Lift-Splat-Shoot · Long-term estimation of action values · MARS · MTR · McLight (Lighting Estimation) · McNeRF (Multi-Camera NeRF) · Meta-actions · Mip-NeRF360 · Motion prediction · MotionDiffuser · MotionLM · Multi-Agent Collaboration Framework · Multi-camera · MultiVerse Transformer · NADE · NFL2NLFE2+NLL · NSG · NeRF · Nerfacto · Nerfies · NeuRAD · NeuRas · Neural feature field · NeuroNCAP simulation engine · Nocturne · Novel view synthesis · Open X-Embodiment · Open-loop evaluation · OpenDV-2k · PDM-Closed · PPO · PRISM-1 · Perceiver IO · Prompt tuning · Proposal sampling · RMSE · RT-1 · RVAE · Reference trajectories · ResNet · Retinal sensory noise · Rolling shutter handling · Rule-based QA generation · SLEDGE · SOLD-Net · SUDS · Scene Diffuser · Sensor embedding · Single-image 3D reconstruction · System 1 brain · System 2 brain · TEDi · TORCS · TeraSim · Traditional self-driving pipeline · TrafficSim · Trajectory planning · UniAD · UniSim · Urban Radiance Fields · V-trace off-policy correction · VAD · VISTA · VQ-GAN · Vehicle Deleting · Vehicle Motion · ViT · Vicuna-7B · View Adjustment · Visual looming · Volume rendering · WOSAC metrics · Waabi World · Waymax · autoregressive transformer · diffusion video decoder · nuPlan · nuScenes
Topics
3D Reconstruction · 3D Scene Reconstruction · AD Testing · Agent Inpainting · Agent Tail Realism · Amortized Diffusion · Autonomous Driving · Autonomous Driving Simulation · Behavioral Phenomena · Cognitive Mechanisms · Controllability · Counterfactual Testing · Data-driven approaches · Diffusion Models · Digital Twins · Domain Gap · Edge Cases · Embodied AI · End-to-end AI · End-to-end AV Stacks · Foundation Models · Gaussian Splatting · Generative AI · Generative Models · HD-Map Generation · Hard Constraints · Human Agent Models · Human Behavior Modeling · Human-in-the-loop · Knowledge Distillation · LIDAR Simulation · LLM-Agents · LLM-based Scene Control · Language Models · Latent Diffusion Transformer · Lidar · Log Replay Simulators · Machine Learning · Motion Generation · Multi-object Tracking · Multimodality · Neural Radiance Fields · Neural Rendering · Reinforcement Learning · Safety Validation · Safety-Critical Scenarios · Scene Generation · Scene Reconstruction · Scene Tensor · Scene Understanding · Sensor Simulation · Simulation · Simulation Validity · Traffic Simulation · Vision-Language Models
Notes
Open for commentary — connections to other work, critiques, follow-up reading.