CVPR 2025 Workshop on Autonomous Driving
Event: CVPR 2025 Workshop on Autonomous Driving · Duration: 50 min
Abstract
This workshop session gives an overview of the Waymo Open Dataset and its role in advancing autonomous driving research. It introduces the new End-to-End (E2E) Driving Dataset and the 2025 challenges, including Vision-based End-to-End Driving, Interaction Prediction, Sim Agents, and Scenario Generation. Top-performing teams present their solutions, covering multimodal behavior prediction, trajectory tokenization, and multi-agent scenario synthesis. The session also covers new metrics such as the Rater Feedback Score, updates to the Waymax simulator, and statistics on community participation in the challenges.
Speakers
- John Gorman — Waymo
- Wensheng — Waymo
- Luke Rowe — Mila - Quebec AI Institute
- Kashyap Chitta — University of Tübingen, Tübingen AI Center, NVIDIA Research
- Zikang Zhou — City University of Hong Kong
- Reza Mahjourian — Waymo
- Zhiyuan Zhang — Shanghai Jiao Tong University
- Sen Wang — Shenzhen Urban Transport Planning Center
Talks (10)
- 00:04 — John Gorman: Waymo Open Dataset Overview and 2025 Challenges
- An introduction to the Waymo Open Dataset, its mission to advance autonomous driving research, and an overview of the 2025 challenges and engagement statistics.
- 04:41 — Wensheng: Vision-based End-to-End Driving Challenge
- Introduction to the Vision-based End-to-End Driving challenge: its motivation, the new dataset of ultra-rare driving scenarios (WOD-E2E), and the Rater Feedback Score metric, followed by the announcement of the 2025 winners.
- 10:08 — Luke Rowe: Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
- Presentation of Poutine, a vision-language model (VLM) for long-tail driving: its training corpora (CoVLA, WOD-E2E), automated language annotations, two-stage training recipe (vision-language-trajectory pre-training followed by reinforcement-learning post-training), and results on the test set.
- 16:29 — Kashyap Chitta: Unifying End-to-End Autonomous Driving Datasets (DiffusionLTF)
- Presentation of DiffusionLTF, which unifies diverse AV datasets (CARLA, WOD-P, NAVSIM, WOD-E2E) through a multi-stage training pipeline and uses a diffusion-based architecture for end-to-end trajectory prediction.
- 21:50 — John Gorman: Perception and Motion Datasets, Interaction Prediction Challenge
- Overview of Waymo’s Perception and Motion Datasets, including sensor data, labels, and mapping data, followed by an introduction to the Interaction Prediction challenge and its soft mAP metric.
- 27:19 — Zikang Zhou: Parallel ModeSeq: Translating Mode Sets for Behavior Prediction into Sequences
- Presentation of Parallel ModeSeq, a solution for the Interaction Prediction challenge that uses a Transformer-based architecture for sequential mode decoding, addressing multimodality and scene consistency in multi-agent behavior prediction.
- 32:00 — Reza Mahjourian: Sim Agents and Scenario Generation Challenges
- Introduction to the Sim Agents and Scenario Generation challenges, their objectives, the motivation for simulating realistic joint futures for agents, and the metrics used for evaluation.
- 35:05 — Zhiyuan Zhang: TrajTok: what makes for a good trajectory tokenizer in behavior generation?
- Presentation of TrajTok, a trajectory tokenizer for next-token prediction-based behavior generation models, detailing its design principles (coverage-utilization balance, symmetry, robustness) and demonstrating its superior performance in the Sim Agents challenge.
- 40:02 — Reza Mahjourian: Waymax Update and Scenario Generation Challenge
- An update on Waymax, Waymo’s lightweight research simulator, followed by an introduction to the Scenario Generation challenge, its inputs, predictions, and evaluation metrics.
- 42:49 — Sen Wang: SimFormer
- Presentation of SimFormer, the winning system for the Scenario Generation challenge. It addresses limitations of rule-based and regression-based methods by combining unified scene modeling with discrete tokenization and autoregressive Transformers, targeting scalable and generalizable multi-agent scenario synthesis.
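The TrajTok and SimFormer talks both rely on discretizing continuous agent motion into tokens for next-token prediction, and the methods list names k-means and uniform binning as simple tokenizer baselines. The sketch below is a minimal illustration of those two baselines, not TrajTok itself: it builds each kind of vocabulary over per-step (dx, dy) displacements and maps every displacement to its nearest token. Assumptions: tokens encode 2D displacement only (practical tokenizers also handle heading, and TrajTok adds its own design rules), the data are synthetic, and all function names are hypothetical.

```python
import numpy as np


def tokenize(deltas: np.ndarray, vocab: np.ndarray) -> np.ndarray:
    """Map each (dx, dy) displacement to the index of its nearest vocabulary entry."""
    d2 = ((deltas[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def detokenize(tokens: np.ndarray, vocab: np.ndarray, start: np.ndarray) -> np.ndarray:
    """Rebuild positions by summing token displacements from a start point."""
    return start + np.cumsum(vocab[tokens], axis=0)


def uniform_vocab(max_disp: float, bins_per_axis: int) -> np.ndarray:
    """Hypothetical helper: a regular (dx, dy) grid of bins_per_axis**2 tokens."""
    axis = np.linspace(-max_disp, max_disp, bins_per_axis)
    gx, gy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


def kmeans_vocab(deltas: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: plain Lloyd's k-means over observed displacements."""
    rng = np.random.default_rng(seed)
    centers = deltas[rng.choice(len(deltas), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = tokenize(deltas, centers)
        for j in range(k):
            members = deltas[labels == j]
            if len(members) > 0:  # keep the previous center if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


# Synthetic displacements: mostly forward motion (~1 m per step) with small lateral jitter.
rng = np.random.default_rng(1)
deltas = np.stack([rng.normal(1.0, 0.2, 2000), rng.normal(0.0, 0.05, 2000)], axis=1)

for name, vocab in [("uniform", uniform_vocab(2.0, 16)), ("k-means", kmeans_vocab(deltas, 256))]:
    toks = tokenize(deltas, vocab)
    err = np.linalg.norm(vocab[toks] - deltas, axis=1).mean()
    used = len(np.unique(toks)) / len(vocab)
    print(f"{name}: mean error {err:.3f} m, vocabulary utilization {used:.0%}")
```

On data like this, the uniform grid spends most of its tokens on motions that never occur, while k-means places tokens where data are dense but can under-cover rare maneuvers. This is one way to read the coverage-utilization balance that TrajTok is described as targeting.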
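DiffusionLTF is described as a diffusion-based architecture, and the methods list mentions the DDIM formulation. For readers unfamiliar with DDIM, here is a minimal sketch of deterministic DDIM sampling (eta = 0) for a short trajectory of waypoints. Assumptions: a linear beta schedule, 10 inference steps, and a noise predictor `eps_model(x, t)` that stands in for the real network (which would be conditioned on scene features). The oracle predictor exists only to check that the update recovers a known trajectory. This is not DiffusionLTF's implementation.

```python
import numpy as np


def make_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear beta schedule; returns the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def ddim_sample(eps_model, shape, alpha_bars, num_inference_steps=10, seed=0):
    """Deterministic DDIM sampling (eta = 0) over a strided subset of timesteps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    timesteps = np.linspace(len(alpha_bars) - 1, 0, num_inference_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        eps = eps_model(x, t)
        # Predict the clean sample x0 from the current noisy sample.
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        # Jump to the next (less noisy) timestep; after the last step, alpha_bar = 1.
        ab_next = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = np.sqrt(ab_next) * x0_pred + np.sqrt(1.0 - ab_next) * eps
    return x


# Sanity check with an oracle noise predictor that "knows" the target trajectory.
target = np.stack([np.linspace(0.0, 20.0, 8), np.zeros(8)], axis=1)  # 8 (x, y) waypoints
alpha_bars = make_alpha_bars()


def oracle(x, t):
    """Exact noise that maps the target to x at timestep t (for testing only)."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])


sample = ddim_sample(oracle, target.shape, alpha_bars)
print(np.abs(sample - target).max())
```

Because eta = 0 removes the fresh-noise term, sampling is deterministic given the initial noise, which is what lets DDIM skip most of the training timesteps at inference.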
Key Takeaways
- The Waymo Open Dataset continues to expand with new datasets and challenges that serve as shared benchmarks for autonomous driving research.
- Vision-based End-to-End Driving is a key area of focus, supported by a new dataset of ultra-rare scenarios and a new metric, the Rater Feedback Score.
- Multimodal behavior prediction and multi-agent scenario generation are critical for robust autonomous systems, with new methods demonstrating improved realism, diversity, and scalability.
- Advanced training strategies, including multi-stage pre-training on diverse datasets and reinforcement learning fine-tuning, are crucial for achieving state-of-the-art performance in long-tail driving scenarios.
- Community participation in the challenges drives rapid progress: winning entries introduced new architectures, trajectory tokenizers, and loss functions.
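The methods list names the GRPO algorithm, apparently in connection with the reinforcement-learning post-training stage of Poutine. A minimal NumPy sketch of GRPO's two core pieces follows: advantages standardized within a group of samples drawn for the same input, and a PPO-style clipped surrogate. Assumptions: the KL penalty toward a reference policy is omitted, the rewards are made-up numbers, and for a VLM the `logp_*` inputs would be log-probabilities of whole generated trajectories. How Poutine defines its reward is not covered here.

```python
import numpy as np


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize rewards within one group of samples drawn for the same input."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective averaged over the group (to be maximized).
    GRPO's KL penalty toward a reference policy is omitted in this sketch."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.minimum(unclipped, clipped).mean())


# Hypothetical rewards for 4 trajectories sampled for the same driving scene.
rewards = np.array([6.0, 8.0, 4.0, 6.0])
adv = group_relative_advantages(rewards)
print(adv)  # above-average samples get a positive advantage, below-average a negative one
```

The group baseline replaces the learned value function used in PPO, so no critic network is needed; the clip keeps each update close to the sampling policy.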
Methods / Models / Datasets Mentioned
Waymo Open Dataset · Poutine · Qwen 72B Model · Qwen 2.5 VL 3B model · GRPO algorithm · CoVLA dataset · DiffusionLTF · Open X-AV · CARLA · NAVSIM · WOD-P · WOD-E2E · Transformer decoder · DDIM formulation · Parallel ModeSeq · ModeSeq · RNN-Style Decoder · Transformer · EMTA Loss · Laplace Negative Log-Likelihood · Binary Focal Loss · Margin Ranking Loss · Sim Agents framework · TrajTok · K-disks tokenizer · Spatial-Aware Label Smoothing · SMART model · SimFormer · UniGen · GPT-style Transformers · k-means · uniform binning
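The list above names a Laplace negative log-likelihood and a binary focal loss next to Parallel ModeSeq. A common way such losses are combined in multimodal trajectory prediction is winner-take-all: regress only the predicted mode closest to the ground truth, and train per-mode confidences to single out that mode. The sketch below shows this generic recipe under stated assumptions (closest mode chosen by average displacement error, alpha = 0.25 and gamma = 2 for the focal loss, hypothetical function names). ModeSeq's EMTA loss modifies this winner-take-all scheme and is not reproduced here.

```python
import numpy as np


def laplace_nll(mu, scale, target):
    """Elementwise Laplace negative log-likelihood: log(2b) + |x - mu| / b."""
    return np.log(2.0 * scale) + np.abs(target - mu) / scale


def binary_focal_loss(prob, label, alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7):
    """Focal loss on independent per-mode confidences in (0, 1)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    p_t = np.where(label == 1, prob, 1.0 - prob)
    alpha_t = np.where(label == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


def winner_take_all_loss(modes, scales, conf, gt):
    """Hypothetical WTA loss.
    modes, scales: (K, T, 2) per-mode waypoint means and Laplace scales (scales > 0)
    conf: (K,) per-mode confidences; gt: (T, 2) ground-truth waypoints
    """
    ade = np.linalg.norm(modes - gt[None], axis=-1).mean(axis=1)  # (K,) average displacement error
    best = int(ade.argmin())  # only the closest mode receives the regression loss
    reg = laplace_nll(modes[best], scales[best], gt).mean()
    labels = (np.arange(len(conf)) == best).astype(int)  # the winner is the only positive
    cls = binary_focal_loss(conf, labels).mean()
    return reg + cls, best


# Toy example: 2 modes over 3 waypoints; mode 0 matches the ground truth exactly.
gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
modes = np.stack([gt, gt + 5.0])
scales = np.ones_like(modes)
loss, best = winner_take_all_loss(modes, scales, np.array([0.9, 0.1]), gt)
print(best, loss)
```

In practice the scales are predicted by the network (for example through a softplus to keep them positive), so the model can lower its loss by being confident only where it is accurate.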
Topics
Autonomous Driving · Waymo Open Dataset · End-to-End Driving · Interaction Prediction · Sim Agents · Scenario Generation · Vision Language Models · Trajectory Prediction · Multimodal Prediction · Dataset Benchmarking