CVPR 2025 Workshop on Autonomous Driving

Event: CVPR 2025 Workshop on Autonomous Driving · Duration: 50 min

Abstract

This workshop reviews the Waymo Open Dataset and its role in autonomous driving research. It introduces the new E2E Driving Dataset and the 2025 challenges: Vision-based End-to-End Driving, Interaction Prediction, Sim Agents, and Scenario Generation. Top-performing teams present their solutions, covering multimodal behavior prediction, trajectory tokenization, and multi-agent scenario synthesis. The session also covers new metrics such as the Rater Feedback Score, updates to the Waymax simulator, and community engagement with the challenges.

Speakers

John Gorman — Waymo
Wensheng — Waymo
Luke Rowe — Mila - Quebec AI Institute
Kashyap Chitta — University of Tübingen, Tübingen AI Center, NVIDIA Research
Zikang Zhou — City University of Hong Kong
Reza Majourian — Waymo
Zhiyuan Zhang — Shanghai Jiao Tong University
Sen Wang — Shenzhen Urban Transport Planning Center

Talks (10)

00:04 — John Gorman: Waymo Open Dataset Overview and 2025 Challenges
- An introduction to the Waymo Open Dataset, its mission to advance autonomous driving research, and an overview of the 2025 challenges and engagement statistics.
04:41 — Wensheng: Vision-based End-to-End Driving Challenge
- Introduction to the Vision-based End-to-End Driving challenge, its motivation, the new ultra-rare driving dataset, and the Rater Feedback Score metric, followed by the announcement of 2025 winners.
10:08 — Luke Rowe: Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
- Presentation of Poutine, a Vision Language Model for long-tail driving, detailing its training corpora (CoVLA, WOD-E2E), automated language annotations, two-stage training recipe, and performance results on the test set.
16:29 — Kashyap Chitta: Unifying End-to-End Autonomous Driving Datasets (DiffusionLTF)
- Presentation of DiffusionLTF, a solution that unifies diverse AV datasets (CARLA, WOD-P, NAVSIM, WOD-E2E) through a multi-stage training pipeline and a diffusion-based architecture for end-to-end driving prediction.
21:50 — John Gorman: Perception and Motion Datasets, Interaction Prediction Challenge
- Overview of Waymo’s Perception and Motion Datasets, including sensor data, labels, and mapping data, followed by an introduction to the Interaction Prediction challenge and its soft mAP metric.
27:19 — Zikang Zhou: Parallel ModeSeq: Translating Mode Sets for Behavior Prediction into Sequences
- Presentation of Parallel ModeSeq, a solution for the Interaction Prediction challenge that uses a Transformer-based architecture for sequential mode decoding, addressing multimodality and scene consistency in multi-agent behavior prediction.
32:00 — Reza Majourian: Sim Agents and Scenario Generation Challenges
- Introduction to the Sim Agents and Scenario Generation challenges, their objectives, the motivation for simulating realistic joint futures for agents, and the metrics used for evaluation.
35:05 — Zhiyuan Zhang: TrajTok: what makes for a good trajectory tokenizer in behavior generation?
- Presentation of TrajTok, a trajectory tokenizer for next-token prediction-based behavior generation models, detailing its design principles (coverage-utilization balance, symmetry, robustness) and demonstrating its superior performance in the Sim Agents challenge.
40:02 — Reza Majourian: Waymax Update and Scenario Generation Challenge
- An update on Waymax, Waymo’s lightweight research simulator, followed by an introduction to the Scenario Generation challenge, its inputs, predictions, and evaluation metrics.
42:49 — Sen Wang: SimFormer
- Presentation of SimFormer, the winning system for the Scenario Generation Challenge, which addresses limitations of rule-based and regression-based methods by combining unified scene modeling with discrete tokenization and autoregressive Transformers for scalable and generalizable multi-agent scenario synthesis.
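
Several talks above (TrajTok, SimFormer) rely on turning continuous trajectories into discrete tokens so that a Transformer can predict them one step at a time. As a rough illustration of the idea only, the sketch below uses plain uniform binning of per-step displacements. It is not the TrajTok or SimFormer tokenizer; the grid bounds, bin count, and function names are hypothetical choices made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical grid settings, chosen for illustration only.
GRID_MIN = -2.0   # metres, lower bound of a per-step displacement
GRID_MAX = 2.0    # metres, upper bound of a per-step displacement
NUM_BINS = 8      # bins per axis, so the vocabulary has NUM_BINS**2 tokens
BIN_WIDTH = (GRID_MAX - GRID_MIN) / NUM_BINS


def _bin_index(value: float) -> int:
    """Map one displacement component to a bin index, clamping outliers."""
    idx = int((value - GRID_MIN) // BIN_WIDTH)
    return max(0, min(NUM_BINS - 1, idx))


def tokenize(points: List[Point]) -> List[int]:
    """Encode consecutive (dx, dy) displacements as integer tokens."""
    tokens = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        ix = _bin_index(x1 - x0)
        iy = _bin_index(y1 - y0)
        tokens.append(ix * NUM_BINS + iy)
    return tokens


def detokenize(tokens: List[int], start: Point) -> List[Point]:
    """Rebuild a trajectory by stepping through bin centres from `start`."""
    x, y = start
    points = [start]
    for token in tokens:
        ix, iy = divmod(token, NUM_BINS)
        x += GRID_MIN + (ix + 0.5) * BIN_WIDTH
        y += GRID_MIN + (iy + 0.5) * BIN_WIDTH
        points.append((x, y))
    return points
```

Uniform binning spends vocabulary on displacements that rarely occur and clamps anything outside the grid, which is the kind of coverage versus utilization trade-off the TrajTok talk lists among its design principles.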

Key Takeaways

The Waymo Open Dataset continues to expand with new datasets and challenges, fostering significant research in autonomous driving.
Vision-based End-to-End Driving is a key area of focus, leveraging new datasets of ultra-rare scenarios and innovative metrics like the Rater Feedback Score.
Multimodal behavior prediction and multi-agent scenario generation are critical for robust autonomous systems, with new methods demonstrating improved realism, diversity, and scalability.
Advanced training strategies, including multi-stage pre-training on diverse datasets and reinforcement learning fine-tuning, are crucial for achieving state-of-the-art performance in long-tail driving scenarios.
Community participation in the challenges drives rapid iteration, with the presented entries introducing new architectures and loss functions.

Methods / Models / Datasets Mentioned

Waymo Open Dataset
Poutine
Qwen 72B Model
Qwen 2.5 VL 3B model
GRPO algorithm
CoVLA dataset
DiffusionLTF
Open X-AV
CARLA
NAVSIM
WOD-P
WOD-E2E
Transformer decoder
DDIM formulation
Parallel ModeSeq
ModeSeq
RNN-Style Decoder
Transformer
EMTA Loss
Laplace Negative Log-Likelihood
Binary Focal Loss
Margin Ranking Loss
Sim Agents framework
TrajTok
K-disks tokenizer
Spatial-Aware Label Smoothing
SMART model
SimFormer
UniGen
GPT-style Transformers
k-means
uniform binning
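
For readers unfamiliar with the loss functions named in this list, the sketch below gives the textbook scalar forms of the Laplace negative log-likelihood, binary focal loss, and margin ranking loss. How the presented systems weight or combine these terms is not stated in this summary, so the default values and function names here are assumptions for illustration.

```python
import math


def laplace_nll(target: float, loc: float, scale: float) -> float:
    """Negative log-likelihood of `target` under Laplace(loc, scale)."""
    return math.log(2.0 * scale) + abs(target - loc) / scale


def binary_focal_loss(prob: float, label: int,
                      gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one binary prediction; `prob` is P(label == 1)."""
    eps = 1e-12  # keeps log() finite at prob == 0 or prob == 1
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def margin_ranking_loss(score_a: float, score_b: float,
                        y: int, margin: float = 0.0) -> float:
    """Hinge-style loss; y = +1 means score_a should rank above score_b."""
    return max(0.0, -y * (score_a - score_b) + margin)
```

In behavior prediction these roles are typically split: a regression term such as the Laplace likelihood fits the trajectory coordinates, while classification or ranking terms score which predicted mode is most plausible.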

Topics

Autonomous Driving · Waymo Open Dataset · End-to-End Driving · Interaction Prediction · Sim Agents · Scenario Generation · Vision Language Models · Trajectory Prediction · Multimodal Prediction · Dataset Benchmarking
