CVPR 2025 Workshop on Autonomous Driving

Event: CVPR 2025 Workshop on Autonomous Driving · Duration: 50 min

Abstract

This workshop gives an overview of the Waymo Open Dataset and its role in autonomous driving research. It introduces the new E2E Driving Dataset and the 2025 challenges: Vision-based End-to-End Driving, Interaction Prediction, Sim Agents, and Scenario Generation. Top-performing teams present their solutions, covering multimodal behavior prediction, trajectory tokenization, and multi-agent scenario synthesis. The session also covers new metrics such as the Rater Feedback Score and updates to the Waymax simulator, and reports on community engagement with the challenges.

Speakers

  • John Gorman — Waymo
  • Wensheng — Waymo
  • Luke Rowe — Mila - Quebec AI Institute
  • Kashyap Chitta — University of Tübingen, Tübingen AI Center, NVIDIA Research
  • Zikang Zhou — City University of Hong Kong
  • Reza Mahjourian — Waymo
  • Zhiyuan Zhang — Shanghai Jiao Tong University
  • Sen Wang — Shenzhen Urban Transport Planning Center

Talks (10)

  • 00:04 John Gorman: Waymo Open Dataset Overview and 2025 Challenges
    • An introduction to the Waymo Open Dataset, its mission to advance autonomous driving research, and an overview of the 2025 challenges and engagement statistics.
  • 04:41 Wensheng: Vision-based End-to-End Driving Challenge
    • Introduction to the Vision-based End-to-End Driving challenge, its motivation, the new ultra-rare driving dataset, and the Rater Feedback Score metric, followed by the announcement of 2025 winners.
  • 10:08 Luke Rowe: Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
    • Presentation of Poutine, a Vision Language Model for long-tail driving, detailing its training corpora (CoVLA, WOD-E2E), automated language annotations, two-stage training recipe, and performance results on the test set.
  • 16:29 Kashyap Chitta: Unifying End-to-End Autonomous Driving Datasets (DiffusionLTF)
    • Presentation of DiffusionLTF, a solution that unifies diverse AV datasets (CARLA, WOD-P, NAVSIM, WOD-E2E) through a multi-stage training pipeline and a diffusion-based architecture for end-to-end driving prediction.
  • 21:50 John Gorman: Perception and Motion Datasets, Interaction Prediction Challenge
    • Overview of Waymo’s Perception and Motion Datasets, including sensor data, labels, and mapping data, followed by an introduction to the Interaction Prediction challenge and its soft mAP metric.
  • 27:19 Zikang Zhou: Parallel ModeSeq: Translating Mode Sets for Behavior Prediction into Sequences
    • Presentation of Parallel ModeSeq, a solution for the Interaction Prediction challenge that uses a Transformer-based architecture for sequential mode decoding, addressing multimodality and scene consistency in multi-agent behavior prediction.
  • 32:00 Reza Mahjourian: Sim Agents and Scenario Generation Challenges
    • Introduction to the Sim Agents and Scenario Generation challenges, their objectives, the motivation for simulating realistic joint futures for agents, and the metrics used for evaluation.
  • 35:05 Zhiyuan Zhang: TrajTok: what makes for a good trajectory tokenizer in behavior generation?
    • Presentation of TrajTok, a trajectory tokenizer for next-token prediction-based behavior generation models, detailing its design principles (coverage-utilization balance, symmetry, robustness) and demonstrating its superior performance in the Sim Agents challenge.
  • 40:02 Reza Mahjourian: Waymax Update and Scenario Generation Challenge
    • An update on Waymax, Waymo’s lightweight research simulator, followed by an introduction to the Scenario Generation challenge, its inputs, predictions, and evaluation metrics.
  • 42:49 Sen Wang: SimFormer
    • Presentation of SimFormer, the winning system for the Scenario Generation Challenge, which addresses limitations of rule-based and regression-based methods by combining unified scene modeling with discrete tokenization and autoregressive Transformers for scalable and generalizable multi-agent scenario synthesis.
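
Several of the talks above (TrajTok, SimFormer) depend on turning continuous trajectories into discrete tokens so that a Transformer can generate motion by next-token prediction. As a rough illustration only, the sketch below quantizes per-step displacements with uniform binning, one of the baseline schemes named among the methods mentioned in the session. It is not the TrajTok or SimFormer tokenizer; the bin count, clipping range, and all function names are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical settings, chosen for illustration only.
NUM_BINS = 8     # bins per axis, so the vocabulary has NUM_BINS ** 2 tokens
MAX_STEP = 4.0   # per-step displacements are clipped to [-MAX_STEP, MAX_STEP]
BIN_WIDTH = 2 * MAX_STEP / NUM_BINS


def _bin_index(value: float) -> int:
    """Map a displacement to a bin index in [0, NUM_BINS - 1], clipping outliers."""
    clipped = max(-MAX_STEP, min(MAX_STEP, value))
    return min(int((clipped + MAX_STEP) / BIN_WIDTH), NUM_BINS - 1)


def _bin_center(index: int) -> float:
    """Return the displacement represented by a bin index."""
    return -MAX_STEP + (index + 0.5) * BIN_WIDTH


def tokenize(trajectory: List[Point]) -> List[int]:
    """Encode a 2D trajectory as one token per step after the first point."""
    tokens = []
    prev = trajectory[0]
    for x, y in trajectory[1:]:
        ix = _bin_index(x - prev[0])
        iy = _bin_index(y - prev[1])
        tokens.append(ix * NUM_BINS + iy)
        # Continue from the reconstructed position, so each step corrects
        # the quantization error left by the previous one.
        prev = (prev[0] + _bin_center(ix), prev[1] + _bin_center(iy))
    return tokens


def detokenize(start: Point, tokens: List[int]) -> List[Point]:
    """Rebuild an approximate trajectory from a start point and tokens."""
    points = [start]
    for token in tokens:
        ix, iy = divmod(token, NUM_BINS)
        x, y = points[-1]
        points.append((x + _bin_center(ix), y + _bin_center(iy)))
    return points
```

For example, `detokenize(traj[0], tokenize(traj))` returns a trajectory of the same length as `traj`. As long as no step is clipped, each reconstructed coordinate stays within half a bin width of the original. Data-driven vocabularies such as k-means or k-disks replace the fixed grid with token shapes fitted to observed motion.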

Key Takeaways

  • The Waymo Open Dataset continues to expand with new datasets and challenges, fostering significant research in autonomous driving.
  • Vision-based End-to-End Driving is a key area of focus, leveraging new datasets of ultra-rare scenarios and innovative metrics like the Rater Feedback Score.
  • Multimodal behavior prediction and multi-agent scenario generation are critical for robust autonomous systems, with new methods demonstrating improved realism, diversity, and scalability.
  • Advanced training strategies, including multi-stage pre-training on diverse datasets and reinforcement learning fine-tuning, are crucial for achieving state-of-the-art performance in long-tail driving scenarios.
  • Community participation in the challenges drives rapid iteration, with entries introducing new architectures and loss functions.

Methods / Models / Datasets Mentioned

  • Waymo Open Dataset
  • Poutine
  • Qwen 72B model
  • Qwen 2.5 VL 3B model
  • GRPO algorithm
  • CoVLA dataset
  • DiffusionLTF
  • Open X-AV
  • CARLA
  • NAVSIM
  • WOD-P
  • WOD-E2E
  • Transformer decoder
  • DDIM formulation
  • Parallel ModeSeq
  • ModeSeq
  • RNN-Style Decoder
  • Transformer
  • EMTA Loss
  • Laplace Negative Log-Likelihood
  • Binary Focal Loss
  • Margin Ranking Loss
  • Sim Agents framework
  • TrajTok
  • K-disks tokenizer
  • Spatial-Aware Label Smoothing
  • SMART model
  • SimFormer
  • UniGen
  • GPT-style Transformers
  • k-means
  • uniform binning
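
Three of the loss functions named in the list above have standard textbook forms. The sketch below writes them out in plain Python for a single scalar prediction, purely as a reminder of the definitions. How the presented systems weight or combine these terms is not stated here, and the default values of `alpha`, `gamma`, and `margin` are assumptions.

```python
import math


def laplace_nll(target: float, loc: float, scale: float) -> float:
    """Negative log-likelihood of `target` under a Laplace(loc, scale) density."""
    return math.log(2.0 * scale) + abs(target - loc) / scale


def binary_focal_loss(prob: float, label: int,
                      alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction; `prob` is the predicted P(label == 1)."""
    p_t = prob if label == 1 else 1.0 - prob
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def margin_ranking_loss(score_a: float, score_b: float, y: int,
                        margin: float = 0.0) -> float:
    """Hinge-style ranking loss; y = +1 means score_a should exceed score_b."""
    return max(0.0, -y * (score_a - score_b) + margin)
```

A confident correct prediction contributes far less focal loss than an uncertain one, and the ranking loss is zero once the preferred score leads by at least `margin`.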

Topics

Autonomous Driving · Waymo Open Dataset · End-to-End Driving · Interaction Prediction · Sim Agents · Scenario Generation · Vision Language Models · Trajectory Prediction · Multimodal Prediction · Dataset Benchmarking
