Foundation Models for Autonomous Driving

Event: CVPR 2025

Abstract

Vincent Vanhoucke from Waymo discusses the role of foundation models in advancing autonomous driving technology. He highlights Waymo’s mission to build the world’s most trusted driver, emphasizing the importance of safety and reliability, especially when dealing with vulnerable road users. The presentation showcases how foundation models help autonomous vehicles understand complex driving scenarios, predict human behavior, and make safe decisions in challenging environments, including rare ‘long-tail’ events. It also delves into the concept of ‘driving as a conversation’ using motion tokens and the scaling laws observed in these models, demonstrating their potential to significantly improve performance and address the intricate challenges of real-world autonomous operation.

Speakers

  • Vincent Vanhoucke — Waymo

Talks (1)

  • 00:28 Vincent Vanhoucke: Foundation Models for Autonomous Driving
    • This talk explores the application of foundation models to autonomous driving, focusing on how these models enable deep semantic understanding, handle long-tail scenarios, and improve safety and reliability in real-world operations.

Key Takeaways

  • Foundation models offer scalable solutions for deep semantic understanding and human behavior prediction, crucial for safe and reliable autonomous driving.
  • Waymo’s autonomous vehicles are involved in significantly fewer crashes with vulnerable road users than human drivers, which the talk presents as evidence of the safety benefits of advanced AI.
  • Treating driving as a multi-agent conversation using motion tokens allows for modeling complex interactions and leveraging large language model architectures.
  • Scaling laws for autonomous driving models behave differently from those of traditional LLMs, indicating challenges and opportunities specific to this domain.
  • Post-training preference alignment, inspired by RLHF, is vital for aligning model behavior with desired human-like safety and comfort preferences, especially for rare and complex scenarios.

Methods / Models / Datasets Mentioned

  • MotionLM
  • Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations
  • EMMA
  • Wayformer
  • UniAD
  • DriveVLM
  • VAD
  • OmniDrive
  • DriveVLM-Dual
  • Ego-MLP
  • BEV-Planner
  • Gemini
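For readers unfamiliar with preference alignment, the sketch below shows a generic DPO-style pairwise loss of the kind used in RLHF-inspired post-training. It is a stand-in chosen for illustration, not the objective from the preference alignment paper listed above or from the talk. The function name, its arguments, and the `beta` value are assumptions; the inputs are log-probabilities that a trainable model and a frozen reference model assign to a preferred and a rejected motion sequence.

```python
import math


def dpo_style_loss(logp_preferred: float,
                   logp_rejected: float,
                   ref_logp_preferred: float,
                   ref_logp_rejected: float,
                   beta: float = 0.1) -> float:
    """Pairwise preference loss: -log(sigmoid(margin)).

    The margin grows when the trainable model raises the preferred
    sequence's log-probability (relative to the reference) more than
    it raises the rejected one's.
    """
    margin = beta * ((logp_preferred - ref_logp_preferred)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the model and the reference agree, the margin is zero and the loss equals log 2; the loss falls as the model shifts probability toward the preferred behavior.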

Topics

Autonomous Driving · Foundation Models · Semantic Understanding · Human Behavior Prediction · Safety in AVs · Long-tail Scenarios · Scaling Laws · Multimodal Models · Post-training Preference Alignment · Motion Forecasting


Notes

Open for commentary — connections to other work, critiques, follow-up reading.