CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving

Event: CVPR 2024 Tutorial · Duration: 239 min

Abstract

This tutorial provides a comprehensive overview of end-to-end autonomy in self-driving vehicles, tracing its evolution from traditional modular systems to advanced AI-driven solutions. It delves into the core motivations behind this paradigm shift, highlighting the limitations of rule-based and data-driven approaches in handling the complexity and unpredictability of real-world driving scenarios. The tutorial showcases Wayve’s pioneering work in developing neural simulators like Ghost Gym and PRISM-1, which enable data-driven, scalable, and controllable simulation environments. It also explores the transformative potential of generative world models, such as GAIA-1 and VISTA, for data generation, understanding complex interactions, and robust policy learning. A significant portion is dedicated to the integration of Large Language Models (LLMs) in autonomous driving, emphasizing their role in enhancing explainability, reasoning, and multimodal perception through models like LINGO-1 and LINGO-2. The tutorial concludes by addressing critical challenges related to data scale, efficiency, safety, and regulatory ambiguities, while outlining future trends towards foundation models, zero-shot learning, and the ultimate goal of achieving human-level general intelligence in autonomous systems.

Speakers

  • Long Chen — Wayve
  • Jamie Shotton — Wayve
  • Hongyang Li — Shanghai AI Lab / University of Hong Kong
  • Nikhil Mohan — Wayve
  • Gianluca Corrado — Wayve
  • Oleg Sinavski — Wayve
  • Elahe Arani — Wayve

Talks (7)

  • 00:00:00 — Long Chen: CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving
    • Introduction to the CVPR 2024 tutorial on End-to-End Autonomy, highlighting the shift towards end-to-end solutions in both industry and academia, with a detailed schedule of speakers and topics.
  • 00:03:17 — Jamie Shotton: The Road to Embodied AI
    • An overview of the accelerating progress in AI, emphasizing the shift towards embodied AI, particularly in autonomous driving, and introducing Wayve’s end-to-end approach to tackle the complexities of real-world driving.
  • 00:51:00 — Hongyang Li: Could Foundation Models really resolve End-to-end Autonomy?
    • An exploration of whether foundation models can truly resolve end-to-end autonomy, discussing the shift from traditional modular systems to end-to-end solutions, the role of world models, and the challenges of data scale, efficiency, and safety in autonomous driving.
  • 01:25:08 — Nikhil Mohan: Towards a Neural Simulator: Offline evaluation of end-to-end autonomous vehicles
    • A deep dive into the need for neural simulators for offline evaluation of end-to-end autonomous vehicles, highlighting the challenges of traditional AV stacks and introducing Wayve’s Ghost Gym and PRISM-1 as data-driven, scalable, and controllable simulation platforms.
  • 02:05:07 — Gianluca Corrado: Learning Models of the World: Exploring Generative World Models in Autonomous Driving
    • An exploration of generative world models in autonomous driving, tracing their evolution from early neural network-based approaches to modern transformer and diffusion models, and highlighting their potential for data generation, world understanding, and robust policy learning.
  • 02:43:49 — Oleg Sinavski: Language Meet Driving: Empowering End-to-End Autonomous Driving with Large Language Models
    • An exploration of how Large Language Models (LLMs) are empowering end-to-end autonomous driving, focusing on their role in explainability, reasoning, and multimodal integration, and discussing the challenges and future trends in developing robust, efficient, and trustworthy autonomous systems.
  • 03:24:19 — Elahe Arani: Navigating the Future of End-to-End Autonomous Driving: Reflections and Future Directions
    • A comprehensive overview of the challenges and future trends in end-to-end autonomous driving, emphasizing the need for robust benchmarking, advanced simulation, and efficient, interpretable, and safe systems, while highlighting the potential of foundation models and multimodal integration.

Key Takeaways

  • End-to-end autonomy is a paradigm shift in self-driving, moving away from traditional modular systems to integrated AI solutions that leverage raw sensor data for direct control.
  • Neural simulators like Ghost Gym and PRISM-1 are crucial for offline evaluation, enabling data-driven, scalable, and controllable testing of autonomous vehicles in complex, dynamic environments.
  • Generative world models (e.g., GAIA-1, VISTA) offer significant potential for data generation, understanding complex interactions, and robust policy learning, allowing for the simulation of diverse and challenging scenarios.
  • Large Language Models (LLMs) are increasingly integrated into autonomous driving systems to enhance explainability, reasoning, and multimodal perception, fostering trust and enabling more informed decision-making.
  • The future of end-to-end autonomous driving lies in the development of foundation models, multimodal integration, and efficient, interpretable, and safe systems that can adapt to novel situations and generalize across diverse environments.

Methods / Models / Datasets Mentioned

  • Ghost Gym
  • PRISM-1
  • UniAD
  • GAIA-1
  • VISTA
  • LINGO-1
  • LINGO-2
  • MCTS
  • GNN
  • GPT-3
  • GPT-3.5
  • GPT-4
  • GenAD
  • DriveGPT4
  • RAG-Driver
  • LMDrive
  • Nuro
  • Drive Anywhere
  • LangProp
  • LaMPilot
  • DOROTHIE
  • HiLM-D
  • RSSM
  • Dreamer v1
  • Dreamer v2
  • Dreamer v3
  • Phenaki
  • IRIS
  • Sora
  • V-JEPA
  • MILE
  • SEM2
  • Drive-WM
  • Copilot4D
  • OccWorld
  • DriveWorld
  • DriveDreamer
  • TrafficBots
  • Panacea
  • SubjectDrive
  • LidarDM
  • Iso-Dream
  • UniWorld
  • MUVO
  • ViDAR
  • WoVoGen
  • Think2Drive
  • DriveAGI
  • DriveLM
  • OpenLane
  • ELM
  • DriveAdapter
  • MP3
  • CLIP
  • Q-Former
  • Flan-T5
  • LoRA
  • Llama
  • PID controller
  • Model Predictive Control
  • VQ-VAE
  • VQ-GAN
  • ViViT
  • COLMAP
  • Nerfstudio
  • NeRF
  • HyperNeRF
  • Nerfies
  • NSFF
  • iPhone
  • D-NeRF
  • DriveSim
  • CARLA
  • Waabi World
  • Waymo's Waymax
  • nuScenes
  • Waymo
  • Argoverse2
  • nuPlan
  • KITTI-360
  • Openpilot
  • CNN E2E
  • BDD-V
  • CILRS
  • Conditional IL
  • DArB
  • AGILEAD
  • SafeDagger
  • Generalization
  • NMP
  • BDD-X
  • PlanT
  • Patch-wise Feature Extraction
  • Multimodal Foundation Model
  • Policy Network
  • Transformer Block
  • ST-Adapter
  • Visual Encoder
  • AvgPool
  • Linear
  • Large Language Model
  • Enumeration Module
  • Incorporation Module
  • HR Spatial Extractor
  • Cross-attention
  • MLP
  • Query Detection Module
  • Lingo-Judge
  • AgentsCoDriver
  • LLM-based Planner
  • Low-level Controller
  • Wayve's Vision Model
  • Vision Encoder
  • Visual Tokens
  • Prediction Headers
  • BEV Map
  • Traffic Light States
  • Waypoint
  • Target Point
  • Future Waypoints
  • Multi-view RGB & LiDAR
  • Navigation Instruction
  • Action Tokens
  • Instruction Following
  • Attention based chain-of-thought
  • CarLLaVA
  • WayveScenes101
  • Open Drive Lab
  • Tesla World Model
  • ADriver-I
  • NeuRAD
  • Delphi

Topics

End-to-End Autonomy · Neural Simulators · Generative World Models · Large Language Models (LLMs) · Explainability · Reasoning · Multimodal Integration · Data Scale · Safety and Reliability · Foundation Models


Notes

Open for commentary — connections to other work, critiques, follow-up reading.