CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving

Event: CVPR 2024 Tutorial · Duration: 280 min

Abstract

This tutorial gives an overview of end-to-end autonomous driving, with emphasis on the shift towards neural simulators and large language models (LLMs). Speakers from Wayve and academia cover foundational concepts, recent technical advances, and open challenges in the field. Key topics include data-driven neural simulators such as Ghost Gym and PRISM-1 for realistic scene reconstruction, the integration of LLMs for reasoning, explainability, and control in driving (e.g., LINGO-1, LINGO-2), and the emergence of multimodal foundation models for embodied AI. The tutorial also addresses challenges in data scale, safety, and efficiency, and the need for robust benchmarking and evaluation methodologies to accelerate progress in autonomous driving.

Speakers

  • Long Chen — Wayve
  • Jamie Shotton — Chief Scientist, Wayve
  • Hongyang Li — Assistant Professor, University of Hong Kong & Research Scientist, Shanghai AI Lab
  • Nikhil Mohan — Lead Applied Scientist, Wayve
  • Gianluca Corrado — Principal Applied Scientist, Wayve
  • Oleg Sinavski — Principal Applied Scientist, Wayve
  • Elahe Arani — Head of AI Research, Wayve

Talks (7)

  • 00:04:00 Long Chen: End-to-End Autonomy: A New Era of Self-Driving
    • An introduction to the tutorial on end-to-end autonomous driving, highlighting the shift in industry and academia towards end-to-end solutions and outlining the day’s schedule.
  • 03:36:00 Jamie Shotton: The Road to Embodied AI
    • Discusses the rapid progress of AI, the challenges of real-world autonomous driving, and Wayve’s end-to-end approach to embodied AI, emphasizing simulation, multimodality, and foundation models.
  • 06:06:00 Hongyang Li: Could Foundation Models really resolve End-to-end Autonomy?
    • Explores the potential of foundation models to resolve end-to-end autonomy, discussing challenges in data scale, training stability, and the need for robust, interpretable, and efficient systems.
  • 11:18:00 Nikhil Mohan: Towards a Neural Simulator: Offline evaluation of end-to-end autonomous vehicles
    • Presents neural simulators as a solution for offline evaluation of end-to-end autonomous vehicles, detailing the shift from traditional AV stacks to end-to-end AI and the importance of data-driven environment creation for robust testing.
  • 14:20:00 Gianluca Corrado: Learning Models of the World: Exploring Generative World Models in Autonomous Driving
    • Explores the evolution of generative world models from early neural network approaches to modern transformer and diffusion-based models, highlighting their application in autonomous driving for prediction, planning, and data generation.
  • 17:20:00 Oleg Sinavski: Language Meet Driving: Empowering End-to-End Autonomous Driving with Large Language Models
    • Discusses the integration of Large Language Models (LLMs) into end-to-end autonomous driving, highlighting their role in reasoning, explainability, and leveraging compressed information for complex decision-making.
  • 20:20:00 Elahe Arani: Navigating the Future of End-to-End Autonomous Driving: Reflections and Future Directions
    • Provides a comprehensive overview of the current state and future directions of end-to-end autonomous driving, addressing challenges in data scale, safety, efficiency, and the role of foundation models and LLMs.

Methods / Models / Datasets Mentioned

  • Tesla FSD Beta V12
  • UniAD
  • Ghost Gym
  • PRISM-1
  • Wayve GAIA (GAIA-1)
  • LINGO-1
  • LINGO-2
  • COMPASS
  • Gato
  • LM-Nav
  • RT-1
  • Open X-Embodiment
  • WayveScenes101
  • VQ-GAN
  • BLIP-2 Q-Former
  • Vicuna-7B
  • LoRA fine-tuning
  • CLIP
  • GPT-3
  • GPT-4
  • DriveGPT4
  • LingoQA
  • Lingo-Judge
  • LangAuto
  • LaMPilot
  • DOROTHIE
  • LMDrive
  • CarLLaVA
  • Dreamer (RSSM)
  • Dreamer v1
  • Dreamer v2
  • Dreamer v3
  • Phenaki
  • IRIS
  • Sora
  • V-JEPA
  • OccWorld
  • Copilot4D
  • Vista
  • TransFuser
  • ST-P3
  • DriveDreamer
  • GenAD
  • SubjectDrive
  • LidarDM
  • WoVoGen
  • ViDAR
  • DriveWorld
  • MILE
  • TrafficBots
  • Drive-WM
  • MUVO
  • SEM2
  • UniWorld
  • ADriver-I
  • Think2Drive
  • WorldDreamer
  • HighwayEnv
  • nuScenes (NAVSIM)
  • CARLA
  • Waymo
  • Argoverse 2
  • nuPlan
  • DriveSim
  • KITTI-360
  • Waymo Open Dataset
  • OpenPilot
  • MCTS
  • GNN
  • GPT-3.5
  • Llama
  • Q-Former
  • Flan-T5
  • Lingo-S/T
  • AgentsCoDriver
  • RAG-Driver
  • Nuro
  • Drive Anywhere
  • AlexNet
  • Dropout
  • ResNets
  • ImageNet
  • MNIST
  • SuperGLUE
  • VQA
  • MMLU

Topics

End-to-end autonomous driving · Neural simulators · Generative world models · Large Language Models (LLMs) in driving · Multimodality integration · Explainability and trustworthiness in AI · Foundation models for embodied AI · Data scale and quality for autonomous driving · Safety and reliability in AVs · Future trends and challenges in autonomous driving

