CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving

Event: CVPR 2024 Tutorial · Duration: 280 min

Abstract

This tutorial gives an overview of end-to-end autonomous driving, with emphasis on the shift towards neural simulators and large language models (LLMs). Speakers from Wayve and academia cover foundational concepts, recent technical advances, and open challenges in the field. Key topics include data-driven neural simulators such as Ghost Gym and PRISM-1 for realistic scene reconstruction, the integration of LLMs for reasoning, explainability, and control in driving (e.g., LINGO-1, LINGO-2), and the emergence of multimodal foundation models for embodied AI. The tutorial also addresses challenges in data scale, safety, and efficiency, and the need for robust benchmarking and evaluation methodologies to accelerate progress in autonomous driving.

Speakers

Long Chen — Wayve
Jamie Shotton — Chief Scientist, Wayve
Hongyang Li — Assistant Professor, University of Hong Kong & Research Scientist, Shanghai AI Lab
Nikhil Mohan — Lead Applied Scientist, Wayve
Gianluca Corrado — Principal Applied Scientist, Wayve
Oleg Sinavski — Principal Applied Scientist, Wayve
Elahe Arani — Head of AI Research, Wayve

Talks (7)

00:04:00 — Long Chen: End-to-End Autonomy: A New Era of Self-Driving
- An introduction to the tutorial on end-to-end autonomous driving, highlighting the shift in industry and academia towards end-to-end solutions and outlining the day’s schedule.
03:36:00 — Jamie Shotton: The Road to Embodied AI
- Discusses the rapid progress of AI, the challenges of real-world autonomous driving, and Wayve’s end-to-end approach to embodied AI, emphasizing simulation, multimodality, and foundation models.
06:06:00 — Hongyang Li: Could Foundation Models really resolve End-to-end Autonomy?
- Explores the potential of foundation models to resolve end-to-end autonomy, discussing challenges in data scale, training stability, and the need for robust, interpretable, and efficient systems.
11:18:00 — Nikhil Mohan: Towards a Neural Simulator: Offline evaluation of end-to-end autonomous vehicles
- Presents neural simulators as a solution for offline evaluation of end-to-end autonomous vehicles, detailing the shift from traditional AV stacks to end-to-end AI and the importance of data-driven environment creation for robust testing.
14:20:00 — Gianluca Corrado: Learning Models of the World: Exploring Generative World Models in Autonomous Driving
- Explores the evolution of generative world models from early neural network approaches to modern transformer and diffusion-based models, highlighting their application in autonomous driving for prediction, planning, and data generation.
17:20:00 — Oleg Sinavski: Language Meet Driving: Empowering End-to-End Autonomous Driving with Large Language Models
- Discusses the integration of Large Language Models (LLMs) into end-to-end autonomous driving, highlighting their role in reasoning, explainability, and leveraging compressed information for complex decision-making.
20:20:00 — Elahe Arani: Navigating the Future of End-to-End Autonomous Driving: Reflections and Future Directions
- Provides a comprehensive overview of the current state and future directions of end-to-end autonomous driving, addressing challenges in data scale, safety, efficiency, and the role of foundation models and LLMs.

Methods / Models / Datasets Mentioned

Tesla FSD Beta V12
UniAD
Ghost Gym
PRISM-1
Wayve GAIA (GAIA-1)
LINGO-1
LINGO-2
COMPASS
Gato
LM-Nav
RT-1
Open X-Embodiment
WayveScenes101
VQ-GAN
BLIP-2 Q-Former
Vicuna-7B
LoRA fine-tuning
CLIP
GPT-3
GPT-4
DriveGPT4
LingoQA
Lingo-Judge
LangAuto
LaMPilot
DOROTHIE
LMDrive
CarLLaVA
Dreamer (RSSM)
Dreamer v1
Dreamer v2
Dreamer v3
Phenaki
IRIS
Sora
V-JEPA
OccWorld
Copilot4D
Vista
TransFuser
ST-P3
DriveDreamer
GenAD
SubjectDrive
LidarDM
WoVoGen
ViDAR
DriveWorld
MILE
TrafficBots
Drive-WM
MUVO
SEM2
UniWorld
ADriver-I
Think2Drive
WorldDreamer
HighwayEnv
nuScenes (NAVSIM)
CARLA
Waymo
Argoverse 2
nuPlan
DriveSim
KITTI-360
Waymo Open Dataset
openpilot
MCTS
GNN
GPT-3.5
Llama
Q-Former
Flan-T5
Lingo-S/T
Agents CoDriver
RAG-Driver
Nuro
Drive Anywhere
AlexNet
Dropout
ResNets
ImageNet
MNIST
SuperGLUE
VQA
MMLU

Topics

End-to-end autonomous driving · Neural simulators · Generative world models · Large Language Models (LLMs) in driving · Multimodality integration · Explainability and trustworthiness in AI · Foundation models for embodied AI · Data scale and quality for autonomous driving · Safety and reliability in AVs · Future trends and challenges in autonomous driving

Notes

Open for commentary — connections to other work, critiques, follow-up reading.