CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving
Event: CVPR 2024 Tutorial · Duration: 239 min
Abstract
This tutorial provides a comprehensive overview of end-to-end autonomy in self-driving vehicles, tracing its evolution from traditional modular systems to advanced AI-driven solutions. It delves into the core motivations behind this paradigm shift, highlighting the limitations of rule-based and data-driven approaches in handling the complexity and unpredictability of real-world driving scenarios. The tutorial showcases Wayve’s pioneering work in developing neural simulators like Ghost Gym and PRISM-1, which enable data-driven, scalable, and controllable simulation environments. It also explores the transformative potential of generative world models, such as GAIA-1 and VISTA, for data generation, understanding complex interactions, and robust policy learning. A significant portion is dedicated to the integration of Large Language Models (LLMs) in autonomous driving, emphasizing their role in enhancing explainability, reasoning, and multimodal perception through models like LINGO-1 and LINGO-2. The tutorial concludes by addressing critical challenges related to data scale, efficiency, safety, and regulatory ambiguities, while outlining future trends towards foundation models, zero-shot learning, and the ultimate goal of achieving human-level general intelligence in autonomous systems.
Speakers
- Long Chen — Wayve
- Jamie Shotton — Wayve
- Hongyang Li — Shanghai AI Lab / University of Hong Kong
- Nikhil Mohan — Wayve
- Gianluca Corrado — Wayve
- Oleg Sinavski — Wayve
- Elahe Arani — Wayve
Talks (7)
- 00:00:00 — Long Chen: CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving
  - Introduction to the CVPR 2024 tutorial on End-to-End Autonomy, highlighting the shift towards end-to-end solutions in both industry and academia, with a detailed schedule of speakers and topics.
- 00:03:17 — Jamie Shotton: The Road to Embodied AI
  - An overview of the accelerating progress in AI, emphasizing the shift towards embodied AI, particularly in autonomous driving, and introducing Wayve’s end-to-end approach to tackle the complexities of real-world driving.
- 00:51:00 — Hongyang Li: Could Foundation Models really resolve End-to-end Autonomy?
  - An exploration of whether foundation models can truly resolve end-to-end autonomy, discussing the shift from traditional modular systems to end-to-end solutions, the role of world models, and the challenges of data scale, efficiency, and safety in autonomous driving.
- 01:25:08 — Nikhil Mohan: Towards a Neural Simulator: Offline evaluation of end-to-end autonomous vehicles
  - A deep dive into the need for neural simulators for offline evaluation of end-to-end autonomous vehicles, highlighting the challenges of traditional AV stacks and introducing Wayve’s Ghost Gym and PRISM-1 as data-driven, scalable, and controllable simulation platforms.
- 02:05:07 — Gianluca Corrado: Learning Models of the World: Exploring Generative World Models in Autonomous Driving
  - An exploration of generative world models in autonomous driving, tracing their evolution from early neural network-based approaches to modern transformer and diffusion models, and highlighting their potential for data generation, world understanding, and robust policy learning.
- 02:43:49 — Oleg Sinavski: Language Meet Driving: Empowering End-to-End Autonomous Driving with Large Language Models
  - An exploration of how Large Language Models (LLMs) are empowering end-to-end autonomous driving, focusing on their role in explainability, reasoning, and multimodal integration, and discussing the challenges and future trends in developing robust, efficient, and trustworthy autonomous systems.
- 03:24:19 — Elahe Arani: Navigating the Future of End-to-End Autonomous Driving: Reflections and Future Directions
  - A comprehensive overview of the challenges and future trends in end-to-end autonomous driving, emphasizing the need for robust benchmarking, advanced simulation, and efficient, interpretable, and safe systems, while highlighting the potential of foundation models and multimodal integration.
Key Takeaways
- End-to-end autonomy is a paradigm shift in self-driving, moving away from traditional modular systems to integrated AI solutions that leverage raw sensor data for direct control.
- Neural simulators like Ghost Gym and PRISM-1 are crucial for offline evaluation, enabling data-driven, scalable, and controllable testing of autonomous vehicles in complex, dynamic environments.
- Generative world models (e.g., GAIA-1, VISTA) offer significant potential for data generation, understanding complex interactions, and robust policy learning, allowing for the simulation of diverse and challenging scenarios.
- Large Language Models (LLMs) are increasingly integrated into autonomous driving systems to enhance explainability, reasoning, and multimodal perception, fostering trust and enabling more informed decision-making.
- The future of end-to-end autonomous driving lies in the development of foundation models, multimodal integration, and efficient, interpretable, and safe systems that can adapt to novel situations and generalize across diverse environments.
Methods / Models / Datasets Mentioned
Ghost Gym · PRISM-1 · UniAD · GAIA-1 · VISTA · LINGO-1 · LINGO-2 · MCTS · GNN · GPT-3 · GPT-3.5 · GPT-4 · GenAD · DriveGPT4 · RAG-driver · LMdrive · Nuro · Drive Anywhere · LangProp · LaMPilot · DOROTHIE · HILM-D · RSSM · Dreamer v1 · Dreamer v2 · Dreamer v3 · Phenaki · IRIS · Sora · V-JEPA · MILE · SEM2 · Drive-WM · Copilot4D · OccWorld · DriveWorld · DriveDreamer · TrafficBots · Panacea · SubjectDrive · LidarDM · Iso-Dream · UniWorld · MUVO · VIDAR · WoVoGen · Think2Drive · DriveAGI · DriveLM · OpenLane · ELM · DriveAdapter · MP3 · CLIP · Q-Former · Flan-T5 · LoRA · Llama · PID controller · Model Predictive Control · VQ-VAE · VQ-GAN · VIVIT · COLMAP · Nerfstudio · NeRF · HyperNeRF · Nerfies · NSFF · iPhone · D-NeRF · DriveSim · Carla · Waabi World · Waymo's Waymax · nuScenes · Waymo · Argoverse2 · nuPlan · KITTI-360 · Openpilot · CNN E2E · BDD-V · CILRS · Conditional IL · DArB · AGILEAD · SafeDagger · Generalization · NMP · BDD-X · PlanT · Patch-wise Feature Extraction · Multimodal Foundation Model · Policy Network · Transformer Block · ST-Adapter · Visual Encoder · AvgPool · Linear · Large Language Model · Enumeration Module · Incorporation Module · HR Spatial Extractor · Cross-attention · MLP · Query Detection Module · Lingo-Judge · AgentsCoDriver · LLM-based Planner · Low-level Controller · Wayve's Vision Model · Vision Encoder · Visual Tokens · Prediction Headers · BEV Map · Traffic Light States · Waypoint · Target Point · Future Waypoints · Multi-view RGB & LiDAR · Navigation Instruction · Action Tokens · Instruction Following · Attention based chain-of-thought · CarLLaVA · WayveScenes101 · Open Drive Lab · Tesla World Model · ADriver-I · NeuRAD · Delphi
Topics
End-to-End Autonomy · Neural Simulators · Generative World Models · Large Language Models (LLMs) · Explainability · Reasoning · Multimodal Integration · Data Scale · Safety and Reliability · Foundation Models