Embodied Intelligence for Autonomous Systems on the Horizon

Event: CVPR 2025 Workshop on Embodied Intelligence for Autonomous Systems

Abstract

This workshop session explores two key areas in embodied intelligence for autonomous systems: achieving efficiency in self-driving with large models and learning robust reward functions. The first part introduces ETA, a dual-model approach that leverages ‘thinking ahead’ with a large model and reactive decisions with a small model, demonstrating near real-time performance. The second part presents RELACS, a novel reward learning framework that uses counterfactual data to train a dedicated reward model, enabling better generalization and addressing the complexities of reward engineering in autonomous driving.

Speakers

Fatma Güney — Koç University, Istanbul, Turkey

Talks (2)

00:26 — Fatma Güney: ETA: Efficiency through Thinking Ahead - A Dual Approach to Self-Driving with Large Models
- This talk introduces ETA, a dual-model approach that combines a slow, high-performing large model with a fast, small model and a forecasting module to achieve real-time, efficient self-driving.
01:03:00 — Fatma Güney: RELACS: Reward Learning for Autonomous Driving using Counterfactuals
- This talk presents RELACS, a dedicated reward model that learns driving-related rewards directly from video observations and counterfactual data, addressing the challenges of hand-designed rewards and improving generalization to real-world scenarios.
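
The dual-model idea from the first talk can be illustrated with a toy scheduling loop. This is only a sketch under stated assumptions, not the ETA architecture: the names `large_model_think_ahead`, `small_model_react`, `run`, and the constant `LARGE_MODEL_DELAY` are hypothetical, the "models" are arithmetic stand-ins, and the fixed three-step delay is an illustrative choice. It shows the scheduling pattern only: the slow model starts work early on a forecast of a future step, and the fast model combines that precomputed result with the current frame so that no control step waits on the slow model.

```python
from collections import deque

# Assumption for illustration: the slow model needs 3 control steps per result.
LARGE_MODEL_DELAY = 3


def large_model_think_ahead(observation, horizon):
    """Stand-in for a slow, large model that plans for a forecasted future step.

    Hypothetical forecast: the observed value drifts by +1 per step.
    """
    return observation + horizon


def small_model_react(observation, plan_target):
    """Stand-in for a fast, small model that reacts to the current frame.

    It corrects the precomputed plan using what is actually observed now.
    """
    if plan_target is None:
        return 0.0  # no large-model result is available yet
    return plan_target - observation


def run(observations):
    """Run the control loop; each step uses only results that are already finished."""
    pending = deque()  # (step at which the result becomes ready, plan target)
    latest_plan = None
    actions = []
    for t, obs in enumerate(observations):
        # Collect any large-model results that have finished by step t.
        while pending and pending[0][0] <= t:
            latest_plan = pending.popleft()[1]
        # Start the slow computation now, targeting step t + LARGE_MODEL_DELAY.
        ready_at = t + LARGE_MODEL_DELAY
        pending.append((ready_at, large_model_think_ahead(obs, LARGE_MODEL_DELAY)))
        # The fast model acts immediately on the current observation.
        actions.append(small_model_react(obs, latest_plan))
    return actions
```

In this toy setup, the first three steps have no plan available, and at later steps the small model's output is nonzero only when the current observation deviates from what the large model forecast three steps earlier.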

Key Takeaways

Large models can be efficiently integrated into self-driving systems by employing a dual-model architecture that separates proactive ‘thinking ahead’ from reactive decision-making.
Forecasting future states and maintaining real-time reactivity through a smaller model are critical for achieving both high performance and low latency in autonomous driving.
Learning reward functions directly from observations, especially using counterfactual data, can overcome the challenges of manual reward engineering and improve generalization to diverse real-world scenarios.
Dedicated reward models, independent of future prediction, can accurately assess driving quality, even in uncertain situations, by learning to distinguish between expert, near-crash, and crash behaviors.
The choice of tokenizer and the diversity of its training data significantly affect how well reward models generalize to out-of-distribution real-world driving videos.
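
One generic way to train a reward model to separate expert, near-crash, and crash behavior is a pairwise margin ranking objective. The sketch below is an assumption for illustration: the summary does not state which loss RELACS uses, and the function names, the strict ordering expert > near-crash > crash, and the default margin of 1.0 are all hypothetical choices.

```python
def hinge_ranking_loss(reward_better, reward_worse, margin=1.0):
    """Penalize the pair unless the better clip outscores the worse one by `margin`."""
    return max(0.0, margin - (reward_better - reward_worse))


def counterfactual_ranking_loss(r_expert, r_near_crash, r_crash, margin=1.0):
    """Sum of two pairwise terms encoding the assumed ordering
    expert > near-crash > crash over predicted scalar rewards."""
    return (
        hinge_ranking_loss(r_expert, r_near_crash, margin)
        + hinge_ranking_loss(r_near_crash, r_crash, margin)
    )
```

Under these assumptions, the loss is zero once each behavior class is scored at least one margin above the next worse class, and it grows as the predicted rewards collapse toward each other or invert the ordering.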

Methods / Models / Datasets Mentioned

DriveMLM
LLM4AD
CarLLaVA
AD-MLP
UniAD-Base
VAD
TCP-traj
ThinkTwice
DriveTransformer
DriveAdapter
Daniel Kahneman's 'Thinking, Fast and Slow'
RoboDual
Hi Robot
LeapAD
Gigaflow
NAVSIM
Implicit Affordances
ROACH
VIPER
Vista
Diffusion Models
Cosmos-Tokenizer
DINOv2

Topics

Large Language Models (LLMs) · Multimodal Large Language Models (MLLMs) · Self-driving · Real-time performance · Efficiency · Dual-model approach · Forecasting · Reward learning · Counterfactuals · Generalization · Out-of-distribution · Latency · Ablation studies

Notes

Open for commentary — connections to other work, critiques, follow-up reading.