Embodied Intelligence for Autonomous Systems on the Horizon

Event: CVPR 2025 Workshop on Embodied Intelligence for Autonomous Systems

Abstract

This workshop session explores two key areas in embodied intelligence for autonomous systems: achieving efficiency in self-driving with large models and learning robust reward functions. The first part introduces ETA, a dual-model approach that leverages ‘thinking ahead’ with a large model and reactive decisions with a small model, demonstrating near real-time performance. The second part presents RELACS, a novel reward learning framework that uses counterfactual data to train a dedicated reward model, enabling better generalization and addressing the complexities of reward engineering in autonomous driving.

Speakers

  • Fatma Güney — Koç University, Istanbul, Turkey

Talks (2)

  • 00:26 Fatma Güney: ETA: Efficiency through Thinking Ahead - A Dual Approach to Self-Driving with Large Models
    • This talk introduces ETA, a dual-model approach that combines a slow, high-performing large model with a fast, small model and a forecasting module to achieve real-time, efficient self-driving.
  • 01:03:00 Fatma Güney: RELACS: Reward Learning for Autonomous Driving using Counterfactuals
    • This talk presents RELACS, a dedicated reward model that learns driving-related rewards directly from video observations and counterfactual data, addressing the challenges of hand-designed rewards and improving generalization to real-world scenarios.
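The dual-model idea from the first talk can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the ETA implementation: the `Frame` fields, the 2-second time-gap rule, the constant-speed forecast, the 5 m braking threshold, and all function names are hypothetical stand-ins. It shows the scheduling pattern only: the slow model works on a forecast of the current step, started a few steps earlier, while the fast model always sees the live frame and can override the plan.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    t: int        # time step index
    speed: float  # ego speed in m/s
    gap: float    # distance to the obstacle ahead in m


def large_model(frame: Frame) -> float:
    """Stand-in for the slow model: target speed that keeps a 2 s time gap."""
    return max(0.0, frame.gap / 2.0)


def forecast(frame: Frame, steps: int, dt: float) -> Frame:
    """Predict the frame `steps` ahead (assumes constant ego speed, static obstacle)."""
    return Frame(frame.t + steps, frame.speed, frame.gap - frame.speed * steps * dt)


def small_model(frame: Frame, target_speed) -> str:
    """Stand-in for the fast model: reacts to the live frame, may override the plan."""
    if frame.gap < 5.0:          # hypothetical safety threshold
        return "brake"
    if target_speed is None:     # no plan from the large model yet
        return "hold"
    return "slow_down" if frame.speed > target_speed else "accelerate"


def drive(frames, delay_steps=2, dt=0.5):
    """The large model's output for step t is computed from the frame at t - delay_steps."""
    history = deque(maxlen=delay_steps + 1)
    actions = []
    for frame in frames:
        history.append(frame)
        target = None
        if len(history) == delay_steps + 1:
            stale = history[0]   # frame the large model started from
            target = large_model(forecast(stale, delay_steps, dt))
        actions.append(small_model(frame, target))
    return actions


frames = [
    Frame(0, 10.0, 40.0),
    Frame(1, 10.0, 35.0),
    Frame(2, 10.0, 30.0),
    Frame(3, 10.0, 4.0),   # sudden cut-in that the forecast could not anticipate
]
actions = drive(frames)
```

In the last step the forecast still assumes a comfortable gap, so only the reactive path catches the cut-in; that division of labor is the point of the sketch.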

Key Takeaways

  • Large models can be efficiently integrated into self-driving systems by employing a dual-model architecture that separates proactive ‘thinking ahead’ from reactive decision-making.
  • Forecasting future states and maintaining real-time reactivity through a smaller model are critical for achieving both high performance and low latency in autonomous driving.
  • Learning reward functions directly from observations, especially using counterfactual data, can overcome the challenges of manual reward engineering and improve generalization to diverse real-world scenarios.
  • Dedicated reward models, independent of future prediction, can accurately assess driving quality, even in uncertain situations, by learning to distinguish between expert, near-crash, and crash behaviors.
  • The choice of tokenizer and the diversity of its training data significantly affect how well reward models generalize to out-of-distribution real-world driving videos.
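The takeaway about distinguishing expert, near-crash, and crash behaviors can be made concrete with a small ranking example. The following is a minimal sketch, assuming a pairwise logistic (Bradley-Terry style) loss and a linear reward over two hand-picked clip features; the summary does not state the objective or inputs used in RELACS, so the loss, the features, the data, and every name below are illustrative assumptions rather than the method from the talk.

```python
import math

# Hypothetical per-clip features: (minimum distance to other agents in m,
# peak deceleration in m/s^2). A real reward model would consume video tokens.
CLIPS = {
    "expert":     [(12.0, 1.0), (15.0, 0.5)],
    "near_crash": [(1.5, 6.0), (2.0, 5.0)],
    "crash":      [(0.0, 9.0), (0.0, 8.0)],
}
ORDER = ["expert", "near_crash", "crash"]  # best to worst


def reward(w, x):
    """Linear reward: dot product of weights and clip features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def ranked_pairs():
    """Yield (better, worse) feature pairs implied by ORDER."""
    for i, better in enumerate(ORDER):
        for worse in ORDER[i + 1:]:
            for a in CLIPS[better]:
                for b in CLIPS[worse]:
                    yield a, b


def train(epochs=200, lr=0.01):
    """SGD on -log(sigmoid(r(better) - r(worse))) over all ranked pairs."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for a, b in ranked_pairs():
            margin = reward(w, a) - reward(w, b)
            step = 1.0 - 1.0 / (1.0 + math.exp(-margin))  # 1 - sigmoid(margin)
            w = [wi + lr * step * (ai - bi) for wi, ai, bi in zip(w, a, b)]
    return w


w = train()
scores = {name: [reward(w, x) for x in xs] for name, xs in CLIPS.items()}
```

With this toy data the learned weights reward larger clearances and penalize hard braking, so every expert clip scores above every near-crash clip, which in turn scores above every crash clip. Counterfactual clips would enter such a setup as additional lower-ranked examples.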

Methods / Models / Datasets Mentioned

  • DriveMLM
  • LLM4AD
  • CarLLaVA
  • AD-MLP
  • UniAD-Base
  • VAD
  • TCP-traj
  • ThinkTwice
  • DriveTransformer
  • DriveAdapter
  • Daniel Kahneman's 'Thinking, Fast and Slow'
  • RoboDual
  • Hi Robot
  • LeapAD
  • Gigaflow
  • NAVSIM
  • Implicit Affordances
  • ROACH
  • VIPER
  • Vista
  • Diffusion Models
  • Cosmos-Tokenizer
  • DINOv2

Topics

Large Language Models (LLMs) · Multimodal Large Language Models (MLLMs) · Self-driving · Real-time performance · Efficiency · Dual-model approach · Forecasting · Reward learning · Counterfactuals · Generalization · Out-of-distribution · Latency · Ablation studies
