Deep Stereo Matching in the Twenties

Event: CVPR 2024 Tutorial · Duration: 176 min

Abstract

This tutorial provides a comprehensive overview of deep stereo matching, focusing on advances since 2020. It opens with an introduction to the speakers and the schedule, followed by a quick recap of traditional stereo matching pipelines and the paradigm shift brought by deep learning. The core of the first part surveys deep stereo architectures, grouped into three families: CNN-based Cost Volume Aggregation, Iterative/Optimization-Inspired, and Transformer-Based approaches. It highlights key models such as RAFT-Stereo, CREStereo, STTR, and ELFNet, discussing their novel contributions and performance improvements, particularly in handling domain shifts and fine details.

The second part addresses domain shift in deep stereo matching: its causes, its consequences, and the approaches used to mitigate it. It covers zero-shot generalization techniques, including domain-agnostic feature modeling, non-parametric cost volumes, and the integration of geometric cues. The discussion then turns to domain adaptation, distinguishing offline from online strategies, and introduces federated learning as a novel approach to online adaptation in deep stereo. It concludes with a comparative analysis of these methods and their effectiveness in improving accuracy and efficiency across diverse domains.
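Most architectures in the CNN-based cost volume aggregation family share two building blocks: a cost volume that scores every candidate disparity at every pixel, and a differentiable regression that turns those scores into a disparity. The sketch below is a minimal, illustrative version for a single rectified scanline. It assumes dot-product (correlation) matching and a softmax-weighted expectation over disparities, which is equivalent to GC-Net's soft-argmin applied to negated costs. The function names and toy features are hypothetical. Real networks build the volume from learned feature maps over the whole image and aggregate it with learned layers (for example, 3D convolutions in GC-Net and PSMNet) before regression.

```python
import math


def correlation_cost_volume(left_feats, right_feats, max_disp):
    """Correlation cost volume for one rectified scanline.

    left_feats, right_feats: lists of W feature vectors (lists of floats).
    Returns volume[x][d] = <left[x], right[x - d]>, a matching score
    (higher means more similar). Shifts that leave the image get -inf.
    """
    volume = []
    for x, f_left in enumerate(left_feats):
        row = []
        for d in range(max_disp + 1):
            if x - d < 0:
                row.append(-math.inf)  # candidate match falls outside the right image
            else:
                f_right = right_feats[x - d]
                row.append(sum(a * b for a, b in zip(f_left, f_right)))
        volume.append(row)
    return volume


def soft_argmax_disparity(scores):
    """Differentiable disparity: expectation of d under softmax(scores).

    Equivalent to GC-Net's soft-argmin when scores are negated matching costs.
    """
    best = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - best) for s in scores]  # exp(-inf) == 0.0
    return sum(d * w for d, w in enumerate(weights)) / sum(weights)


# Toy scanline: one-hot features (scaled by 10) that repeat every 3 pixels;
# the left view equals the right view shifted by a true disparity of 2.
E = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
right = [E[x % 3] for x in range(6)]
left = [E[(x - 2) % 3] for x in range(6)]

vol = correlation_cost_volume(left, right, max_disp=2)
print(vol[4])                                  # [0.0, 0.0, 100.0]
print(max(range(3), key=vol[4].__getitem__))   # winner-take-all: 2
print(soft_argmax_disparity(vol[4]))           # 2.0
```

The softmax expectation keeps the pipeline differentiable, which is what lets an end-to-end network train its feature extractor through the cost volume; the winner-take-all argmax is shown only for comparison.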

Speakers

Matteo Poggi — University of Bologna
Fabio Tosi — University of Bologna

Talks (4)

00:00:00 — Fabio Tosi: Deep Stereo Matching in the Twenties - Introduction
- Introduction to the tutorial, speakers, schedule, and a recap of stereo matching evolution from traditional methods to deep learning, highlighting recent advancements and challenges like domain shift.
00:14:24 — Fabio Tosi: Deep Stereo Matching in the Twenties - Architectures
- Detailed overview of deep stereo architectures since 2020, grouping them into CNN-based Cost Volume Aggregation, Iterative/Optimization-Inspired (e.g., RAFT-Stereo, CREStereo), and Transformer-Based (e.g., STTR, ELFNet) approaches and discussing their novel contributions and performance.
01:27:52 — Fabio Tosi: Deep Stereo Matching in the Twenties: Facing Domain-Shifts
- Fabio Tosi discusses domain shift in deep stereo matching, its causes, consequences, and approaches to alleviate it, including zero-shot generalization and domain adaptation.
02:14:17 — Matteo Poggi: Deep Stereo Matching in the Twenties: Facing Domain-Shifts
- Matteo Poggi discusses domain adaptation in deep stereo matching, focusing on offline and online adaptation techniques, including federated learning for stereo.
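Online adaptation, as discussed in the domain-shift talks, has to learn from target frames as they arrive, usually without ground truth, so it needs a self-supervised signal. A common choice in self-supervised stereo is a photometric reconstruction loss: sample the right image at x - d(x) to re-synthesize the left view and penalize the difference. The sketch below is a hypothetical single-scanline toy with plain L1 error and linear interpolation; practical losses work on full images and often add a structural (SSIM) term, and the exact losses used by the methods in this tutorial are not reproduced here. `warp_right_to_left` and `photometric_l1` are illustrative names.

```python
import math


def warp_right_to_left(right, disparity):
    """Reconstruct the left scanline by sampling right at x - d(x).

    Uses linear interpolation so sub-pixel disparities are handled, and
    clamps samples to the image border.
    """
    width = len(right)
    recon = []
    for x, d in enumerate(disparity):
        src = min(max(x - d, 0.0), width - 1.0)  # clamp to the valid range
        x0 = int(math.floor(src))
        x1 = min(x0 + 1, width - 1)
        t = src - x0
        recon.append((1.0 - t) * right[x0] + t * right[x1])
    return recon


def photometric_l1(left, right, disparity):
    """Mean absolute difference between left and the warped right scanline."""
    recon = warp_right_to_left(right, disparity)
    return sum(abs(a - b) for a, b in zip(left, recon)) / len(left)


# Toy pair: a right-image intensity ramp; left is the ramp shifted by d = 2
# (pixels near the left border repeat the edge value, matching the clamp above).
right = [float(x) for x in range(8)]
left = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

for d in (2.0, 1.5, 1.0):
    print(d, photometric_l1(left, right, [d] * len(left)))
# 2.0 0.0
# 1.5 0.375
# 1.0 0.75
```

The loss is lowest at the true disparity and varies continuously with sub-pixel shifts, which is what allows it to be backpropagated into a network's weights on each incoming frame; which weights are updated, and at what cost, is a separate design choice.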

Key Takeaways

Deep stereo matching has seen significant advancements since 2020, moving beyond traditional hand-crafted algorithms to sophisticated deep learning architectures.
Architectures evolved from replacing individual pipeline steps with learnable modules, to end-to-end networks, and more recently to iterative/optimization-inspired and transformer-based designs.
Addressing domain shift (e.g., from synthetic to real images) has been a crucial challenge, with domain adaptation strategies and robust network designs leading to substantial performance improvements.
Future directions in stereo matching involve exploring diverse input modalities beyond conventional color cameras, such as cross-spectral and event-based stereo, and tackling complex scenarios like non-Lambertian surfaces.
Domain shift is a significant challenge in deep stereo matching, leading to performance degradation when models are applied to scenarios different from their training domain.
Zero-shot generalization aims for robustness on unseen target domains without using any target-domain data, relying on strategies such as domain-agnostic feature modeling and the integration of geometric cues.
Domain adaptation techniques, including offline and online methods, focus on adapting source-trained models to target domains using limited data to bridge the domain shift.
Federated learning offers a promising solution for online domain adaptation, allowing multiple clients to collaboratively improve a global model without sharing raw data, thus enhancing accuracy and efficiency across diverse domains.
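The federated-learning takeaway above can be illustrated with the standard FederatedAveraging (FedAvg) scheme: each client trains a copy of the global model on its own data, and the server averages only the resulting parameters. This is a generic sketch swapped in for illustration, assuming a one-parameter linear model and dataset-size-weighted averaging; it is not the protocol of FedStereo, FedMAD, or FedFULL, whose details are not given here. `local_update`, `fedavg_round`, and the toy client data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input, target) pair held privately by one client


def local_update(w: float, data: List[Sample], lr: float = 0.05, steps: int = 20) -> float:
    """Client-side training: gradient descent on the MSE of y ~ w * x, local data only."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(global_w: float, clients: List[List[Sample]]) -> float:
    """One FedAvg round: broadcast, train locally, average parameters by data size."""
    local_ws = [local_update(global_w, data) for data in clients]
    sizes = [len(data) for data in clients]
    # Only the scalar parameters reach the server; raw samples stay on each client.
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)


# Two hypothetical clients whose private data follow y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(5):
    w = fedavg_round(w, clients)
print(round(w, 6))  # 2.0
```

In a stereo setting the shared parameters would be network weights, and local training could use a self-supervised loss on each client's unlabeled frames; with genuinely different domains the averaged model trades off between clients rather than converging to one exact value as in this toy.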

Methods / Models / Datasets Mentioned

ARStereo
CBMV
CEST
CFNet
CREStereo
CroCo Stereo
DARTS
DLNR
DRR
DispNet
ELFNet
FC-Net
FedFULL
FedMAD
FedStereo
GANet
GC-Net
GCP
GMStereo
ICGNet
IGEV-Stereo
ITSA
LEAStereo
LRCR
LSSI
MAD
MC-CNN
MS-Nets
MiDaS
MoCha-Stereo
NMRF
NeRF-Supervised Deep Stereo
Neural Disparity Refinement
O1
Optimal Transport
PCVNet
PSMNet
RAFT-Stereo
RecResNet
SGM
SGM-Forest
STTR
Selective-Stereo
iResNet

Topics

Cost Volume Aggregation · Deep Learning in Stereo · Deep Stereo Matching · Domain Adaptation · Domain Shift · Federated Learning · Iterative Refinement · Neural Architecture Search · Neural Radiance Fields · Offline Adaptation · Online Adaptation · Stereo Architectures · Transformer Networks · Zero-Shot Generalization

Notes

Open for commentary — connections to other work, critiques, follow-up reading.