The 3rd Monocular Depth Estimation Challenge
Event: CVPR 2024 Workshop - 3rd Monocular Depth Estimation Challenge (MDEC) · Duration: 245 min
Abstract
This workshop presents the 3rd Monocular Depth Estimation Challenge, bringing together leading researchers to discuss the latest advancements and challenges in the field. Talks cover the historical evolution of monocular depth estimation, from early self-supervised methods to modern foundational models, and explore their applications in areas like augmented reality and automated driving. Key discussions include strategies for achieving scale-aware metric depth, leveraging multi-frame information, and the development of novel architectures like Depth Field Networks. The challenge results highlight the impact of high-quality data, effective fine-tuning strategies, and the integration of geometric priors. The workshop also delves into the importance of robust evaluation metrics and the potential of implicit learning for scene representation, pushing the boundaries of zero-shot generalization and real-time performance.
Speakers
- Ripudaman Singh Arora — Blue River Technology
- Matteo Poggi — University of Bologna
- Vítor Guizilini — Toyota Research Institute
- Eric Brachmann — Niantic
- Mykola Lavreniuk — Team EVP++
- Guangyuan Zhou — PICO-MR
- Aradhye Agarwal — Indian Institute of Technology Delhi
- James Elder — York University
Talks (8)
- 00:00:00 — Ripudaman Singh Arora: The 3rd Monocular Depth Estimation Challenge (Introduction)
  - Introduction to the workshop, its goals, the organizing committee, and the importance of monocular depth estimation in robotics and computer vision.
- 02:09:00 — Matteo Poggi: Monocular Depth Estimation: Are We Done?
  - A comprehensive overview of the evolution of monocular depth estimation, from early self-supervised methods to modern foundational models, highlighting current limitations and future challenges.
- 05:52:00 — Vítor Guizilini: An ODE to MonODEpth
  - Discusses advancements in monocular depth estimation, including self-supervised methods, scale-aware metric depth, multi-frame depth, and depth field networks, highlighting challenges and future directions.
- 09:20:00 — Eric Brachmann: Metric Depth for Instant AR
  - Explores the use of metric depth estimation for instant augmented reality (AR), addressing challenges like scale ambiguity, dynamic objects, and the need for robust pose estimation, introducing a new dataset and a workshop challenge.
- 10:45:00 — Mykola Lavreniuk: 3rd Monocular Depth Estimation Challenge @ CVPR24 (EVP++ solution)
  - Presents the EVP++ solution for the Monocular Depth Estimation Challenge, utilizing diffusion-based models, automatic image captioning with BLIP-2, and a novel inverse multi-attentive feature alignment module for improved accuracy.
- 10:53:00 — Guangyuan Zhou: High Quality Data makes great progress
  - Introduces the PICO-MR method, which leverages high-quality data selection and a refined Depth-Anything model for improved monocular depth estimation, addressing challenges in diverse scenes and camera differences.
- 11:00:00 — Aradhye Agarwal: The 3rd Monocular Depth Estimation Challenge (visioniitd solution)
  - Presents the visioniitd solution for the Monocular Depth Estimation Challenge, which utilizes a ViT-based architecture with MLPs for scene embedding, aligning to CLIP space without intermediate text, and achieving strong zero-shot generalization.
- 11:06:00 — James Elder: Ground Theory of Metric Monodepth
  - Proposes a ground theory approach to monocular depth estimation, leveraging semantic segmentation and geometric priors to infer depth without explicit learning, demonstrating surprisingly good metric depth maps.
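To make the idea of a geometric prior concrete, here is a minimal sketch of the classical flat-ground relation between image row and metric depth. It is not the method presented in the talk. It assumes a pinhole camera with a horizontal optical axis, perfectly flat ground, and image rows that increase downward. The function names and the example values for `fy`, `cy`, and `cam_height` are hypothetical.

```python
def flat_ground_depth(v, fy, cy, cam_height):
    """Depth along the optical axis of the ground point seen at pixel row v.

    Assumptions (illustrative only): pinhole camera, horizontal optical axis,
    flat ground, rows increasing downward. A ground point at depth Z lies
    cam_height below the camera and projects to v = cy + fy * cam_height / Z,
    so Z = fy * cam_height / (v - cy).
    Returns None for rows at or above the horizon (v <= cy).
    """
    dv = v - cy
    if dv <= 0:
        return None
    return fy * cam_height / dv


def ground_depth_column(num_rows, fy, cy, cam_height):
    """Depth for every row of an image column, assuming all pixels are ground."""
    return [flat_ground_depth(v, fy, cy, cam_height) for v in range(num_rows)]


if __name__ == "__main__":
    # Hypothetical camera: focal length 1000 px, principal row 500, mounted 1.5 m high.
    for row in (500, 650, 800):
        print(row, flat_ground_depth(row, fy=1000.0, cy=500.0, cam_height=1.5))
```

In practice, a semantic segmentation mask would decide which pixels the ground relation applies to, and a camera pitch term would be needed for a tilted camera. Both are left out of this sketch.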
Key Takeaways
- Foundational models like Depth Anything, trained on vast datasets, significantly advance monocular depth estimation, but challenges remain with non-Lambertian surfaces and extreme viewpoints.
- Integrating geometric priors and semantic information, even without direct depth supervision, can lead to surprisingly accurate metric depth maps and improved generalization across diverse scenes and camera models.
- Novel architectures leveraging transformers, diffusion models, and implicit scene representations are crucial for achieving scale-aware metric depth, multi-frame consistency, and robust pose estimation in complex real-world scenarios like AR and autonomous driving.
- The development of high-quality, diverse datasets and robust evaluation metrics, including 3D point cloud-based metrics and depth boundary metrics, is essential for fair benchmarking and driving progress in the field.
- End-to-end differentiable pipelines that integrate feature extraction, matching, and pose optimization, combined with multi-stage curriculum learning and self-calibration techniques, show promise for overcoming limitations in traditional methods and achieving real-time performance.
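As an illustration of the two families of evaluation metrics mentioned above, the following sketch computes a per-pixel absolute relative error and a point-cloud F-score after back-projecting depth maps with a pinhole model. It is a simplified example, not necessarily the challenge's exact evaluation protocol. The function names, the brute-force nearest-neighbour search, and the threshold `tau` are assumptions chosen for readability. Plain Python lists are used to keep the example self-contained; a real pipeline would vectorize this.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows) to 3D camera-frame points, pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip invalid depths
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def _fraction_within(src, dst, tau):
    """Fraction of src points whose nearest dst point is within distance tau."""
    hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
    return hits / len(src)


def f_score(pred_pts, gt_pts, tau):
    """Harmonic mean of point-cloud precision and recall at threshold tau."""
    precision = _fraction_within(pred_pts, gt_pts, tau)
    recall = _fraction_within(gt_pts, pred_pts, tau)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2x2 depth maps; one predicted pixel is 1 m too far.
    gt = [[2.0, 2.0], [2.0, 2.0]]
    pred = [[2.0, 2.0], [2.0, 3.0]]
    flat = lambda d: [z for row in d for z in row]
    print("AbsRel:", abs_rel(flat(pred), flat(gt)))
    k = dict(fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    print("F-score:", f_score(backproject(pred, **k), backproject(gt, **k), tau=0.1))
```

The per-pixel error treats every pixel independently, while the F-score penalizes errors in 3D space, so the two can rank methods differently.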
Methods / Models / Datasets Mentioned
SfMLearner · Depth Anything · Depth Anything v2 · Marigold · ChronoDepth · Depth4TOM · PackNet · Metric Velocity Supervision · Dense Depth for Automated Driving (DDAD) · Pseudo-Lidar · DD3D · DD3D v2 · Self-Supervised Scene Flow · Tactile Sensors · Depth Field Networks · Equivariant Perceiver IO · DeLIRA · Scale-Aware Metric Depth · SuperGlue · DPT (Dense Prediction Transformer) · Mickey · LoFTR · RANSAC · Kabsch · ACE (Accelerated Coordinate Encoding) · ACE Zero · ACE Relocalizer · ACE Zero Relocalizer · EVP++ · ZoeDepth · VPD · BLIP-2 · CLIP · IMAFA (Inverse Multi-Attentive Feature Alignment) · PICO-MR · BEiT384-L · Metric3D · MiDaS · LeRes · ArgoVerse · Dsec · NYUv2 · KITTI · Cityscapes · Diode · SGM+Lidar · Kinect · City_Tartan · City_KITTI · ViT · MLPs · Ground Theory · InternImage · ADE20K · MCMLSDFR · PixelFormer · MIM · AiT · Elder Lab (Segmentation net + statistical models) · ReadingLS (SwiftDepth) · FRDC-SH · HIT-AIIA · 3DCreators · RGA-Robots · SuperPoint
Topics
Monocular Depth Estimation · Self-Supervised Learning · Foundational Models · Scale-Aware Metric Depth · Multi-Frame Depth Estimation · Depth Field Networks · Augmented Reality (AR) · Automated Driving · Semantic Segmentation · Geometric Priors · Zero-Shot Generalization · Benchmarking & Evaluation Metrics · Diffusion Models · Camera Self-Calibration