OmniCV 2024 Workshop

Event: CVPR OmniCV 2024 · Duration: 186 min

Abstract

The OmniCV 2024 workshop covers recent work in omnidirectional computer vision, with a focus on bridging theoretical research and practical applications. Presentations highlight the growing relevance of 360-degree cameras in autonomous navigation, virtual reality, and 3D scene understanding. Topics range from data collection through crowdsourcing and synthetic dataset generation to deep learning architectures designed for the geometric challenges of omnidirectional imagery, particularly distortion. The workshop emphasizes robust algorithms for depth and normal estimation, semantic segmentation, and object detection, with the aim of improving accuracy and enabling real-world deployment.

Speakers

Swetpriy — Unknown (Introducer)
Jan Erik Solem — Chairman at Overture Maps Foundation & Engineering Director, Meta
Jean-François Lalonde — Distinguished Full Professor, Electrical and Computer Engineering Department, Université Laval
Hannes Reichert — TH Aschaffenburg University of Applied Sciences
Jingguo Liu — Southwest University, Chongqing, China
Cing-Jia Lin — National Tsing Hua University, Taiwan
Jay Bhanushali — Indian Institute of Technology Madras, India
Stefan Milz — Spleenlab GmbH
Lars Hinneburg — Spleenlab GmbH

Talks (7)

00:00:00 — Swetpriy: Introduction
- Introduces the OmniCV 2024 workshop and its objectives: fostering novel research in omnidirectional computer vision, bridging the gap between research and application, and encouraging new models for applications such as autonomous driving and VR/AR. Acknowledges sponsors Qualcomm and Spleenlab.
02:46:50 — Jan Erik Solem: Crowdsourcing Imagery for Mapping and Research
- Presents Mapillary, a crowdsourcing platform for street-level imagery, detailing its use for mapping, dataset creation (e.g., Metropolis, Planet-Scale Depth), and research applications including panoptic segmentation and Neural Radiance Fields (NeRFs) for immersive experiences.
05:50:00 — Jean-François Lalonde: Adapting to wide-angle lenses with distortion-aware transformers
- Discusses challenges of distortion in wide-angle and panoramic images and proposes a novel transformer-based architecture that adapts its internal structure to the radial nature of lens distortions for improved performance in various computer vision tasks.
06:19:00 — Jingguo Liu: Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations
- Introduces a teacher-student model for monocular panoramic depth estimation that fuses equirectangular and spherical representations, leveraging spherical convolution to overcome distortion issues and improve accuracy compared to traditional methods.
06:36:00 — Cing-Jia Lin: DQ-HorizonNet: Enhancing Door Detection Accuracy in Panoramic Images via Dynamic Quantization
- Addresses challenges in door detection from panoramic images, such as distortion, cross-boundary issues, and size variance, by proposing DQ-HorizonNet, which uses dynamic quantization to effectively reduce false positives and improve detection accuracy.
06:55:00 — Jay Bhanushali: Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding
- Presents OmniHorizon, a synthetic omnidirectional dataset, and UbotNet, a U-Net and Bottleneck Transformer fusion architecture, to improve depth and normal estimation in real-world outdoor scenes through cross-domain synthetic-to-real transfer.
07:15:00 — Stefan Milz: Omni-Balloon Visual Odometry Challenge
- Introduces the Omni-Balloon Visual Odometry Challenge on Kaggle, motivated by the need for robust drone navigation without GPS, using omnidirectional camera data from a balloon-mounted system inspired by the East German balloon escape.
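
Several of the talks above refer to distortion in equirectangular panoramas. As a rough illustration only (not code from any of the presentations), the sketch below maps equirectangular pixel coordinates to unit directions on the sphere and computes how strongly each image row is stretched horizontally relative to the equator. The function names and the axis convention (y up, longitude 0 at the image centre) are assumptions chosen for this example.

```python
import numpy as np


def equirect_pixel_to_direction(u, v, width, height):
    """Map equirectangular pixel coordinates (u, v) to unit direction vectors.

    Assumed convention (hypothetical, for illustration): longitude runs from
    -pi at the left edge to +pi at the right edge, latitude runs from +pi/2 at
    the top to -pi/2 at the bottom, and the y axis points up. Integer (u, v)
    address the top-left corner of a pixel, so its centre is at (+0.5, +0.5).
    """
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    lon = ((u + 0.5) / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - (v + 0.5) / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)


def horizontal_stretch(v, height):
    """Horizontal oversampling of row v relative to the equator: 1 / cos(latitude)."""
    v = np.asarray(v, dtype=np.float64)
    lat = (0.5 - (v + 0.5) / height) * np.pi
    return 1.0 / np.cos(lat)
```

Rows near the top and bottom of the image cover a much smaller circle on the sphere than rows near the equator, which is why a fixed-size convolution kernel sees very different amounts of scene content depending on the row it is applied to.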

Key Takeaways

Omnidirectional computer vision is evolving quickly, with applications in autonomous navigation, VR/AR, and 3D scene understanding, and it benefits from advances in both 360-degree camera hardware and learning-based algorithms.
Crowdsourcing platforms like Mapillary and synthetic dataset generation (e.g., OmniHorizon) are crucial for creating large-scale, diverse omnidirectional datasets needed for robust model training.
Novel deep learning architectures, such as distortion-aware transformers and U-Net fusions with bottleneck transformers, are being developed to effectively handle geometric distortions and leverage global context in panoramic imagery.
Sim-to-real transfer, in which a model is pretrained on synthetic data and then fine-tuned on real-world datasets, was reported to improve depth and normal estimation for outdoor scenes.
Challenges in panoramic image analysis, including distortion, cross-boundary objects, and size variance, are being addressed through innovative approaches like dynamic quantization and specialized spherical convolution methods.
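
To make the cross-boundary issue concrete: in an equirectangular panorama the left and right image edges are adjacent in the scene, so an object such as a door can be split across them. One common workaround, shown here as a minimal sketch and not as the method used by DQ-HorizonNet or any other talk, is to pad the image horizontally by wrapping around before applying a convolution or detector. The function name `pad_panorama` is hypothetical.

```python
import numpy as np


def pad_panorama(image, pad):
    """Pad an equirectangular image of shape (H, W) or (H, W, C).

    The width axis is padded by wrap-around, so the left border continues
    with pixels from the right edge and vice versa. The height axis is padded
    by edge replication, which is a simplification near the poles.
    """
    if pad < 0:
        raise ValueError("pad must be non-negative")
    extra = [(0, 0)] * (image.ndim - 2)  # leave any channel axis unpadded
    wrapped = np.pad(image, [(0, 0), (pad, pad)] + extra, mode="wrap")
    return np.pad(wrapped, [(pad, pad), (0, 0)] + extra, mode="edge")
```

After padding, a window that starts near the right edge sees the pixels from the left edge as its neighbours, so an object straddling the seam appears contiguous to the model.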

Methods / Models / Datasets Mentioned

Mapillary
Overture Maps Foundation
Ricoh Theta
Insta360
Trimble
Mapillary CrowdDriven Dataset
Mapillary Metropolis Dataset
Mapillary Planet-Scale Depth Dataset
Mapillary Street-Level Sequences Dataset
Mapillary Traffic Sign Dataset
Mapillary Vistas Dataset
Panoptic Segmentation (Porzi et al. CVPR'21)
Neural Radiance Fields (NeRFs)
Spherical Distortion Model
DarSwin (Distortion-aware Radial Swin Transformer)
Swin Transformer
Deformable Swin (DAT)
U-Net (Ronneberger et al. 2015)
HorizonNet (Cheng Sun et al. 2019)
DQ-HorizonNet
Faster R-CNN (Ren et al. 2015)
YOLO (J. Redmon et al. 2016)
DETR (N. Carion et al. 2020)
OmniHorizon
UbotNet
Bottleneck Transformer (Srinivas et al. 2021)
RectNet
UResNet
UNet128
UbotNet Lite
Matterport3D 360°
Replica 360° 2K/4K RGBD
Stanford 2D-3D
Pano3D (Albanis et al. 2021)
Zillow
Fukuoka dataset (Mozos et al. 2019)
Reverse-Huber function
L1 penalty
BerHu loss
Cubemap representation (Cheng et al. 2018)
Tangent representation (Eder et al. 2019)
Spherical representation (Jung et al. 2019)
Knowledge Distillation (Ahn et al. 2019)
SensorConv
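
For readers unfamiliar with the reverse-Huber (BerHu) loss listed above, the sketch below shows its usual form: it behaves like an L1 penalty for small residuals and like a scaled L2 penalty for large ones. The threshold choice (20% of the largest absolute residual in the batch) follows common practice in the depth-estimation literature and is an assumption here; the summary does not state which threshold the presenters used. The function name and `threshold_ratio` parameter are hypothetical.

```python
import numpy as np


def berhu_loss(pred, target, threshold_ratio=0.2):
    """Mean reverse-Huber (BerHu) loss over non-empty arrays of predictions.

    With residual r = |pred - target| and threshold c:
        loss(r) = r                      if r <= c
        loss(r) = (r**2 + c**2) / (2*c)  otherwise
    The threshold c is set to threshold_ratio * max(r), an assumed default.
    """
    residual = np.abs(np.asarray(pred, dtype=np.float64)
                      - np.asarray(target, dtype=np.float64))
    c = threshold_ratio * residual.max()
    if c == 0.0:
        # All residuals are zero (or the ratio is zero): nothing to penalise
        # quadratically, and dividing by c would be undefined.
        return float(residual.mean())
    quadratic = (residual ** 2 + c ** 2) / (2.0 * c)
    return float(np.where(residual <= c, residual, quadratic).mean())
```

The two branches meet at r = c, where both evaluate to c, so the loss is continuous across the threshold.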

Topics

Omnidirectional Vision · Panoramic Imaging · Depth Estimation · Normal Estimation · Semantic Segmentation · Image Distortion · Vision Transformers · Synthetic Data · Sim-to-Real Transfer · Visual Odometry
