OmniCV 2024 Workshop

Event: CVPR OmniCV 2024 · Duration: 186 min

Abstract

The OmniCV 2024 workshop covers recent work in omnidirectional computer vision, with a focus on bridging the gap between research and practical applications. Presentations highlight the growing use of 360-degree cameras in autonomous navigation, virtual reality, and 3D scene understanding. Topics range from data collection through crowdsourcing and synthetic dataset generation to deep learning architectures designed for the geometric challenges of omnidirectional imagery, particularly distortion. The talks address depth and normal estimation, semantic segmentation, and object detection, with an emphasis on improving accuracy and enabling real-world deployment.

Speakers

  • Swetpriy — Affiliation not listed (Introducer)
  • Jan Erik Solem — Chairman at Overture Maps Foundation & Engineering Director, Meta
  • Jean-François Lalonde — Distinguished Full Professor, Electrical and Computer Engineering Department, Université Laval
  • Hannes Reichert — TH Aschaffenburg University of Applied Sciences
  • Jingguo Liu — Southwest University, Chongqing, China
  • Cing-Jia Lin — National Tsing Hua University, Taiwan
  • Jay Bhanushali — Indian Institute of Technology Madras, India
  • Stefan Milz — Spleenlab GmbH
  • Lars Hinneburg — Spleenlab GmbH

Talks (7)

  • 00:00:00 — Swetpriy: Introduction
    • Introduces the OmniCV 2024 workshop and outlines its objectives: foster novel research in omnidirectional computer vision, bridge the gap between research and application, and encourage new models for applications such as autonomous driving and VR/AR. Acknowledges sponsors Qualcomm and Spleenlab.
  • 02:46:50 — Jan Erik Solem: Crowdsourcing Imagery for Mapping and Research
    • Presents Mapillary, a crowdsourcing platform for street-level imagery, detailing its use for mapping, dataset creation (e.g., Metropolis, Planet-Scale Depth), and research applications including panoptic segmentation and Neural Radiance Fields (NeRFs) for immersive experiences.
  • 05:50:00 — Jean-François Lalonde: Adapting to wide-angle lenses with distortion-aware transformers
    • Discusses challenges of distortion in wide-angle and panoramic images and proposes a novel transformer-based architecture that adapts its internal structure to the radial nature of lens distortions for improved performance in various computer vision tasks.
  • 06:19:00 — Jingguo Liu: Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations
    • Introduces a teacher-student model for monocular panoramic depth estimation that fuses equirectangular and spherical representations, leveraging spherical convolution to overcome distortion issues and improve accuracy compared to traditional methods.
  • 06:36:00 — Cing-Jia Lin: DQ-HorizonNet: Enhancing Door Detection Accuracy in Panoramic Images via Dynamic Quantization
    • Addresses challenges in door detection from panoramic images, such as distortion, cross-boundary issues, and size variance, by proposing DQ-HorizonNet, which uses dynamic quantization to effectively reduce false positives and improve detection accuracy.
  • 06:55:00 — Jay Bhanushali: Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding
    • Presents OmniHorizon, a synthetic omnidirectional dataset, and UbotNet, a U-Net and Bottleneck Transformer fusion architecture, to improve depth and normal estimation in real-world outdoor scenes through cross-domain synthetic-to-real transfer.
  • 07:15:00 — Stefan Milz: Omni-Balloon Visual Odometry Challenge
    • Introduces the Omni-Balloon Visual Odometry Challenge on Kaggle, motivated by the need for robust drone navigation without GPS, using omnidirectional camera data from a balloon-mounted system inspired by the East German balloon escape.
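
Several of the talks above turn on the relationship between the equirectangular image grid and the sphere it represents. As a minimal sketch (not taken from any of the talks), the following maps a pixel to a viewing direction and computes the solid angle covered by one pixel, which shows why rows near the poles are stretched. The axis convention and the function names `equirect_pixel_to_direction` and `pixel_solid_angle` are assumptions made for illustration.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit direction.

    Assumed convention (illustrative, not from the talks): longitude runs
    from -pi to pi left to right, latitude from pi/2 to -pi/2 top to
    bottom, y points up, and z points at the image centre.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def pixel_solid_angle(v, width, height):
    """Solid angle in steradians covered by one pixel in row v."""
    lat_top = math.pi / 2.0 - (v / height) * math.pi
    lat_bottom = math.pi / 2.0 - ((v + 1) / height) * math.pi
    d_lon = 2.0 * math.pi / width
    # Area of a latitude band slice on the unit sphere.
    return d_lon * (math.sin(lat_top) - math.sin(lat_bottom))
```

Every pixel has the same size in the image, but `pixel_solid_angle` shrinks toward the top and bottom rows, so a fixed-size convolution kernel sees a different physical extent at each latitude. This is the distortion that latitude-aware or spherical operators try to compensate for.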

Key Takeaways

  • Omnidirectional computer vision is evolving rapidly, driven by advances in 360-degree camera hardware and learning-based algorithms, with applications in autonomous navigation, VR/AR, and 3D scene understanding.
  • Crowdsourcing platforms like Mapillary and synthetic dataset generation (e.g., OmniHorizon) are crucial for creating large-scale, diverse omnidirectional datasets needed for robust model training.
  • Novel deep learning architectures, such as distortion-aware transformers and U-Net fusions with bottleneck transformers, are being developed to effectively handle geometric distortions and leverage global context in panoramic imagery.
  • Sim-to-real transfer techniques, involving pretraining on synthetic data and fine-tuning on real-world datasets, significantly improve performance in tasks like depth and normal estimation for outdoor scenes.
  • Challenges in panoramic image analysis, including distortion, cross-boundary objects, and size variance, are being addressed through innovative approaches like dynamic quantization and specialized spherical convolution methods.
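
The cross-boundary issue arises because the left and right edges of an equirectangular panorama are adjacent on the sphere. One common way to expose that adjacency to an ordinary 2D operator is horizontal wrap-around padding. The sketch below illustrates that general idea only; it is not the method of any talk, and the helper name `wrap_pad_columns` is hypothetical.

```python
def wrap_pad_columns(image, pad):
    """Pad each row on both sides with columns from the opposite edge.

    `image` is a list of equal-length rows (lists); `pad` is the number of
    columns to add per side and must not exceed the image width.
    """
    if pad < 0:
        raise ValueError("pad must be non-negative")
    if image and pad > len(image[0]):
        raise ValueError("pad must not exceed the image width")
    if pad == 0:
        # row[-0:] would copy the whole row, so handle this case explicitly.
        return [list(row) for row in image]
    return [row[-pad:] + row + row[:pad] for row in image]


# A door spanning the seam occupies the last and first columns (value 1).
row = [1, 0, 0, 0, 0, 1]
padded = wrap_pad_columns([row], 1)[0]
# After padding, the two halves of the door are contiguous at both ends.
```

With this padding, an object split across the seam appears as one contiguous region at each end of the padded row, so a detector with a local receptive field can see it whole.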

Methods / Models / Datasets Mentioned

  • Mapillary
  • Overture Maps Foundation
  • Ricoh Theta
  • Insta360
  • Trimble
  • Mapillary CrowdDriven Dataset
  • Mapillary Metropolis Dataset
  • Mapillary Planet-Scale Depth Dataset
  • Mapillary Street-Level Sequences Dataset
  • Mapillary Traffic Sign Dataset
  • Mapillary Vistas Dataset
  • Panoptic Segmentation (Porzi et al. CVPR'21)
  • Neural Radiance Fields (NeRFs)
  • Spherical Distortion Model
  • DarSwin (Distortion-aware Radial Swin Transformer)
  • Swin Transformer
  • Deformable Swin (DAT)
  • U-Net (Ronneberger et al. 2015)
  • HorizonNet (Cheng Sun et al. 2019)
  • DQ-HorizonNet
  • Faster RCNN (R. Girshick et al. 2014)
  • YOLO (J. Redmon et al. 2016)
  • DETR (N. Carion et al. 2020)
  • OmniHorizon
  • UbotNet
  • Bottleneck Transformer (Srinivas et al. 2021)
  • RectNet
  • UResNet
  • UNet128
  • UbotNet Lite
  • Matterport3D 360°
  • Replica 360° 2K/4K RGBD
  • Stanford 2D-3D
  • Pano3D (Albanis et al. 2021)
  • Zillow
  • Fukuoka dataset (Mozos et al. 2019)
  • Reverse-Huber function
  • L1 penalty
  • BerHu loss
  • Cubemap representation (Cheng et al. 2018)
  • Tangent representation (Eder et al. 2019)
  • Spherical representation (Jung et al. 2019)
  • Knowledge Distillation (Ahn et al. 2019)
  • SensorConv
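
For readers unfamiliar with the reverse Huber (BerHu) loss listed above, here is a minimal sketch of its usual definition: it behaves like an L1 penalty for small residuals and like a scaled L2 penalty for large ones. Setting the threshold to a fraction of the largest residual is a common convention, assumed here rather than taken from the talks, and the names `berhu` and `berhu_loss` are illustrative.

```python
def berhu(residual, c):
    """Reverse Huber penalty for one residual with threshold c > 0."""
    a = abs(residual)
    if a <= c:
        return a  # L1 branch for small errors
    return (a * a + c * c) / (2.0 * c)  # quadratic branch for large errors


def berhu_loss(pred, target, c_fraction=0.2):
    """Mean BerHu penalty over paired predictions and targets.

    The threshold is c_fraction times the largest absolute residual
    (an assumed convention, not confirmed by the source).
    """
    residuals = [p - t for p, t in zip(pred, target)]
    if not residuals:
        raise ValueError("pred and target must be non-empty")
    c = c_fraction * max(abs(r) for r in residuals)
    if c == 0.0:
        return 0.0  # all residuals are zero
    return sum(berhu(r, c) for r in residuals) / len(residuals)
```

The two branches meet at `|residual| == c`, so the penalty is continuous. Large depth errors are penalized quadratically while small ones keep the L1 penalty's constant gradient.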

Topics

Omnidirectional Vision · Panoramic Imaging · Depth Estimation · Normal Estimation · Semantic Segmentation · Image Distortion · Vision Transformers · Synthetic Data · Sim-to-Real Transfer · Visual Odometry
