CV4MR 2024: 2nd Workshop on Computer Vision for Mixed Reality

Event: CVPR 2024 Workshop · Duration: 294 min

Abstract

The CV4MR 2024 workshop brought together leading researchers to discuss recent advances and open challenges in computer vision for mixed reality. Key topics included real-time neural rendering for pass-through experiences, robust 3D scene reconstruction from sparse input images, and generative AI aimed at democratizing 3D content creation. Speakers presented methods for high-fidelity, real-time rendering that address issues such as temporal consistency and occlusion. The workshop also explored machine-learning approaches to 3D geometry manipulation built on accurate and versatile deformation representations. Speakers also discussed efficient annotation strategies for multi-object tracking, showing how synthetic pre-training, pseudo-labeling, and active learning can significantly reduce the human labeling effort needed to train trackers. Throughout, the event emphasized holistic, systems-level thinking to handle the interdependencies among these technologies in real-world mixed reality applications.

Speakers

  • Douglas Lanman — Reality Labs, Meta
  • Nima Kalantari — TAMU
  • Federico Tombari — Google
  • Lei Xiao — Meta
  • Natalia Neverova — Meta
  • Noam Aigerman — University of Montreal
  • Laura Leal-Taixé — Nvidia

Talks (7)

  • 00:01:59 Douglas Lanman: Taking a Small Step in a Different Direction
    • This keynote discusses the challenges and opportunities in mixed reality, emphasizing the importance of pass-through technology and the need for a holistic systems-level approach to computer vision problems in this domain.
  • 01:55:00 Nima Kalantari: Reconstructing 3D Scenes from Sparse Images
    • This talk presents a novel approach to 3D scene reconstruction and novel view synthesis from extremely sparse input images, focusing on generating view-dependent highlights and addressing the limitations of existing NeRF-based methods.
  • 02:22:58 Federico Tombari: RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
    • This presentation introduces RadSplat, a method that combines neural radiance fields with 3D Gaussian Splatting to achieve robust, real-time rendering at high frame rates and high image quality, even for challenging real-world captures.
  • 02:37:59 Lei Xiao: Exploring Neural Rendering for Mixed Reality
    • This talk explores the application of neural rendering for mixed reality, focusing on challenges like pass-through rendering, dynamic scene reconstruction, stylization, and real-time editing, and presents solutions using novel techniques like Gaussian Splatting and neural implicit representations.
  • 02:56:04 Natalia Neverova: Generative AI for 3D Content Creation: Meta 3D Gen
    • This presentation introduces Meta 3D Gen, a suite of feed-forward foundation models for text-guided 3D content creation that generate high-quality 3D assets and textures from text prompts, and discusses the challenges of scaling these models and achieving realism.
  • 03:49:57 Noam Aigerman: Manipulating 3D geometry with machine learning
    • This talk explores the use of machine learning for manipulating 3D geometry, focusing on deformation representations that are both accurate and versatile, and introduces novel techniques like Neural Jacobian Fields and Injective Flows to achieve high-quality, detail-preserving deformations.
  • 04:19:29 Laura Leal-Taixé: Efficient Annotation for the Trackers of Tomorrow
    • This presentation introduces a novel approach to efficient annotation for multi-object tracking, leveraging synthetic pre-training, pseudo-labeling, and active learning within a hierarchical graph-based framework to significantly reduce human labeling effort while achieving state-of-the-art performance.
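
Several of the rendering talks (Kalantari, Tombari, Xiao) rely on the same image-formation step: depth-sorted samples along a camera ray (NeRF) or depth-sorted Gaussians overlapping a pixel (3D Gaussian Splatting) are alpha-composited front to back. The following is a minimal NumPy sketch of that generic rule, assuming per-sample opacities and colors are already computed and sorted near to far. The function names `composite_front_to_back` and `nerf_alphas` are hypothetical, and this is not RadSplat's or any speaker's implementation. In NeRF a sample's opacity is 1 - exp(-sigma * delta) for density sigma and step size delta; in 3DGS it is a Gaussian's learned opacity multiplied by its projected 2D footprint at the pixel.

```python
import numpy as np


def composite_front_to_back(alphas, colors):
    """Generic front-to-back alpha compositing (illustrative sketch).

    alphas: (N,) opacities in [0, 1], sorted near-to-far.
    colors: (N, 3) RGB values for the same samples.
    Returns (rgb, weights), where weights[i] = T_i * alpha_i.
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    if alphas.shape[0] != colors.shape[0]:
        raise ValueError("alphas and colors must have the same length")
    # Transmittance reaching sample i: product of (1 - alpha_j) over all j < i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = transmittance * alphas
    # Pixel color is the weight-averaged sum of sample colors.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights


def nerf_alphas(sigmas, deltas):
    """NeRF-style opacity per ray sample: alpha = 1 - exp(-sigma * delta)."""
    return 1.0 - np.exp(-np.asarray(sigmas, dtype=np.float64) * np.asarray(deltas, dtype=np.float64))


if __name__ == "__main__":
    # Half-transparent red sample in front of an opaque blue one.
    rgb, w = composite_front_to_back([0.5, 1.0], [[1, 0, 0], [0, 0, 1]])
    print(rgb, w)
```

Here the red sample gets weight 0.5, the blue sample receives the remaining transmittance of 0.5, and the pixel comes out as (0.5, 0, 0.5). Returning the weights is a design choice: they sum to the pixel's total opacity and are convenient for computing expected depth or judging how much each sample contributes.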

Key Takeaways

  • Mixed Reality (MR) applications demand real-time, high-fidelity rendering and robust 3D scene understanding, posing significant challenges for current computer vision techniques.
  • Neural rendering, particularly methods combining NeRFs with Gaussian Splatting, shows promise for achieving real-time, high-quality pass-through experiences in MR, but requires addressing issues like temporal consistency, occlusion, and memory efficiency.
  • Generative AI, especially text-to-3D models, is crucial for democratizing 3D content creation, enabling users to generate complex and realistic 3D assets and textures from simple text prompts.
  • Manipulating 3D geometry with machine learning, through accurate and versatile deformation representations, is essential for creating dynamic and interactive virtual environments.
  • Efficient annotation strategies, leveraging synthetic pre-training, pseudo-labeling, and active learning, are vital for reducing the human labeling effort required to train robust multi-object trackers for dynamic scenes.
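
The pseudo-labeling and active-learning idea in the last takeaway can be illustrated with a simple confidence-based split. Confident model predictions are accepted as pseudo-labels, and only the least confident ones, up to a fixed budget, are routed to human annotators. This Python sketch assumes each prediction carries a confidence score in [0, 1]. The names `Prediction`, `split_for_annotation`, `accept_threshold`, and `human_budget` are hypothetical, and least-confidence sampling is a generic stand-in, not the selection criterion of the method presented in the talk. Synthetic pre-training, which would supply the initial model, is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """Hypothetical tracker output for one object in one frame."""
    track_id: int
    frame: int
    score: float  # model confidence in [0, 1]


def split_for_annotation(
    preds: List[Prediction], accept_threshold: float, human_budget: int
) -> Tuple[List[Prediction], List[Prediction]]:
    """Hypothetical helper: split predictions into pseudo-labels and human queries.

    Predictions with score >= accept_threshold are kept as pseudo-labels.
    The remaining ones are sorted least-confident first, and at most
    human_budget of them are sent to annotators (uncertainty sampling).
    """
    if human_budget < 0:
        raise ValueError("human_budget must be non-negative")
    pseudo = [p for p in preds if p.score >= accept_threshold]
    uncertain = sorted(
        (p for p in preds if p.score < accept_threshold), key=lambda p: p.score
    )
    to_human = uncertain[:human_budget]
    return pseudo, to_human


if __name__ == "__main__":
    preds = [
        Prediction(1, 0, 0.95),
        Prediction(1, 1, 0.40),
        Prediction(2, 0, 0.10),
        Prediction(2, 1, 0.85),
        Prediction(3, 0, 0.60),
    ]
    pseudo, human = split_for_annotation(preds, accept_threshold=0.8, human_budget=2)
    print([p.score for p in pseudo], [p.score for p in human])
```

With a threshold of 0.8 and a budget of 2, the predictions scored 0.95 and 0.85 become pseudo-labels, the ones scored 0.10 and 0.40 go to human review, and the one at 0.60 stays unlabeled for a later round, after the model has been retrained on the new labels.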

Methods / Models / Datasets Mentioned

  • Apple Vision Pro
  • Quest 3
  • NeRF (Neural Radiance Fields)
  • Gaussian Splatting (3DGS)
  • RadSplat
  • OpenNeRF
  • Mask3D
  • DreamFusion
  • Text2Mesh
  • Neural Cages
  • Neural Jacobian Fields
  • TutteNet
  • DeepPhase
  • DeepMetaHandles
  • RigNet
  • CLIP
  • ALIGN
  • LSeg
  • OpenSeg
  • SPAM (Strong and Unified Scalable Hierarchical Multi-object Tracker)
  • ByteTrack
  • TrackFormer
  • MOTR
  • FairMOT
  • QDTrack
  • UTM
  • MotionTrack
  • SUSHI
  • FlipNeRF
  • FreeNeRF
  • SparseNeRF
  • VolSDF (Volumetric Signed Distance Fields)
  • SDS (Score Distillation Sampling)

Topics

Mixed Reality · Computer Vision · Neural Rendering · 3D Reconstruction · Generative AI · Pass-through Technology · Multi-Object Tracking · 3D Geometry Manipulation · Deformation Representation · Synthetic Pre-training · Pseudo-labeling · Active Learning

