CV4MR 2024: 2nd Workshop on Computer Vision for Mixed Reality

Event: CVPR 2024 Workshop · Duration: 294 min

Abstract

The CV4MR 2024 workshop brought together leading researchers to discuss the latest advancements and challenges in computer vision for mixed reality. Key topics included real-time neural rendering for pass-through experiences, robust 3D scene reconstruction from sparse data, and generative AI for 3D content creation. Speakers presented novel methods for achieving high-fidelity, real-time rendering, addressing issues like temporal consistency and occlusion. The workshop also explored innovative approaches to 3D geometry manipulation using machine learning, focusing on accurate and versatile deformation representations. Furthermore, efficient annotation strategies for multi-object tracking were discussed, highlighting the potential of synthetic pre-training, pseudo-labeling, and active learning to reduce human labeling effort. Text-to-3D generative models were presented as a route to democratizing 3D content creation. The event emphasized the need for holistic, systems-level thinking to tackle the complex interdependencies of these technologies in real-world mixed reality applications.

Speakers

Douglas Lanman — Reality Labs, Meta
Nima Kalantari — TAMU
Federico Tombari — Google
Lei Xiao — Meta
Natalia Neverova — Meta
Noam Aigerman — University of Montreal
Laura Leal-Taixé — Nvidia

Talks (7)

00:01:59 — Douglas Lanman: Taking a Small Step in a Different Direction
- This keynote discusses the challenges and opportunities in mixed reality, emphasizing the importance of pass-through technology and the need for a holistic systems-level approach to computer vision problems in this domain.
01:55:00 — Nima Kalantari: Reconstructing 3D Scenes from Sparse Images
- This talk presents a novel approach to 3D scene reconstruction and novel view synthesis from extremely sparse input images, focusing on generating view-dependent highlights and addressing the limitations of existing NeRF-based methods.
02:22:58 — Federico Tombari: RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
- This presentation introduces RadSplat, a method combining neural radiance fields and 3D Gaussian Splatting to achieve robust, real-time rendering with high frame rates and quality, even for challenging real-world captures.
02:37:59 — Lei Xiao: Exploring Neural Rendering for Mixed Reality
- This talk explores the application of neural rendering for mixed reality, focusing on challenges like pass-through rendering, dynamic scene reconstruction, stylization, and real-time editing, and presents solutions using novel techniques like Gaussian Splatting and neural implicit representations.
02:56:04 — Natalia Neverova: Generative AI for 3D content creation: Meta 3D Gen
- This presentation introduces Meta 3D Gen, a suite of feedforward foundational models and capabilities for text-guided 3D content creation, focusing on generating high-quality 3D assets and textures from text prompts, and addressing the challenges of scalability and realism.
03:49:57 — Noam Aigerman: Manipulating 3D geometry with machine learning
- This talk explores the use of machine learning for manipulating 3D geometry, focusing on deformation representations that are both accurate and versatile, and introduces novel techniques like Neural Jacobian Fields and Injective Flows to achieve high-quality, detail-preserving deformations.
04:19:29 — Laura Leal-Taixé: Efficient Annotation for the Trackers of Tomorrow
- This presentation introduces a novel approach to efficient annotation for multi-object tracking, leveraging synthetic pre-training, pseudo-labeling, and active learning within a hierarchical graph-based framework to significantly reduce human labeling effort while achieving state-of-the-art performance.
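
Several of the rendering talks above build on NeRF or Gaussian Splatting. Both form a pixel by blending depth-sorted samples front to back: each sample contributes its color weighted by its opacity and by the transmittance left over from the samples in front of it. The snippet below is a minimal sketch of that blending rule only, not code from any of the talks; the function name and the toy inputs are assumptions made for illustration.

```python
def composite_front_to_back(colors, alphas):
    """Blend depth-sorted samples (nearest first).

    Implements C = sum_i T_i * alpha_i * c_i with T_i = prod_{j<i} (1 - alpha_j).
    Returns the blended RGB color and the remaining transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer samples
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Toy example: a half-transparent red sample in front of an opaque blue one.
pixel, remaining = composite_front_to_back(
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    alphas=[0.5, 1.0],
)
print(pixel, remaining)
```

In a real renderer the samples come from ray marching (NeRF) or from projected, depth-sorted Gaussians (3DGS), and the loop usually stops early once the transmittance is close to zero.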

Key Takeaways

Mixed Reality (MR) applications demand real-time, high-fidelity rendering and robust 3D scene understanding, posing significant challenges for current computer vision techniques.
Neural rendering, particularly methods combining NeRFs with Gaussian Splatting, shows promise for achieving real-time, high-quality pass-through experiences in MR, but requires addressing issues like temporal consistency, occlusion, and memory efficiency.
Generative AI, especially text-to-3D models, is crucial for democratizing 3D content creation, enabling users to generate complex and realistic 3D assets and textures from simple text prompts.
Manipulating 3D geometry with machine learning, through accurate and versatile deformation representations, is essential for creating dynamic and interactive virtual environments.
Efficient annotation strategies, leveraging synthetic pre-training, pseudo-labeling, and active learning, are vital for reducing the human labeling effort required to train robust multi-object trackers for dynamic scenes.
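
To make the last takeaway concrete, here is a minimal sketch of how pseudo-labeling and active learning can share one annotation budget. It assumes a tracker that outputs, for each candidate link between two detections, a probability that they are the same object. Confident predictions are kept as pseudo-labels, and only the most uncertain links are sent to a human. This is a generic uncertainty-sampling illustration, not the speaker's method; the function name, the margin threshold, and the budget are hypothetical.

```python
def split_for_annotation(link_probs, confident_margin=0.4, budget=2):
    """Split candidate links into pseudo-labels and human queries.

    link_probs maps a link id to the model's P(same object).
    Links far from the 0.5 decision boundary become pseudo-labels;
    the `budget` links closest to the boundary are queried.
    """
    pseudo_labels = {}
    uncertain = []
    for link_id, p in link_probs.items():
        margin = abs(p - 0.5)  # distance from the decision boundary
        if margin >= confident_margin:
            pseudo_labels[link_id] = p >= 0.5
        else:
            uncertain.append((margin, link_id))
    uncertain.sort()  # most uncertain (smallest margin) first
    queries = [link_id for _, link_id in uncertain[:budget]]
    return pseudo_labels, queries


# Hypothetical model outputs for five candidate links.
probs = {"a": 0.98, "b": 0.03, "c": 0.52, "d": 0.7, "e": 0.4}
pseudo_labels, queries = split_for_annotation(probs)
print(pseudo_labels)  # confident links, labeled automatically
print(queries)        # links to show to a human annotator
```

Links that are neither confident nor within the budget (here "d") stay unlabeled until a later round, after the model has been retrained on the new labels.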

Methods / Models / Datasets Mentioned

Apple Vision Pro
Quest 3
NeRF (Neural Radiance Fields)
Gaussian Splatting (3DGS)
RadSplat
OpenNeRF
Mask3D
DreamFusion
Text2Mesh
Neural Cages
Neural Jacobian Fields
TutteNet
DeepPhase
DeepMetaHandles
RigNet
CLIP
ALIGN
LSeg
OpenSeg
SPAM (SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow)
ByteTrack
TrackFormer
MOTR
FairMOT
QDTrack
UTM
MotionTrack
SUSHI
FlipNeRF
FreeNeRF
SparseNeRF
VolSDF (Volume Rendering of Neural Implicit Surfaces)
SDS (Score Distillation Sampling)

Topics

Mixed Reality · Computer Vision · Neural Rendering · 3D Reconstruction · Generative AI · Pass-through Technology · Multi-Object Tracking · 3D Geometry Manipulation · Deformation Representation · Synthetic Pre-training · Pseudo-labeling · Active Learning
