Mobile Intelligent Photography and Imaging

Event: CVPR 2024 Workshop · Duration: 213 min

Abstract

The first segment introduces the CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) workshop, detailing its objectives and presenting the results of its challenge tracks. It features a keynote address by Lei Zhang, who explores the current state and future challenges of image restoration, focusing on the “Restore Any Image Model (RAIM)” concept and the performance of various deep learning architectures on real-world image degradations. The segment concludes with the beginning of Mahmoud Afifi’s keynote on image white balancing.

The second segment features two talks. The first delves into in-camera auto white-balance correction, explaining the underlying principles of color perception, the camera ISP pipeline, and various illuminant estimation techniques. It highlights the challenges of generalizing these methods across different camera models and explores post-capture white-balance editing. The second talk shifts focus to Snap Inc.’s efforts in advancing camera technology for mobile applications and AR glasses. It covers image quality assessment, face and scene restoration, segmentation, and video enhancement, with particular emphasis on personalized portrait enhancement through dual-pivot tuning.

The third segment introduces passive ultra-wideband single-photon imaging, a novel technique capable of capturing events across 12 orders of magnitude in time, from seconds to picoseconds. The speaker explains the underlying principles of photon counting and flux probing theory, demonstrating how continuous light intensity can be reconstructed from discrete photon arrival times. Preliminary demonstrations showcase the technology’s ability to act as a “microscope for time,” resolving phenomena at different timescales, from fan motion to individual laser pulses, and enabling passive ultra-wideband videography. The talk concludes by outlining future directions in computer vision, scientific imaging, and signal processing, highlighting the potential for real-time applications and the use of emerging single-photon devices.

Speakers

Chen Change Loy — Nanyang Technological University
Lei Zhang — The Hong Kong Polytechnic University & OPPO Research Institute
Mahmoud Afifi — Google
Jian Wang — Staff Research Scientist, Snap Research
Mian Wei — Department of Computer Science, University of Toronto, Canada

Talks (10)

00:00:00 — Chen Change Loy: Welcome and Challenge Results
- An introduction to the Mobile Intelligent Photography and Imaging (MIPI) workshop at CVPR 2024, including an overview of its goals, statistics from the challenge tracks, and the announcement of winners for RAW Image Denoising, Demosaic for HybridEVS Camera, and Nighttime Flare Removal.
00:10:50 — Lei Zhang: How Far Are We From the Restore Any Image Model (RAIM)?
- This talk reviews the evolution of image restoration and enhancement techniques from traditional filters to modern deep learning methods, including CNNs, Transformers, GANs, Normalizing Flows, and Diffusion Models, highlighting the challenges in real-world applications and introducing the RAIM NTIRE challenge.
01:09:47 — Mahmoud Afifi: Revisiting Image White Balancing
- An outline of the keynote on image white balancing, covering in-camera auto white-balance correction, camera-independent illuminant estimation, post-capture correction, spatial white balance, and the impact of white-balance errors on high-level computer vision tasks.
01:11:02 — Mahmoud Afifi: In-camera auto white-balance correction
- This talk discusses in-camera auto white-balance correction, covering color perception, camera ISP pipelines, illuminant estimation methods (statistical vs. learning), and challenges in cross-camera generalization and post-capture white-balance editing.
02:00:52 — Jian Wang: Towards A Better Camera in the App and AR Glass
- This talk introduces Snap Inc.’s work on improving camera technology for mobile apps and AR glasses, focusing on quality assessment, face restoration, scene restoration, segmentation, video enhancement, and personalized portrait enhancement using dual-pivot tuning.
02:22:05 — Jian Wang: Engagement Prediction for Short Videos
- Discusses metrics and features for predicting short video engagement, highlighting the development of Normalized Average Watch Percentage (NAWP) and Engagement Continuation Rate (ECR) as robust metrics, and showcasing the model’s performance and application in video recommendation.
02:30:35 — Jian Wang: Video Frame Interpolation
- Briefly touches upon the complexities of video frame interpolation, including non-linear motions and occlusions, and showcases a method for slowing down spotlight effects and correcting selfie image distortion.
02:35:25 — Mian Wei: Passive Ultra-Wideband Single-Photon Imaging
- Introduces passive ultra-wideband single-photon imaging, a technique for capturing events across 12 orders of magnitude in time, from seconds to picoseconds, by reconstructing continuous light intensity from discrete photon arrival times.
02:54:35 — Mian Wei: Preliminary Demonstrations
- Showcases the capabilities of passive ultra-wideband imaging as a ‘microscope for time,’ resolving various phenomena at different timescales, from fan motion to individual laser pulses, and enabling passive ultra-wideband videography and computational stroboscopy.
03:01:35 — Mian Wei: Future Directions: Ultra-Wideband Computer Vision and Imaging
- Outlines future applications including passively harvesting ambient photons for 3D reconstruction, real-time ultra-wideband imaging, and scientific imaging in astronomy and protein folding.
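
The statistical illuminant estimation mentioned in the in-camera white-balance talk can be illustrated with the gray-world assumption, which also appears in the methods list for this workshop. The following Python sketch is a minimal illustration, not code from the talk: it assumes a linear RGB image, estimates the illuminant as the mean color, applies a diagonal correction, and scores the estimate with angular error. The function names and the synthetic scene are hypothetical.

```python
import numpy as np


def gray_world_illuminant(linear_rgb):
    """Estimate the illuminant as the mean RGB of a linear image.

    linear_rgb: float array of shape (H, W, 3), assumed linear (no gamma).
    Returns a unit-norm 3-vector.
    """
    mean_rgb = linear_rgb.reshape(-1, 3).mean(axis=0)
    return mean_rgb / np.linalg.norm(mean_rgb)


def apply_white_balance(linear_rgb, illuminant):
    """Diagonal correction, normalised so the green gain is 1."""
    gains = illuminant[1] / illuminant
    return linear_rgb * gains


def angular_error_deg(estimate, ground_truth):
    """Angle in degrees between estimated and ground-truth illuminants."""
    cos = np.dot(estimate, ground_truth) / (
        np.linalg.norm(estimate) * np.linalg.norm(ground_truth)
    )
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Synthetic scene (an assumption for illustration): random reflectances that
# average to gray, lit by a warm illuminant.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(64, 64, 3))
true_illuminant = np.array([0.8, 0.5, 0.3])
captured = scene * true_illuminant

estimated = gray_world_illuminant(captured)
corrected = apply_white_balance(captured, estimated)
error = angular_error_deg(estimated, true_illuminant)
```

Gray-world works here only because the synthetic scene really does average to gray; on real images with a dominant color the estimate can be far off, which is one motivation for the learning-based estimators discussed in the talk.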

Key Takeaways

The MIPI workshop aims to advance mobile intelligent photography by addressing challenges in computational photography and imaging, particularly focusing on new sensors and imaging systems.
Current image restoration research has evolved from traditional filters to sophisticated deep learning models like Diffusion Models and GANs, which show promising results but still face significant challenges in handling complex real-world degradations.
The RAIM NTIRE challenge highlights the gap between academic research (often using simulated data and PSNR/SSIM metrics) and industry applications (requiring robust performance on real-world data and user-appreciated perceptual quality).
Future research in image restoration needs to focus on reducing model complexity for resource-limited devices, improving generalization to diverse degradations, accurately reconstructing fine-scale semantic structures, and effectively processing large, low-quality images.
The color a camera records depends on object reflectance, the illuminant’s spectral power distribution, and the camera’s spectral sensitivity; this relationship forms the basis for in-camera image signal processing (ISP).
Learning-based methods for illuminant estimation, while accurate, struggle with generalization across different camera models due to variations in sensor response functions, necessitating camera-independent approaches or domain adaptation techniques.
Post-capture white-balance correction and editing are crucial for user satisfaction, but are challenging because photo-finishing processes are camera-specific and aim for ‘pleasing’ rather than ‘standard’ colors.
Snap Inc. is developing computational photography techniques to enhance selfie camera quality, including personalized portrait enhancement, while addressing challenges such as sRGB-domain processing, complex degradations, and fast execution on low-end devices.
Passive ultra-wideband single-photon imaging allows simultaneous observation of phenomena across a vast range of timescales (12 orders of magnitude), overcoming limitations of conventional high-speed imaging.
The technique reconstructs continuous light intensity by analyzing discrete photon arrival times using flux probing theory and Fourier analysis, enabling “computational stroboscopy” to freeze motion at any timescale.
Preliminary demonstrations include resolving various light sources and motions from seconds to picoseconds, and reconstructing videos from extremely low photon counts, even enabling non-line-of-sight imaging.
Future applications include real-time ultra-wideband computer vision, scientific imaging (e.g., astronomy, protein folding), and efficient signal processing for massive photon stream data from emerging single-photon devices like Apple’s LiDAR scanner.
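
The idea of recovering continuous intensity from discrete photon arrival times can be illustrated with a small simulation. The sketch below is not the speaker's implementation; it assumes photons arrive as an inhomogeneous Poisson process, in which case summing a complex exponential over the timestamps gives an unbiased estimate of the flux's Fourier coefficient at that frequency. The 50 Hz flickering source, the function names, and all parameter values are hypothetical, and the timescale is far slower than the picosecond regime discussed in the talk.

```python
import numpy as np


def simulate_photon_timestamps(flux_fn, max_flux, duration, rng):
    """Draw arrival times from an inhomogeneous Poisson process by thinning.

    flux_fn: vectorised photon rate (photons/second) as a function of time.
    max_flux: upper bound on flux_fn over [0, duration].
    """
    n_candidates = rng.poisson(max_flux * duration)
    candidates = rng.uniform(0.0, duration, size=n_candidates)
    # Keep each candidate with probability flux(t) / max_flux.
    keep = rng.uniform(0.0, max_flux, size=n_candidates) < flux_fn(candidates)
    return np.sort(candidates[keep])


def probe_fourier_coefficient(timestamps, frequency, duration):
    """Estimate (1/T) * integral of flux(t) * exp(-2*pi*i*f*t) dt."""
    return np.exp(-2j * np.pi * frequency * timestamps).sum() / duration


# Hypothetical source: mean 1000 photons/s, 50% modulation at 50 Hz.
mean_flux, modulation, flicker_hz, duration = 1000.0, 0.5, 50.0, 20.0


def flux(t):
    return mean_flux * (1.0 + modulation * np.cos(2 * np.pi * flicker_hz * t))


rng = np.random.default_rng(0)
timestamps = simulate_photon_timestamps(
    flux, mean_flux * (1.0 + modulation), duration, rng
)

dc = probe_fourier_coefficient(timestamps, 0.0, duration)
at_flicker = probe_fourier_coefficient(timestamps, flicker_hz, duration)
off_peak = probe_fourier_coefficient(timestamps, 37.0, duration)
```

With these settings the zero-frequency probe should land near the mean flux of 1000 photons/s and the 50 Hz probe near magnitude 250 (half the modulation amplitude), while the 37 Hz probe should stay close to the noise floor. No exposure time or frame rate is chosen in advance: any frequency can be probed after capture from the same timestamp stream.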

Methods / Models / Datasets Mentioned

3DLUT
ADANF
AIRNet
ASFFNet
Angular Error
AutoEnhancer
BSRGAN
BasicVSR
BasicVSR++
C5
CA-GAN
CCC model
CSRNet
CodeFormer
ControlNet
DDNM
DDRM
DMDNet
DNF
DR2
DeblurGAN
Deep Neural Networks (DNN)
Deep WB
DeepDeblur
DiffBIR
DiffFace
DnCNN
Dual-Pivot Tuning
EDSR
ELAN
FFCC
FKP
Fourier basis functions
GFPGAN
GPEN
Gray-world assumption
HCFlow
Harold "Doc" Edgerton's electronic flash technique
KNN-WB
LLFlow
LiDAR scanner
LoRA
NTIRE Challenge
Noise2Void
NoiseFlow
OSEDiff
PASD
PD-GAN
PULSE
QFaR (MobiCom 2023)
RAIM
RIFE (ECCV22)
RVSRT
Raw-to-raw mapping
Rawformer
Real-ESRGAN
Refusion
RePaint
ResDiff
Restormer
SAM
SIEE
SPAD (Single-Photon Avalanche Diode)
SR3
SRCNN
SRDiff
SRFlow
SRGAN
STAR
STFAN
SeeSR
Self-Supervised Contrastive Learning
StableSR
SUPIR
SwinIR
TSFlow
TTVSR
Uni-paint
VAE
VDSR
VRT
Velten et al. (2013) transient imaging
WaterFlow

Topics

Camera ISP · Computational Photography · Computational stroboscopy · Cross-camera generalization · Deep Learning for Imaging · Diffusion Models · Dual-pivot tuning · Face restoration · Flux probing theory · Generative Adversarial Networks (GANs) · High-speed videography · Illuminant estimation · Image Restoration and Enhancement · Image White Balancing · Image quality assessment · Mobile Intelligent Photography and Imaging (MIPI) · Personalized portrait enhancement · Photon counting · Post-capture editing · Real-World Image Degradations · Scene restoration · Segmentation · Signal processing · Single-photon imaging · Time-resolved imaging · Ultra-wideband imaging · Video enhancement · White-balance correction

Notes

Open for commentary — connections to other work, critiques, follow-up reading.