8th New Trends in Image Restoration and Enhancement Workshop and 8 Associated Challenges
Event: CVPR 2023, Vancouver, Canada / Hybrid · Duration: 617 min
Abstract
The 8th New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2023 features 14 associated challenges focusing on various image restoration and enhancement tasks. This segment includes the opening remarks by the host, Radu Timofte, detailing the workshop’s scope, statistics, sponsors, and invited speakers. It also features a presentation by Marcos Conde on the Lens-to-Lens Bokeh Effect Transformation Challenge, introducing a novel dataset and an efficient solution for controllable bokeh rendering and transformation.
This segment covers multiple presentations from the NTIRE 2023 Workshop at CVPR 2023. It begins with a summary of the Bokeh Effect Transformation challenge, followed by a detailed presentation on ‘Selective Bokeh Effect Transformation’. The next talk focuses on ‘High Perceptual Quality JPEG Decoding via Posterior Sampling’. The segment then transitions to the ‘NTIRE 2023 Challenge on Light Field Image Super-Resolution’, providing an overview of the dataset, methods, and results. Two specific methods from this challenge are then presented: ‘Spatial-Angular Multi-Scale Mechanism for Light Field Spatial Super-Resolution’ and ‘DistgEPIT: Enhanced Disparity Learning for Light Field Image Super-Resolution’. The presentations highlight novel datasets, architectural trends (like Transformers), and strategies for improving image restoration and super-resolution tasks.
This segment features multiple presentations from the NTIRE 2023 Workshop, covering advancements in image restoration and enhancement. Topics include disentangling light fields for super-resolution and disparity estimation, the NTIRE 2023 HR NonHomogeneous Dehazing Challenge, and novel dehazing methods leveraging Fast Fourier Convolution, ConvNeXt, and Vision Transformers. The presentations highlight challenges in handling non-homogeneous haze, limited datasets, and the importance of data-centric approaches and efficient network architectures.
This presentation details the NTIRE 2023 Challenge on Image Super-Resolution (x4), which attracted 33 registered participants and 15 valid entries. The challenge aimed to generate high-resolution images from low-resolution inputs by leveraging prior information. The presentation provides an overview of the challenge methodology, including phases, datasets, and evaluation metrics (PSNR, SSIM). It then highlights the results, showing that the top 5 teams achieved PSNR values above 31 dB, with the top 3 utilizing transformer-based models. The specific methods employed by the top five performing teams are discussed, emphasizing their data augmentation strategies (e.g., CutBlur, Mixup), model architectures (e.g., LIIF-LDN, RDN-LTE, SwinIR, Transformer with Cross-Scale Attention, Wavelet Hallucination, Multi-stage progressive fusion), and ensemble techniques. Key findings underscore the effectiveness of data augmentation and ensemble methods in achieving high performance, while also pointing towards future research directions in advanced transformer models and novel augmentation/ensemble strategies.
This segment of the CVPR 2023 presentation stream displays a placeholder screen for a Nikon Webcam Utility, indicating a technical issue or a break in the live presentation. No actual content or talks are presented during this period.
This presentation covers the difficulties of implementing High Dynamic Range (HDR) in Mixed Reality (MR) devices, particularly through the lens of denoising. It highlights the critical challenges faced in MR systems, including stringent latency requirements (photon to display), high noise levels due to low light and short exposure times, the need for real-time video processing at high frame rates (60+ FPS) with temporal stability, and severe power efficiency constraints (under 100mW) imposed by battery and thermal limitations of head-mounted devices. The speaker proposes a holistic co-design approach for the entire image processing pipeline, from sensor readout to display, to mitigate these conflicting challenges. Key strategies discussed include sensor-specific noise modeling, careful model design with efficient backbones and tiled architectures to minimize memory transfers, and gaze-based foveated processing to optimize resource allocation by applying high-quality denoising only to the foveal region while ensuring temporal stability in the periphery.
This segment features two distinct presentations. The first continues an exploration of Neural Radiance Fields (NeRF), detailing Mip-NeRF 360 and Zip-NeRF. It covers anti-aliasing techniques using integrated positional encoding, parameterization for unbounded scenes, and cone-based sampling strategies, demonstrating their effectiveness in reducing rendering artifacts. The second presentation reports on the NTIRE 2023 Image Shadow Removal Challenge. It discusses the motivation for shadow removal research, introduces the WSRD dataset designed for complex shadow interactions, and highlights the winning solution by Team MTCV, which employs a Pyramid Ensemble Structure (PES) based on NAFNET.
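For readers unfamiliar with the PSNR figures quoted for the super-resolution challenge (values above 31 dB), the metric can be sketched as below. This is a minimal illustration and not the challenge's evaluation script: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made here, and details such as colour space or border cropping are not stated in the summary.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities (an assumption made
    for this sketch; real pipelines usually operate on 2-D or 3-D arrays).
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 8-pixel example (made-up values, for illustration only).
reference = [52, 55, 61, 59, 79, 61, 76, 61]
restored = [54, 55, 60, 59, 77, 63, 76, 60]
print(f"PSNR: {psnr(reference, restored):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of a few tenths of a dB between leaderboard entries corresponds to a small but measurable reduction in pixel error.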
Speakers
- Radu Timofte — NTIRE23 host
- Marcos Conde — University of Würzburg
- Juewen Peng — Huazhong University of Science and Technology
- Sean Man — Technion - Israel Institute of Technology
- Yingqian Wang — ByteDance
- Chen Gao — Beijing Jiaotong University
- Kai Jin — Bigo Technology Pte. Ltd.
- Florin-Alexandru Vasluianu — University of Würzburg
- Han Zhou — McMaster University
- Yangyi Liu — McMaster University
- Bilel Benjdira — Prince Sultan University
- Bilal Reysara
- Yuanzhi Zhu
- Aakash Rajpal
- Mehran Jeelani
- Fahad Khan — MBZUAI, Linköping University
- Yixuan Gao — Shanghai Jiao Tong University
- Unnamed Speaker — Unknown (wearing V7 badge)
- Longguang Wang — National University of Defense Technology
- Ming Cheng — ByteDance
- Yulan Guo — ByteDance
- Juncheng Li — ByteDance
- Shuhang Gu — ByteDance
- Jonathan T. Barron — Google
- Florin Vasluianu — Computer Vision Lab, IFI & CAIDAS, University of Würzburg
- Yanyu Mao — Xian University of Posts and Telecommunications
- Egor Ershov — Institute for Information Transmission Problems named after A.A. Kharkevich
- Simone Zini — University of Milano - Bicocca
- Ming-Hsuan Yang — UC Merced / Google
- Hsin-Ying Lee — Google
Talks (111)
- 00:05:50 — Radu Timofte: NTIRE 2023 Workshop Opening Remarks
- Radu Timofte opens the 8th NTIRE workshop, discusses the hybrid setup, presents workshop statistics, thanks sponsors, introduces invited speakers, and advertises future workshops and open positions.
- 00:26:27 — Marcos Conde: Lens-to-Lens Bokeh Effect Transformation. NTIRE 2023 Challenge Report
- Marcos Conde presents the NTIRE 2023 Bokeh Effect Transformation Challenge, discussing the motivation behind bokeh, the novel dataset (BETD) created with DSLR cameras, and their proposed EBokehNet approach for controllable bokeh transformation.
- 00:38:34 — Radu Timofte: Conclusions and Challenge Papers for NTIRE 2023 Bokeh Effect Transformation Challenge
- This segment summarizes the conclusions of the Bokeh Effect Transformation challenge, highlighting the novel dataset, the challenge’s role in gauging state-of-the-art, and efficient solutions, along with a list of contributing challenge papers.
- 00:42:15 — Juewen Peng: Selective Bokeh Effect Transformation
- This talk introduces a novel method for controllable Bokeh effect transformation using a new concept of Blur Ratio and a framework with feature selection and integration strategy.
- 00:51:34 — Sean Man: High Perceptual Quality JPEG Decoding via Posterior Sampling
- This talk presents a method for high perceptual quality JPEG decoding by sampling solutions stochastically, addressing the perception-distortion tradeoff and achieving consistency with measurements.
- 00:57:20 — Yingqian Wang: NTIRE 2023 Challenge on Light Field Image Super-Resolution: Dataset, Methods and Results
- This presentation introduces the NTIRE 2023 Challenge on Light Field Image Super-Resolution, detailing the dataset, common methods, and results, highlighting the importance of Transformers and spatial-EPI subspaces.
- 01:04:05 — Chen Gao: Spatial-Angular Multi-Scale Mechanism for Light Field Spatial Super-Resolution
- This talk introduces a novel network structure for Light Field Spatial Super-Resolution that uses a Multi-Dimension Interaction Block (MDIB) and a Multi-Scale Process Block (MSPB) to effectively exploit spatial and angular information.
- 01:14:50 — Kai Jin: DistgEPIT: Enhanced Disparity Learning for Light Field Image Super-Resolution
- This talk introduces DistgEPIT, a method for Light Field Image Super-Resolution that enhances disparity learning by disentangling spatial and angular information and using a multi-scale process block.
- 01:17:08 — Yingqian Wang: DistgEPIT: Disentangling light fields for super-resolution and disparity estimation
- Introduces a network architecture called DistgEPIT that combines CNN-based and Transformer-based methods for light field super-resolution, aiming to leverage the strengths of both for spatial feature extraction, angular correlation, and long-range disparity modeling.
- 01:29:17 — Florin-Alexandru Vasluianu: NTIRE 2023 HR NonHomogeneous Dehazing Challenge
- Presents the NTIRE 2023 HR NonHomogeneous Dehazing Challenge, detailing the dataset acquisition, challenge phases, and highlighting the winning solutions (Team DWT-FCC GAN and Team ITB Dehaze) which utilize two-branched architectures for feature extraction and image refinement.
- 01:37:37 — Han Zhou: Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt
- Addresses the challenges of non-uniform haze distribution and limited datasets in non-homogeneous dehazing by proposing a two-branch network that combines DWT and FFC for feature extraction with a prior knowledge branch initialized with a pre-trained ConvNeXt model.
- 01:42:24 — Yangyi Liu: A Data-Centric Solution to NonHomogeneous Dehazing via Vision Transformer
- Proposes a data-centric approach for non-homogeneous dehazing, addressing color discrepancies in combined datasets from previous years through RGB channel-wise gamma correction and utilizing a two-branch Vision Transformer framework (Swin Transformer with pre-trained weights and a data fitting branch).
- 01:50:41 — Bilel Benjdira: Streamlined Global and Local Features Combinator (SGLC) for High Resolution Image Dehazing
- Introduces the Streamlined Global and Local Features Combinator (SGLC) for high-resolution image dehazing, designed to remove haze without resizing input images or dividing them into patches, by combining global and local features.
- 01:55:42 — Bilal Reysara: SGLC Architecture for Dehazing High-Resolution Images
- A method called SGLC Architecture is proposed for dehazing high-resolution images without resizing or dividing into patches, using a Global Features Generator (GFG) and a Local Features Enhancer (LFE) with a customized loss function.
- 01:57:11 — Yuanzhi Zhu: Denoising Diffusion Models for Plug-and-Play Image Restoration
- This work introduces a plug-and-play image restoration framework that leverages denoising diffusion probabilistic models (DDPMs) as a generative prior, addressing limitations of previous iterative approaches.
- 02:00:31 — Aakash Rajpal: High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation
- This paper introduces a high-resolution synthetic RGB-D dataset (HRSD) generated from GTA-V to improve monocular depth estimation, addressing limitations of existing datasets.
- 02:04:41 — Mehran Jeelani: Expanding Synthetic Real-World Degradations for Blind Video Super Resolution
- This work proposes a method to create a diverse pool of synthetic real-world degradations (SRWD) for blind video super-resolution, including a new KLens VSR dataset.
- 02:06:02 — Fahad Khan: Burst Image Restoration and Enhancement
- This paper introduces BIPNet, a burst image processing approach that implicitly aligns burst features, enables inter-frame communication via a pseudo-burst mechanism, and uses adaptive group upsampling for progressive spatial resolution increase.
- 02:34:16 — Radu Timofte: NTIRE 2023 Challenge on Image Super-Resolution (x4): Methods and Results
- This presentation provides an overview of the NTIRE 2023 Image Super-Resolution Challenge, detailing its methodology, participant results, and the innovative methods employed by the top-performing teams.
- 03:12:50 — Yixuan Gao: Methods and Teams: 2nd Place - Graphene (Super-Resolution)
- Details the Graphene team’s 2nd place solution for super-resolution, featuring a Transformer with Cross-Scale Attention (CSA) and Wavelet Hallucination (WH) using Haar wavelets.
- 03:13:50 — Yixuan Gao: Methods and Teams: 3rd Place - IPLAB (Super-Resolution)
- Presents the IPLAB team’s 3rd place solution, the Attention Retractable Frequency Transformer (ARFT), utilizing D-MSA and S-MSA attention strategies and a FEB network for long-range context in the frequency domain, alongside a progressive training strategy.
- 03:15:30 — Yixuan Gao: Conclusion of Super-Resolution Challenge Overview
- Summarizes key trends in super-resolution, emphasizing the dominance of Transformer architectures, the importance of global information, and the role of extensive training data and augmentation.
- 03:41:49 — Radu Timofte: NTIRE 2023 HR NonHomogeneous Dehazing Challenge - Winner Award
- Winner Award for the NTIRE 2023 HR NonHomogeneous Dehazing Challenge presented to Han Zhou, Wei Dong, Yangyi Liu, and Jun Chen.
- 03:41:53 — Radu Timofte: NTIRE 2023 Challenge on Lens-to-Lens Bokeh Effect Transformation - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Lens-to-Lens Bokeh Effect Transformation presented to Yangyi Liu, Huan Liu, Liangyan Li, Zijun Wu, and Jun Chen.
- 03:42:06 — Radu Timofte: NTIRE 2023 Challenge on 360° Omnidirectional Image Super-Resolution - Winner Award
- Winner Award for the NTIRE 2023 Challenge on 360° Omnidirectional Image Super-Resolution presented to Juewen Peng et al.
- 03:42:16 — Radu Timofte: NTIRE 2023 Challenge on 360° Omnidirectional Video Super-Resolution - Winner Award
- Winner Award for the NTIRE 2023 Challenge on 360° Omnidirectional Video Super-Resolution presented to Xiaopeng Sun et al.
- 03:42:24 — Radu Timofte: NTIRE 2023 Challenge on 360° Omnidirectional Video Super-Resolution - Runner-Up Award (Wanwan Cui et al.)
- Runner-Up Award for the NTIRE 2023 Challenge on 360° Omnidirectional Video Super-Resolution presented to Wanwan Cui, Tianyu Xu, Chunyang Li, Long Bao, and Heng Sun.
- 03:42:33 — Radu Timofte: NTIRE 2023 Challenge on 360° Omnidirectional Video Super-Resolution - Runner-Up Award (Renlong Wu et al.)
- Runner-Up Award for the NTIRE 2023 Challenge on 360° Omnidirectional Video Super-Resolution presented to Renlong Wu et al.
- 03:42:43 — Radu Timofte: NTIRE 2023 Challenge on Quality Assessment of Video Enhancement - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Quality Assessment of Video Enhancement presented to Yilin Li et al.
- 03:42:52 — Radu Timofte: NTIRE 2023 Challenge on Quality Assessment of Video Enhancement - 3rd Place Award
- 3rd Place Award for the NTIRE 2023 Challenge on Quality Assessment of Video Enhancement presented to Heng Cong et al.
- 03:42:57 — Radu Timofte: NTIRE 2023 Challenge on Video Colorization - Winner Award (Yixin Yang et al.)
- Winner Award for the NTIRE 2023 Challenge on Video Colorization presented to Yixin Yang et al.
- 03:43:06 — Radu Timofte: NTIRE 2023 Challenge on Video Colorization - Winner Award (Shuai Liu et al.)
- Winner Award for the NTIRE 2023 Challenge on Video Colorization presented to Shuai Liu et al.
- 03:43:13 — Radu Timofte: NTIRE 2023 Challenge on Light Field Image Super-Resolution - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Light Field Image Super-Resolution presented to Kai Jin et al.
- 03:43:20 — Radu Timofte: NTIRE 2023 Challenge on Light Field Image Super-Resolution - Honorable Award
- Honorable Award for the NTIRE 2023 Challenge on Light Field Image Super-Resolution presented to Yutong Liu et al.
- 03:43:26 — Radu Timofte: NTIRE 2023 Challenge on Image Denoising - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Image Denoising presented to Zhijun Tu et al.
- 03:43:33 — Radu Timofte: NTIRE 2023 Challenge on Image Denoising - Runner-Up Award
- Runner-Up Award for the NTIRE 2023 Challenge on Image Denoising presented to Xiangyu Kong et al.
- 03:43:39 — Radu Timofte: NTIRE 2023 Challenge on Image Denoising - Honorable Award
- Honorable Award for the NTIRE 2023 Challenge on Image Denoising presented to Shuai Liu et al.
- 03:43:45 — Radu Timofte: NTIRE 2023 Challenge on Stereo Image Super-Resolution: Track 1 - Winner Award
- Winner Award for NTIRE 2023 Challenge on Stereo Image Super-Resolution, Track 1, presented to Ming Cheng et al.
- 03:43:54 — Radu Timofte: NTIRE 2023 Challenge on Stereo Image Super-Resolution: Track 2 - Winner Award
- Winner Award for NTIRE 2023 Challenge on Stereo Image Super-Resolution, Track 2, presented to Dafeng Zhang et al.
- 03:43:58 — Radu Timofte: NTIRE 2023 Challenge on Stereo Image Super-Resolution: Track 3 - Winner Award
- Winner Award for NTIRE 2023 Challenge on Stereo Image Super-Resolution, Track 3, presented to Kexin Zhang et al.
- 03:44:04 — Radu Timofte: NTIRE 2023 Challenge on Efficient Super-Resolution - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Efficient Super-Resolution presented to Lei Yu et al.
- 03:44:11 — Radu Timofte: NTIRE 2023 Challenge on Efficient Super-Resolution - 3rd Place Award
- 3rd Place Award for the NTIRE 2023 Challenge on Efficient Super-Resolution presented to Mingxi Li et al.
- 03:44:16 — Radu Timofte: NTIRE 2023 Challenge on Night Photography Rendering: People’s Choice - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Night Photography Rendering: People’s Choice, presented to Simone Zini et al.
- 03:44:24 — Radu Timofte: NTIRE 2023 Challenge on Night Photography Rendering: Professional Choice - Winner Award
- Winner Award for the NTIRE 2023 Challenge on Night Photography Rendering: Professional Choice, presented to Shuai Liu et al.
- 03:44:29 — Radu Timofte: NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution: Track 1 - 1080p to 4K - Winner Award (Jiaming Guo et al.)
- Winner Award for NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution, Track 1 (1080p to 4K), presented to Jiaming Guo et al.
- 03:44:43 — Radu Timofte: NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution: Track 1 - 1080p to 4K - 2nd Place Award
- 2nd Place Award for NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution, Track 1 (1080p to 4K), presented to Cen Liu et al.
- 03:44:50 — Radu Timofte: NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution: Track 1 - 1080p to 4K - 3rd Place Award
- 3rd Place Award for NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution, Track 1 (1080p to 4K), presented to Yuanfan Zhang et al.
- 03:44:56 — Radu Timofte: NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution: Track 2 - 720p to 4K - Winner Award
- Winner Award for NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution, Track 2 (720p to 4K), presented to Mustafa Ayazoglu et al.
- 03:45:05 — Radu Timofte: NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution: Track 2 - 720p to 4K - 2nd Place Award
- 2nd Place Award for NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution, Track 2 (720p to 4K), presented to Lingshun Kong et al.
- 03:45:12 — Radu Timofte: NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution: Track 2 - 720p to 4K - 3rd Place Award
- 3rd Place Award for NTIRE 2023 Challenge on Efficient Deep Models for Real-Time 4K Image Super-Resolution, Track 2 (720p to 4K), presented to Cen Liu et al.
- 04:05:17 — Unnamed Speaker: NTIRE-HRDepthChallenge Overview and Conclusions
- A presentation summarizing the NTIRE-HRDepthChallenge, discussing overall considerations, conclusions, and a video demonstration of results.
- 05:09:05 — Longguang Wang: NTIRE 2023 Challenge on Stereo Image Super-Resolution: Methods and Results
- Presentation of the NTIRE 2023 Challenge on Stereo Image Super-Resolution, including its description, tracks, results, and key observations.
- 05:09:42 — Ming Cheng: Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
- Presentation of the BSR team’s winning solution for the NTIRE 2023 Stereo Image Super-Resolution Challenge, detailing their hybrid Transformer and CNN attention network.
- 05:10:29 — Yulan Guo: Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
- Presentation of the BSR team’s winning solution for the NTIRE 2023 Stereo Image Super-Resolution Challenge, detailing their hybrid Transformer and CNN attention network.
- 05:11:16 — Yingqian Wang: Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
- Presentation of the BSR team’s winning solution for the NTIRE 2023 Stereo Image Super-Resolution Challenge, detailing their hybrid Transformer and CNN attention network.
- 05:12:03 — Juncheng Li: Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
- Presentation of the BSR team’s winning solution for the NTIRE 2023 Stereo Image Super-Resolution Challenge, detailing their hybrid Transformer and CNN attention network.
- 05:12:50 — Shuhang Gu: Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
- Presentation of the BSR team’s winning solution for the NTIRE 2023 Stereo Image Super-Resolution Challenge, detailing their hybrid Transformer and CNN attention network.
- 05:13:37 — Radu Timofte: Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution
- Presentation of the BSR team’s winning solution for the NTIRE 2023 Stereo Image Super-Resolution Challenge, detailing their hybrid Transformer and CNN attention network.
- 06:25:40 — Radu Timofte: HDR through denoising
- This segment discusses the challenges and solutions for achieving High Dynamic Range (HDR) in Mixed Reality (MR) devices through denoising, focusing on latency, noise, power efficiency, and co-design of the entire image processing pipeline.
- 07:42:48 — Jonathan T. Barron: Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields & Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
- This talk explains Mip-NeRF and Zip-NeRF, detailing anti-aliasing techniques, parameterization for unbounded scenes, and cone-based sampling strategies, with visual results and comparisons to baseline methods.
- 08:14:53 — Florin Vasluianu: NTIRE 2023 Image Shadow Removal Challenge Report
- This report details the NTIRE 2023 Image Shadow Removal Challenge, covering its motivation, the WSRD dataset, and the winning solution by Team MTCV, which uses a Pyramid Ensemble Structure (PES) based on NAFNET.
- 08:59:56 — Yanyu Mao: Multi-level Dispersion Residual Network for Efficient Image Super-Resolution
- This talk presents MDRN, a multi-level dispersion residual network for efficient image super-resolution, which won first place in the NTIRE 2023 Efficient Super-Resolution SubTrack 1.
- 09:07:37 — Egor Ershov: Night photography rendering challenge
- This talk introduces the Night Photography Rendering Challenge, highlighting the difficulties of night photography and the subjective nature of evaluating rendered images.
- 09:20:11 — Simone Zini: Back to the future: a night photography rendering ISP without deep learning
- This talk presents a traditional, non-deep learning image signal processing (ISP) pipeline for night photography rendering, emphasizing its interpretability and efficiency.
- 09:31:03 — Ming-Hsuan Yang: Learning to Synthesize Image and Video Contents
- This talk provides an overview of recent work on synthesizing image and video content, focusing on generative adversarial networks (GANs) and addressing challenges with limited data.
- 09:38:30 — Hsin-Ying Lee: Q&A: Image and Video Generation Models
- Discussion on the capabilities and limitations of Muse, InfinityGAN, and MAGVIT models, including performance comparisons, scalability, and future directions.
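The RGB channel-wise gamma correction mentioned in the data-centric dehazing talk above can be illustrated with a small sketch. The talk summary does not say how the gamma values were chosen, so the rule used here (pick a per-channel gamma that maps the source channel mean onto a target mean) is an assumption, as are the helper names `channel_gamma`, `channel_means`, and `apply_gamma` and the toy pixel values.

```python
import math


def channel_means(pixels):
    """Per-channel mean of a list of (r, g, b) pixels with values in [0, 1]."""
    n = len(pixels)
    return tuple(sum(px[c] for px in pixels) / n for c in range(3))


def channel_gamma(source_mean, target_mean):
    """Gamma g such that source_mean ** g == target_mean (assumed rule)."""
    if not (0.0 < source_mean < 1.0 and 0.0 < target_mean < 1.0):
        raise ValueError("means must lie strictly between 0 and 1")
    return math.log(target_mean) / math.log(source_mean)


def apply_gamma(pixels, gammas):
    """Raise each channel of every pixel to that channel's gamma."""
    return [tuple(v ** g for v, g in zip(px, gammas)) for px in pixels]


# Hypothetical data: two pixels from an older dataset and the channel means
# of the dataset whose colour statistics we want to approach.
source = [(0.20, 0.30, 0.40), (0.40, 0.50, 0.60)]
target_means = (0.45, 0.40, 0.35)

gammas = tuple(
    channel_gamma(s, t) for s, t in zip(channel_means(source), target_means)
)
corrected = apply_gamma(source, gammas)
print(gammas)
print(corrected)
```

Treating each channel separately lets the correction shift colour balance as well as brightness, which a single global gamma could not do.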
Key Takeaways
- The NTIRE 2023 workshop is the 8th edition, featuring 14 associated challenges and 77 accepted papers.
- Reproducibility is a key focus, with winners’ solutions checked and top teams submitting sources and factsheets.
- The Bokeh Effect Transformation Challenge introduces a novel dataset (BETD) using two DSLR cameras and multiple lens setups to study controllable bokeh rendering.
- The proposed EBokehNet model offers an efficient state-of-the-art solution for controllable bokeh transformation and rendering, incorporating lens information and positional encoding.
- The Selective Bokeh Effect Transformation method introduced a novel Blur Ratio concept and a framework with a feature selection and integration strategy for controllable bokeh effect transformation.
- High perceptual quality JPEG decoding can be achieved by leveraging posterior sampling, which offers a theoretically sound approach to address the perception-distortion tradeoff.
- The NTIRE 2023 Light Field Image Super-Resolution Challenge highlighted the increasing popularity and effectiveness of Transformer architectures, the critical role of spatial and EPI subspaces, and the benefits of data/model ensembles.
- The Spatial-Angular Multi-Scale Mechanism effectively addresses LFSR challenges by disentangling spatial and angular information and using multi-scale processing, enhanced by a shear ensemble approach.
- DistgEPIT improves LFSR performance by enhancing disparity learning through disentangling spatial and angular information and employing multi-scale processing and a shear ensemble approach.
- Combining CNN and Transformer strengths in architectures like DistgEPIT can effectively model both local spatial features/angular correlation and long-range disparities for light field super-resolution.
- Data-centric engineering, including techniques like RGB channel-wise gamma correction, is crucial for harmonizing diverse datasets and improving model performance in dehazing tasks.
- Two-branched network architectures, often leveraging frequency domain processing (DWT, FFC) or Vision Transformers (Swin Transformer), are effective for non-homogeneous dehazing, especially when combined with prior knowledge transfer from pre-trained models (e.g., ConvNeXt).
- The Position-Sensitive Windowing (PSW) operation is vital for maintaining original disparity relationships during inference in light field processing.
- For high-resolution dehazing, methods that avoid resizing or patching input images, like SGLC, offer efficient solutions by directly combining global and local features.
- The NTIRE 2023 SR challenge demonstrated significant advancements in image super-resolution.
- Transformer-based architectures are dominant among top-performing solutions.
- Extensive data augmentation and sophisticated ensemble strategies are crucial for achieving state-of-the-art results.
- High PSNR values (above 31 dB for top 5) were achieved, demonstrating robust performance.
- Future research should focus on more advanced transformer models, novel data augmentation, and effective ensemble techniques.
- The video segment primarily shows a technical placeholder screen, suggesting a pause or issue with the live camera feed. No academic content is presented.
- Achieving HDR in MR devices is a multi-faceted challenge requiring a holistic approach.
- Conflicting constraints (latency, noise, power, processing speed) necessitate careful co-design across the entire hardware-software pipeline.
- Denoising plays a crucial role in extending dynamic range, especially in low-light, low-exposure scenarios common in MR.
- Optimizing for memory transfer and thermal management is critical for power-efficient, head-mounted devices.
- Gaze-based foveated processing offers a promising direction for efficient, high-quality image processing by selectively applying intensive denoising to the fovea.
- Temporal stability is paramount for video processing in MR to avoid user discomfort.
- Advanced NeRF models like Mip-NeRF and Zip-NeRF significantly improve rendering quality by effectively combating aliasing artifacts, particularly in complex and unbounded environments, utilizing integrated positional encoding and sophisticated cone-based sampling strategies.
- The NTIRE 2023 Image Shadow Removal Challenge provided a valuable benchmark for advancing shadow removal techniques, with the WSRD dataset offering a more complex and realistic testbed compared to previous datasets.
- Winning solutions in shadow removal, such as Team MTCV’s PES model, demonstrate the effectiveness of multi-scale and ensemble-based deep learning architectures in tackling intricate shadow interactions.
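The "parameterization for unbounded scenes" credited to Mip-NeRF 360 in the takeaways above refers to a scene contraction: points inside the unit ball are left unchanged, and everything farther away is squeezed into a ball of radius 2. The sketch below applies that contraction to single 3-D points. It is a simplification, since the published method applies the contraction to Gaussians rather than points, and the function name `contract` is chosen here for illustration.

```python
import math


def contract(point):
    """Map a 3-D point into a ball of radius 2.

    Points with norm <= 1 are unchanged; farther points are mapped to
    (2 - 1/||x||) * (x / ||x||), so distant content approaches radius 2.
    """
    norm = math.sqrt(sum(c * c for c in point))
    if norm <= 1.0:
        return tuple(point)
    scale = (2.0 - 1.0 / norm) / norm
    return tuple(c * scale for c in point)


print(contract((0.5, 0.0, 0.0)))      # inside the unit ball: unchanged
print(contract((4.0, 0.0, 0.0)))      # outside: pulled in to radius 1.75
print(contract((1e6, 0.0, 0.0)))      # very far: radius just under 2
```

The contraction is continuous at the unit sphere, so nearby content keeps its original geometry while the background occupies a bounded shell that a finite-capacity model can represent.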
Methods / Models / Datasets Mentioned
AI-Denoising · AOD · AlphaNet · BGNet · BITP12 · BM3D · BNU-AI-TRY · BasicLFSR · Bicubic · Bilinear Upsampling · Blend · BokehOrNot · CBNU-MIP-Lab · CBTNet · CNN · Channel Attention · Channel attention module · ConvNeXt · Cross-attention module · CutBlur · CutMix · CutMixup · DCP · DMLab · DPT · DW-GAN · DWT UNet · DWT-FCC GAN · Denoising · Depth-wise convolution · DistgEPIT · DistgSSR · DoubleGan · Dynamic Residual Block · EBokehNet · EDSR · EPI Conv2D · EPIT · FBCNN · FFA · FFC (Fast Fourier Convolution) · FeaNet · GAN · GCANet · GMM (Gaussian Mixture Models) · Haar wavelet · HawkeyeGroup · IIR-Lab · INSIS · IR-SDE · ITB Dehaze · Integrated Positional Encoding · LF-ATO · LF-DFNet · LF-InterNet · LF-UIINet · LFSR-gdut-team · LFSSR · LIIF-EDSR · LIIF-LDN · MDI-Group · MDIB (Multi-Dimension Interaction Block) · MEG-Net · MMSE · MSPB (Multi-Scale Process Block) · Mip-NeRF · Mixup · Multi-scale feature extraction and fusion · Multi-stage progressive fusion · NAFBET · NAFNET · NafBlock · NeRF · OpenMeow · PSW (Position-Sensitive Windowing) · Pixel Shuffle · Pixel-shuffle layer · Positional Encoding · Pyramid Ensemble Structure (PES) · QGAC · QGAC-GAN · RCAB (Residual Channel Attention Group) · RCN · RDN-LTE · RGB Channel-wise Gamma Correction · RGB permute · RefineNet · Res2Net · ResNet-like blocks · SBTNet · SGLC (Streamlined Global and Local Features Combinator) · SGLMS · SHU-IVIPLab · Shear operation · Spatial Conv2D · Swin Transformer V2 · SwinIR · SwinIR-LTE · Tone mapping · Transformer · Transformer with Cross-Scale Attention (CSA) · VDSR · VIDAR · Variational analysis · Vision Transformer · Wavelet Hallucination (WH) · Weighted Fusion · Zip-NeRF · resLF
Topics
4D Light Field · Associated Challenges · Blur Ratio · Bokeh Effect Transformation · CNN-based methods · Challenge methodology · Challenge results · Co-design of image processing pipeline · Computational Photography · Data Augmentation · Data-Centric AI · Deep Learning Architectures · Deep Learning for Image Processing · Deep Learning for SR · Denoising · Disparity Estimation · Disparity Learning · Efficient model architectures (backbones, tiled models) · Ensemble methods · Feature Selection · Frequency Domain Processing · Gaze-based foveated processing · HDR (High Dynamic Range) · High Noise (low light, low exposure) · High-Resolution Imaging · Image Enhancement · Image Restoration · Image Super-Resolution (SR) · Integration Strategy · JPEG Decoding · Latency (Photon to Display) · Light Field Image Super-Resolution (LFSR) · Light Field Super-Resolution · Mip-NeRF · Mixed Reality (MR) devices · Multi-Scale Processing · NAFNET · NTIRE 2023 Challenge · NTIRE Workshop · Neural Radiance Fields (NeRF) · Non-Homogeneous Dehazing · PSNR · Perception-Distortion Tradeoff · Perceptual Quality · Posterior Sampling · Power Efficiency (battery, thermal limitations) · Pyramid Ensemble Structure (PES) · Reproducibility · SSIM · Sensor-specific noise model · Spatial-Angular Information · Temporal artifacts · Transformer-based methods · Transformer-based models · Video Processing (60+ FPS, temporal stability) · WSRD dataset · XY aliasing · Z aliasing · Zip-NeRF · anti-aliasing · color misalignment · cone-based sampling · diffuse light · directional light · grid-based NeRF · image shadow removal · integrated positional encoding · parameterization · positional encoding · real-time rendering · semantic misalignment · unbounded scenes · user study · winning solution
Notes
Open for commentary — connections to other work, critiques, follow-up reading.