Abstract: Obtaining paired low/normal-light videos with motion is more challenging than obtaining paired still images, which makes unpaired learning a critical technical route. This paper studies learning for low-light video enhancement without paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult because noise, exposure, and contrast are intertwined in the spatial domain and temporal coherence must also be maintained. To address this challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net), which unrolls the optimization functions into a deep network that decomposes the signal into spatial and temporal factors and updates them iteratively. First, we formulate low-light video enhancement as a Maximum A Posteriori (MAP) estimation problem with carefully designed spatial and temporal visual regularization. Then, by unrolling the problem, the optimization of the spatial and temporal constraints is decomposed into separate steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpaired prior information from expert photographic retouching to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over- and under-exposure. From the temporal perspective, the designed Inter subnet fully exploits temporal cues in progressive optimization, which improves the temporal consistency of the enhancement results. Consequently, the proposed method outperforms state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes.
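
As a rough illustration of the MAP formulation mentioned in the abstract, a generic form can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $\mathbf{Y}$ is the observed low-light video, $\mathbf{X}$ the enhanced video, $\Phi_s$ and $\Phi_t$ stand for the spatial and temporal regularizers, and $\lambda_s$, $\lambda_t$ are their weights (the prior is assumed to factor as $p(\mathbf{X}) \propto \exp(-\lambda_s \Phi_s(\mathbf{X}) - \lambda_t \Phi_t(\mathbf{X}))$).

```latex
\hat{\mathbf{X}}
  = \arg\max_{\mathbf{X}} \; p(\mathbf{X} \mid \mathbf{Y})
  = \arg\min_{\mathbf{X}} \;
      -\log p(\mathbf{Y} \mid \mathbf{X})
      + \lambda_s \, \Phi_s(\mathbf{X})
      + \lambda_t \, \Phi_t(\mathbf{X})
```

Under this reading, unrolling means alternating updates that target the spatial term and the temporal term over a fixed number of stages, with each update implemented by a learned sub-network rather than a hand-derived solver.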

What problem does this paper attempt to address?

The problem this paper addresses is how to effectively enhance low-light videos when no paired training data is available. Specifically, the paper focuses on the following aspects:

1. **Spatio-temporal consistency**: Unlike static images, low-light videos require handling degradation in the spatial domain (within each frame) and consistency in the temporal domain (across frames) at the same time. Because of motion between adjacent frames, applying an image enhancement method to each frame independently produces inconsistent results across the enhanced frames.
2. **Lack of paired data**: In practice, paired video data captured under low-light and normal-light conditions is very difficult to obtain, so learning from unpaired data becomes an important challenge.
3. **Over-exposure or under-exposure**: Without pixel-level supervision and human-perception feedback, existing methods are prone to over-exposure or under-exposure, which degrades the visual quality of the enhancement results.

To solve these problems, the paper proposes a method named Unrolled Decomposed Unpaired Network (UDU-Net). The method achieves controllable enhancement of low-light videos by unrolling the optimization function into a deep network, decomposing the signal into spatially and temporally related factors, and updating these factors iteratively. The specific technical means include:

- **Maximum a posteriori (MAP) estimation**: Formulate low-light video enhancement as a MAP estimation problem and design spatial and temporal visual regularization terms.
- **Intra sub-network**: Use unpaired high-quality image data and human-perception feedback to adjust the statistical distribution and suppress over-exposure or under-exposure.
- **Inter sub-network**: Exploit temporal cues in progressive optimization to achieve better temporal consistency.
- **Human-perception feedback mechanism**: Introduce a new mechanism that integrates human-perception feedback to guide network optimization, so that the enhancement results match human visual preferences.

Through these technical means, UDU-Net increases the brightness of low-light videos, suppresses noise, and maintains temporal consistency without paired training data, outperforming existing methods in both indoor and outdoor scenes.
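
The stage-wise alternation between a spatial update and a temporal update described above can be sketched with a toy NumPy example. This is only an illustration of the control flow, not the paper's method: `intra_step`, `inter_step`, `unrolled_enhance`, and `flicker` are hypothetical names, and the hand-written gain and neighbour-blending rules stand in for the learned Intra and Inter sub-networks (the fixed `target_mean` replaces the unpaired retouching prior and human-perception feedback).

```python
import numpy as np

def intra_step(frames, target_mean=0.5, step=0.5):
    """Toy spatial update: scale each frame so its mean moves toward target_mean."""
    out = np.empty_like(frames)
    for t, frame in enumerate(frames):
        mean = frame.mean() + 1e-6  # avoid division by zero on black frames
        gain = 1.0 + step * (target_mean / mean - 1.0)
        out[t] = np.clip(frame * gain, 0.0, 1.0)
    return out

def inter_step(frames, weight=0.5):
    """Toy temporal update: blend each frame with the average of its two neighbours."""
    # Replicate the first and last frames so every frame has two neighbours.
    padded = np.concatenate([frames[:1], frames, frames[-1:]], axis=0)
    neighbours = 0.5 * (padded[:-2] + padded[2:])
    return (1.0 - weight) * frames + weight * neighbours

def unrolled_enhance(frames, num_stages=4):
    """Alternate the spatial and temporal updates for a fixed number of stages."""
    x = frames.astype(np.float64)
    for _ in range(num_stages):
        x = intra_step(x)
        x = inter_step(x)
    return x

def flicker(frames):
    """Mean absolute change in average brightness between consecutive frames."""
    means = frames.mean(axis=(1, 2))
    return float(np.abs(np.diff(means)).mean())

# Synthetic dark clip: one static scene with per-frame exposure jitter.
rng = np.random.default_rng(0)
scene = rng.uniform(0.02, 0.2, size=(16, 16))
jitter = rng.uniform(0.7, 1.3, size=8)
video = scene[None, :, :] * jitter[:, None, None]  # shape (8, 16, 16)

enhanced = unrolled_enhance(video)
print("mean brightness:", video.mean(), "->", enhanced.mean())
print("flicker:", flicker(video), "->", flicker(enhanced))
```

In this toy setting, the spatial step raises brightness while the temporal step damps frame-to-frame brightness changes. A static scene is used on purpose: plain neighbour blending would blur moving content, which is one reason a real system needs a motion-aware temporal module.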

Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

Low Light Enhancement by Unsupervised Network.

STARNet: Low-light Video Enhancement Using Spatio-Temporal Consistency Aggregation

Temporally Consistent Enhancement of Low-Light Videos via Spatial-Temporal Compatible Learning

A Two-Stage Unsupervised Approach for Low Light Image Enhancement

LVE-S2D: Low-Light Video Enhancement From Static to Dynamic

Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition

Self-Supervision via Controlled Transformation and Unpaired Self-Conditioning for Low-Light Image Enhancement

Mutual Support and Promotion: Learning Structure Compensation and Context Completion for Low-Light Vision

Dual Degradation-Inspired Deep Unfolding Network for Low-Light Image Enhancement

Dark2Light: multi-stage progressive learning model for low-light image enhancement

Unsupervised network for low-light enhancement

Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement

Low-Light Image Enhancement via a Deep Hybrid Network

Illumination-adaptive Unpaired Low-light Enhancement

Empowering Low-Light Image Enhancer through Customized Learnable Priors

Learning an Adaptive Model for Extreme Low-light Raw Image Processing

DiffLLE: Diffusion-guided Domain Calibration for Unsupervised Low-light Image Enhancement

Decoupled Low-Light Image Enhancement

Exploring Fast and Flexible Zero‐Shot Low‐Light Image/Video Enhancement

Unsupervised low-light image enhancement by data augmentation and contrastive learning