Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Zhangkai Ni, Qi Mao, Shiqi Wang
2024-08-22
Abstract: Obtaining pairs of low/normal-light videos with motion is more challenging than obtaining pairs of still images, which makes unpaired learning a critical technical route. This paper works toward learning low-light video enhancement without paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, together with the need for temporal coherence. To address this challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net) for enhancing low-light videos by unrolling the optimization functions into a deep network to decompose the signal into spatial and temporal-related factors, which are updated iteratively. First, we formulate low-light video enhancement as a Maximum A Posteriori (MAP) estimation problem with carefully designed spatial and temporal visual regularization. Then, by unrolling the problem, the optimization of the spatial and temporal constraints can be decomposed into different steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpaired prior information from expert photo-retouching skills to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over/under-exposure conditions. Meanwhile, from the temporal perspective, the designed Inter subnet fully exploits temporal cues in progressive optimization, which helps achieve improved temporal consistency in the enhancement results. Consequently, the proposed method achieves superior performance to state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to enhance low-light videos effectively when no paired training data is available. It focuses on three issues:

1. **Spatio-temporal consistency**: Unlike still images, low-light videos require handling both the spatial domain (degradation within each frame) and the temporal domain (consistency between frames) at the same time. Because of the motion between adjacent frames, applying an image enhancement method to each frame independently produces inconsistent results across the enhanced frames.
2. **Lack of paired data**: In practice, it is very difficult to capture paired videos of the same scene under low-light and normal-light conditions, so learning from unpaired data becomes an important challenge.
3. **Over-exposure and under-exposure**: Without pixel-level supervision or human-perception feedback, existing methods are prone to over-exposure or under-exposure, which degrades the visual quality of the enhanced results.

To address these issues, the paper proposes the Unrolled Decomposed Unpaired Network (UDU-Net). The method achieves controllable enhancement of low-light videos by unrolling the optimization functions into a deep network, decomposing the signal into spatial and temporal-related factors, and updating these factors iteratively. The main technical components are:

- **Maximum a posteriori (MAP) estimation**: Formulate low-light video enhancement as a MAP estimation problem with spatial and temporal visual regularization terms.
- **Intra subnet**: Use unpaired high-quality image data and human-perception feedback to adjust the statistical distribution and suppress over-exposure and under-exposure.
- **Inter subnet**: Exploit temporal cues in progressive optimization to achieve better temporal consistency.
- **Human-perception feedback mechanism**: Integrate human-perception feedback to guide network optimization, so that the enhanced results agree with human visual preferences.

With these components, UDU-Net can brighten low-light videos, suppress noise, and maintain temporal consistency without paired training data, and the paper reports better performance than existing methods in both indoor and outdoor scenes.
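
The stage-wise alternation between a spatial update and a temporal update can be illustrated with a toy NumPy sketch. This is not UDU-Net: the learned Intra and Inter subnets are replaced here by hand-written stand-ins (a per-frame gain correction and a temporal neighbour blend), and every name and parameter (`intra_step`, `inter_step`, `target_mean`, `weight`, `num_stages`, `make_dark_video`) is a hypothetical choice made for illustration. The sketch only shows why interleaving a temporal step reduces the frame-to-frame flicker that pure per-frame enhancement amplifies.

```python
import numpy as np


def make_dark_video(num_frames=8, size=16, seed=0):
    """Hypothetical input: a static dark scene plus independent per-frame noise."""
    rng = np.random.default_rng(seed)
    scene = rng.uniform(0.02, 0.10, size=(size, size))
    noise = rng.normal(0.0, 0.01, size=(num_frames, size, size))
    return np.clip(scene[None] + noise, 0.0, 1.0)


def intra_step(x, target_mean=0.5, step=0.5):
    """Spatial stand-in: move each frame part of the way toward target_mean brightness."""
    frame_means = x.mean(axis=(1, 2), keepdims=True)
    gain = target_mean / np.maximum(frame_means, 1e-6)
    return np.clip(x + step * (gain * x - x), 0.0, 1.0)


def inter_step(x, weight=0.5):
    """Temporal stand-in: blend each frame with the average of its two neighbours."""
    padded = np.concatenate([x[:1], x, x[-1:]], axis=0)  # repeat edge frames
    neighbour_avg = 0.5 * (padded[:-2] + padded[2:])
    return (1.0 - weight) * x + weight * neighbour_avg


def unrolled_enhance(video, num_stages=4):
    """Alternate spatial and temporal updates for a fixed number of stages."""
    x = video.astype(np.float64)
    for _ in range(num_stages):
        x = intra_step(x)
        x = inter_step(x)
    return x


def intra_only(video, num_stages=4):
    """Baseline: per-frame enhancement with no temporal step."""
    x = video.astype(np.float64)
    for _ in range(num_stages):
        x = intra_step(x)
    return x


def temporal_flicker(x):
    """Mean absolute difference between consecutive frames."""
    return float(np.abs(np.diff(x, axis=0)).mean())
```

Comparing `temporal_flicker(unrolled_enhance(video))` with `temporal_flicker(intra_only(video))` on the synthetic clip from `make_dark_video()` shows the effect described in the first issue above: both outputs are brighter than the input, but the per-frame baseline amplifies noise differences between frames, while the alternating scheme smooths them out. In a real unrolled network each step would be a learned module rather than a fixed formula.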