Temporally Consistent Enhancement of Low-Light Videos via Spatial-Temporal Compatible Learning

Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Xiandong Meng, Shiqi Wang
DOI: https://doi.org/10.1007/s11263-024-02084-w
IF: 13.369
2024-05-24
International Journal of Computer Vision
Abstract: Temporal inconsistency is an annoying artifact commonly introduced in low-light video enhancement, but current methods tend to overlook the significance of utilizing both data-centric clues and model-centric design to tackle this problem. In this context, our work makes a comprehensive exploration from the following three aspects. First, to enrich the scene diversity and motion flexibility, we construct a diverse synthetic low/normal-light paired video dataset with a carefully designed low-light simulation strategy, which can effectively complement existing real captured datasets. Second, for better temporal dependency utilization, we develop a Temporally Consistent Enhancer Network (TCE-Net) that consists of stacked 3D convolutions and 2D convolutions to exploit spatial-temporal clues in videos. Last, the temporal dynamic feature dependencies are exploited to obtain consistency constraints for different frame indexes. All these efforts are powered by a Spatial-Temporal Compatible Learning (STCL) optimization technique, which adaptively constructs specific training loss functions for different datasets. As such, multiple-frame information can be effectively utilized and different levels of information from the network can be feasibly integrated, thus expanding the synergies on different kinds of data and offering visually better results in terms of illumination distribution, color consistency, texture details, and temporal coherence. Extensive experimental results on various real-world low-light video datasets demonstrate that the proposed method achieves superior performance to state-of-the-art methods. Our code and synthesized low-light video database will be publicly available at https://github.com/lingyzhu0101/low-light-video-enhancement.git.
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses temporal inconsistency in low-light video enhancement. Current methods often overlook the value of combining data-centric cues with model-centric design when enhancing low-light videos. The result is visual artifacts such as flickering across frames, which degrade both the viewing experience and the performance of downstream computer vision tasks. To tackle this challenge, the paper explores three aspects:

1. **Dataset Construction**: To enrich scene diversity and motion flexibility, the authors construct a synthetic low-light/normal-light paired video dataset using a carefully designed low-light simulation strategy, which complements existing real captured datasets.
2. **Model Design**: To better utilize temporal dependencies, the authors develop a Temporally Consistent Enhancer Network (TCE-Net), which consists of stacked 3D convolutions and 2D convolutions that exploit spatial-temporal cues in videos.
3. **Training Mechanism**: Temporal dynamic feature dependencies are exploited to obtain consistency constraints for different frame indexes. These efforts are driven by a Spatial-Temporal Compatible Learning (STCL) optimization technique, which adaptively constructs specific training loss functions for different datasets, so that multi-frame information is used effectively and information from different network levels can be integrated.

Together, these efforts aim to expand the synergies among different kinds of data and to deliver better visual results in terms of illumination distribution, color consistency, texture details, and temporal coherence. According to the reported experiments, the method outperforms state-of-the-art methods on various real-world low-light video datasets.
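To make the flickering artifact concrete, the following is a minimal sketch of one crude way to quantify it: the mean absolute change in average brightness between consecutive frames. This is an illustrative proxy only, not the metric or loss used in the paper. The names `mean_brightness` and `flicker_score` are hypothetical, frames are assumed to be grayscale lists of pixel rows with intensities in [0, 1], and the scene is assumed to be static so that any brightness change comes from the enhancement rather than from motion.

```python
from statistics import fmean


def mean_brightness(frame):
    """Average intensity of one grayscale frame given as a list of pixel rows."""
    return fmean(pixel for row in frame for pixel in row)


def flicker_score(frames):
    """Mean absolute change in average brightness between consecutive frames.

    Illustrative proxy for temporal inconsistency (an assumption of this
    sketch, not the paper's evaluation metric). Returns 0.0 when there are
    fewer than two frames to compare.
    """
    if len(frames) < 2:
        return 0.0
    levels = [mean_brightness(frame) for frame in frames]
    return fmean(abs(later - earlier) for earlier, later in zip(levels, levels[1:]))


# Two toy 2x2 "videos" of a static scene.
steady = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(4)]
flickering = [[[v, v], [v, v]] for v in (0.4, 0.6, 0.4, 0.6)]

print(flicker_score(steady))
print(flicker_score(flickering))
```

A frame-by-frame enhancer that picks a different exposure for each frame would tend to score higher on this measure than one that shares information across frames, which is the motivation the summary gives for the temporal design of TCE-Net. In real footage, scene motion also changes brightness, so a proxy this simple is only meaningful for static or nearly static content.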