EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More

Kanghao Chen, Guoqiang Liang, Hangyu Li, Yunfan Lu, Lin Wang
2024-08-29
Abstract: Event cameras offer significant advantages for low-light video enhancement, primarily due to their high dynamic range. Current research, however, is severely limited by the absence of large-scale, real-world, spatio-temporally aligned event-video datasets. To address this, we introduce a large-scale dataset with over 30,000 pairs of frames and events captured under varying illumination. The dataset was collected with a robotic arm that traces a consistent non-linear trajectory, achieving spatial alignment precision under 0.03 mm and temporal alignment errors under 0.01 s for 90% of the dataset. Based on this dataset, we propose **EvLight++**, a novel event-guided low-light video enhancement approach designed for robust performance in real-world scenarios. First, we design a multi-scale holistic fusion branch to integrate structural and textural information from both images and events. To counteract variations in regional illumination and noise, we introduce Signal-to-Noise Ratio (SNR)-guided regional feature selection, which enhances features from high-SNR regions and augments those from low-SNR regions with structural information extracted from events. To incorporate temporal information and ensure temporal coherence, we further introduce a recurrent module and a temporal loss into the whole pipeline. Extensive experiments on our dataset and the synthetic SDSD dataset demonstrate that EvLight++ significantly outperforms single image-based and video-based methods by 1.37 dB and 3.71 dB, respectively. To further explore its potential in downstream tasks such as semantic segmentation and monocular depth estimation, we extend our dataset with pseudo segmentation and depth labels, produced through meticulous annotation efforts with foundation models. Experiments under diverse low-light scenes show that the enhanced results achieve a 15.97% improvement in mIoU for semantic segmentation.
Computer Vision and Pattern Recognition
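The SNR-guided regional feature selection described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the exact formulation, so this sketch assumes a common estimate from earlier SNR-aware low-light enhancement work: the SNR map is the ratio of a box-blurred (roughly denoised) grayscale image to the absolute residual between the image and its blurred version. The functions `box_blur`, `snr_map`, and `snr_guided_fusion`, the kernel size `k`, and the `threshold` are hypothetical, and the hard per-pixel switch between image and event features is a simplification of the paper's learned selection, not the authors' implementation.

```python
import numpy as np

def box_blur(img, k=5):
    """Mean filter with edge padding; returns an array shaped like img (k must be odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1))

def snr_map(rgb, k=5, eps=1e-4):
    """Hypothetical SNR estimate: blurred intensity over |residual|, scaled to [0, 1)."""
    gray = rgb.mean(axis=-1)             # (H, W, 3) -> (H, W)
    denoised = box_blur(gray, k)         # rough "signal"
    noise = np.abs(gray - denoised)      # rough "noise"
    snr = denoised / (noise + eps)
    return snr / (snr.max() + eps)

def snr_guided_fusion(img_feat, evt_feat, snr, threshold=0.5):
    """Simplified hard selection: image features where SNR is high, event features where it is low."""
    mask = (snr >= threshold).astype(img_feat.dtype)[..., None]  # (H, W, 1)
    return mask * img_feat + (1.0 - mask) * evt_feat

rng = np.random.default_rng(0)
rgb = rng.uniform(0.0, 0.1, size=(32, 32, 3))   # dim, noisy toy frame
img_feat = rng.normal(size=(32, 32, 8))         # stand-in image-branch features
evt_feat = rng.normal(size=(32, 32, 8))         # stand-in event-branch features
snr = snr_map(rgb)
fused = snr_guided_fusion(img_feat, evt_feat, snr)
print(fused.shape)  # (32, 32, 8)
```

A soft variant, using the normalized SNR directly as a per-pixel blending weight, would avoid the discontinuity of a hard threshold; which variant best matches the paper's design is not stated in this summary.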
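The reported gains of 1.37 dB and 3.71 dB are differences in PSNR. The sketch below, assuming images scaled to [0, 1], computes PSNR and converts a PSNR gap Δ into the equivalent mean-squared-error ratio for a single image pair, 10^(-Δ/10). Under that reading, 1.37 dB corresponds to roughly 27% lower MSE and 3.71 dB to roughly 57% lower MSE. PSNR averaged over a dataset does not map exactly onto an MSE ratio, so these figures are intuition only. The names `psnr` and `mse_ratio` are illustrative.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mse_ratio(delta_db):
    """MSE of the better result divided by MSE of the worse one, given a PSNR gap in dB."""
    return 10.0 ** (-delta_db / 10.0)

target = np.zeros((4, 4))
pred = target + 0.1                    # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(pred, target), 2))    # 20.0
print(round(mse_ratio(1.37), 2))       # 0.73
print(round(mse_ratio(3.71), 2))       # 0.43
```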
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses low-light video enhancement (LLVE). Specifically, the authors point out that current research faces the following challenges:

1. **Lack of large-scale, real-world, spatio-temporally aligned datasets**: Existing LLVE research is limited by the available data; in particular, few datasets contain large numbers of spatio-temporally aligned images and event streams captured under both low-light and normal-light conditions. This makes it difficult to train and evaluate models effectively for real-world scenarios.
2. **Poor performance of existing methods under real low-light conditions**: Although many frame-based and event-guided low-light enhancement methods exist, they still struggle with complex real-world scenes, for example uneven exposure, color distortion, and noise.
3. **Under-utilization of the advantages of event cameras**: Event cameras offer high dynamic range (HDR) and high temporal resolution, but these properties have not been fully explored in the LLVE task.

To address these problems, the authors make the following contributions:

- **Construct a large-scale real-world dataset (SDE dataset)**: The dataset contains more than 30,000 pairs of spatio-temporally aligned video frames and event streams under low-light and normal-light conditions. It is collected with a robotic-arm system, achieving a spatial alignment error below 0.03 mm and a temporal alignment error below 0.01 s for 90% of the dataset.
- **Propose a new event-guided low-light video enhancement method (EvLight++)**: The method combines a multi-scale holistic fusion branch, SNR-guided regional feature selection, a recurrent module, and a temporal loss to achieve more robust low-light video enhancement.
- **Expand the dataset to support downstream tasks**: To verify whether the enhancement results help in practical applications, the authors add pseudo semantic segmentation labels and pseudo depth labels, generated with foundation models, so that the enhanced videos can be evaluated on semantic segmentation and monocular depth estimation.

Through these contributions, the authors aim to advance low-light video enhancement for practical applications and provide a solid foundation for future research.
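EvLight++ relies on a temporal loss for temporal coherence, but this summary does not define it. As a hedged illustration only, the sketch below implements one common flow-free form of temporal consistency loss: the mean absolute difference between frame-to-frame changes in the prediction and in the ground truth. It penalizes flicker but not a constant brightness offset. The function name and the (T, H, W, C) clip layout are assumptions, and the paper's actual loss may differ.

```python
import numpy as np

def temporal_consistency_loss(pred, gt):
    """Hypothetical temporal loss for clips shaped (T, H, W, C) with T >= 2."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Frame-to-frame changes in the enhanced clip and in the ground-truth clip
    d_pred = pred[1:] - pred[:-1]
    d_gt = gt[1:] - gt[:-1]
    # Penalize temporal changes that the ground truth does not have (e.g., flicker)
    return float(np.mean(np.abs(d_pred - d_gt)))

rng = np.random.default_rng(0)
gt = rng.uniform(size=(5, 8, 8, 3))
offset = gt + 0.2                               # constant shift: no flicker
flicker = 0.1 * (-1.0) ** np.arange(5)          # alternating +0.1 / -0.1 per frame
flickering = gt + flicker.reshape(5, 1, 1, 1)
print(temporal_consistency_loss(offset, gt))      # close to 0.0
print(temporal_consistency_loss(flickering, gt))  # close to 0.2
```

In training, a term like this would typically be added to a per-frame reconstruction loss with some weight; the recurrent module targets coherence through the architecture, while a loss of this kind targets it through the objective.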
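The downstream segmentation result is reported as mIoU. The minimal NumPy sketch below computes the standard per-class intersection-over-union from a confusion matrix and averages it over classes, assuming integer label maps and an ignore value of 255 (a common convention, not one confirmed by the paper). In this setting the reference labels would be the foundation-model pseudo labels, so the score measures agreement with those labels rather than with human annotation. The function name `mean_iou` and its parameters are illustrative.

```python
import numpy as np

def mean_iou(pred, label, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the label."""
    pred = np.asarray(pred).astype(np.int64).ravel()
    label = np.asarray(label).astype(np.int64).ravel()
    valid = label != ignore_index               # drop ignored pixels
    pred, label = pred[valid], label[valid]
    # Confusion matrix: rows = reference class, columns = predicted class
    conf = np.bincount(label * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0                         # skip classes absent from both maps
    return float(np.mean(tp[present] / union[present]))

label = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(round(mean_iou(pred, label, num_classes=2), 4))  # 0.5833 (IoU 1/2 and 2/3)
```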