Abstract: Event cameras offer significant advantages for low-light video enhancement, primarily due to their high dynamic range. Current research, however, is severely limited by the absence of large-scale, real-world, spatio-temporally aligned event-video datasets. To address this, we introduce a large-scale dataset with over 30,000 pairs of frames and events captured under varying illumination. This dataset was curated using a robotic arm that traces a consistent non-linear trajectory, achieving spatial alignment precision under 0.03 mm and temporal alignment with errors under 0.01 s for 90% of the dataset. Based on the dataset, we propose **EvLight++**, a novel event-guided low-light video enhancement approach designed for robust performance in real-world scenarios. First, we design a multi-scale holistic fusion branch to integrate structural and textural information from both images and events. To counteract variations in regional illumination and noise, we introduce Signal-to-Noise Ratio (SNR)-guided regional feature selection, enhancing features from high-SNR regions and augmenting those from low-SNR regions by extracting structural information from events. To incorporate temporal information and ensure temporal coherence, we further introduce a recurrent module and a temporal loss into the whole pipeline. Extensive experiments on our dataset and the synthetic SDSD dataset demonstrate that EvLight++ significantly outperforms both single-image and video-based methods, by 1.37 dB and 3.71 dB, respectively. To further explore its potential in downstream tasks such as semantic segmentation and monocular depth estimation, we extend our datasets by adding pseudo segmentation and depth labels via meticulous annotation efforts with foundation models. Experiments under diverse low-light scenes show that the enhanced results achieve a 15.97% improvement in mIoU for semantic segmentation.
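For readers unfamiliar with the segmentation metric quoted in the abstract, the following is a minimal sketch of how mean Intersection-over-Union (mIoU) is commonly computed. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the paper's own evaluation protocol is not described in this summary.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average the per-class IoU over classes that appear in pred or target.

    pred and target are flattened per-pixel class labels of equal length.
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four pixels, two classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.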

What problem does this paper attempt to address?

This paper addresses low-light video enhancement (LLVE). The authors identify three challenges in current research:

1. **Lack of large-scale, real-world, spatio-temporally aligned datasets**: Existing LLVE research is limited by its data, in particular by the lack of datasets containing many spatio-temporally aligned images and event streams under both low-light and normal-light conditions. This makes it difficult to train and evaluate models effectively on real-world scenes.
2. **Poor performance of existing methods under real low-light conditions**: Although many frame-based and event-guided methods for low-light enhancement exist, they still struggle with complex real-world scenes and produce artifacts such as uneven exposure, color distortion, and noise.
3. **Under-utilization of the advantages of event cameras**: Event cameras offer high dynamic range (HDR) and high temporal resolution, but these properties have not been fully explored in the LLVE task.

To address these problems, the authors make the following contributions:

- **A large-scale real-world dataset (SDE dataset)**: The dataset contains more than 30,000 pairs of spatio-temporally aligned video frames and event streams under low-light and normal-light conditions. It was collected with a robotic arm system, giving spatial alignment error below 0.03 mm and temporal alignment error below 0.01 s for 90% of the dataset.
- **A new event-guided low-light video enhancement method (EvLight++)**: The method combines a multi-scale holistic fusion branch, SNR-guided regional feature selection, a recurrent module, and a temporal loss to achieve more robust low-light video enhancement.
- **An extended dataset for downstream tasks**: To verify that the enhanced results are useful in practice, the authors add pseudo segmentation and depth labels to the dataset, so that the enhanced video can be evaluated on semantic segmentation and monocular depth estimation.

Through these contributions, the authors aim to advance low-light video enhancement in practical applications and provide a foundation for future research.
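The SNR-guided regional feature selection mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a common heuristic in which a mean filter stands in for a denoiser, the noise is estimated as the absolute difference between the image and its smoothed version, and a hard threshold chooses between image and event features. The names `box_blur`, `snr_map`, `select_features`, and the `threshold` parameter are all hypothetical.

```python
from typing import List

Image = List[List[float]]  # 2-D grid of floats, e.g. grayscale values in [0, 1]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Mean filter, used here as a stand-in denoiser (an assumption)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def snr_map(img: Image, eps: float = 1e-4) -> Image:
    """Per-pixel SNR estimate: smoothed signal over estimated noise magnitude."""
    smooth = box_blur(img)
    return [
        [s / (abs(p - s) + eps) for p, s in zip(row_img, row_smooth)]
        for row_img, row_smooth in zip(img, smooth)
    ]


def select_features(img_feat: Image, evt_feat: Image, snr: Image,
                    threshold: float) -> Image:
    """Keep image features where SNR is high; use event features elsewhere."""
    return [
        [f if s >= threshold else e for f, e, s in zip(row_f, row_e, row_s)]
        for row_f, row_e, row_s in zip(img_feat, evt_feat, snr)
    ]


# Toy usage: a flat (noise-free) patch yields a uniformly high SNR map.
flat = [[0.5] * 3 for _ in range(3)]
snr = snr_map(flat)
mixed = select_features([[1.0, 1.0]], [[2.0, 2.0]], [[10.0, 0.1]], threshold=1.0)
```

The hard threshold is a simplification: the abstract describes enhancing high-SNR image features and augmenting low-SNR regions with structural information from events, which in a learned model would more likely be a soft, feature-level weighting.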

EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Event-Based Low-Illumination Image Enhancement

Low-Light Video Enhancement with Synthetic Event Guidance

Event-Based Fusion for Motion Deblurring with Cross-modal Attention

MEFNet: Multi-scale Event Fusion Network for Motion Deblurring

Coherent Event Guided Low-Light Video Enhancement

Towards Real-world Event-guided Low-light Video Enhancement and Deblurring

Event-guided Low-light Video Semantic Segmentation

Event-assisted Low-Light Video Object Segmentation

LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising

Temporally Consistent Enhancement of Low-Light Videos via Spatial-Temporal Compatible Learning

From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

Seeing Motion at Nighttime with an Event Camera

EvLSD-IED: Event-Based Line Segment Detection With Image-to-Event Distillation

E2VIDX: improved bridge between conventional vision and bionic vision

Deblurring Low-Light Images with Events

DeLiEve-Net: Deblurring Low-light Images with Light Streaks and Local Events

Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition

Attention Guided Low-Light Image Enhancement with a Large Scale Low-Light Simulation Dataset

Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method