ECAFormer: Low-light Image Enhancement using Cross Attention

Yudi Ruan, Hao Ma, Weikai Li, Xiao Wang
2024-08-20
Abstract: Low-light image enhancement (LLIE) is critical in computer vision. Existing LLIE methods often fail to discover the underlying relationships between different sub-components, so complementary information between modules and network layers is lost, which ultimately costs image detail. To address this shortcoming, we design a hierarchical mutual Enhancement via a Cross Attention transformer (ECAFormer), an architecture that enables concurrent propagation and interaction of multiple features. The model preserves detailed information by introducing Dual Multi-head Self-Attention (DMSA), which leverages visual and semantic features across different scales, allowing them to guide and complement each other. In addition, a Cross-Scale DMSA block is introduced to capture residual connections, integrating cross-layer information to further enhance image detail. Experimental results show that ECAFormer reaches competitive performance across multiple benchmarks, yielding nearly a 3% improvement in PSNR over the second-best method, demonstrating the effectiveness of information interaction in LLIE.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper addresses a key weakness of existing Low-Light Image Enhancement (LLIE) methods: they often fail to uncover the latent relationships between different sub-components. As a result, complementary information between modules and network layers is lost, and image details are lost with it.

To overcome this shortcoming, the authors design ECAFormer, a hierarchical mutual-enhancement architecture built on a cross-attention transformer. It preserves image details through a structure in which multiple features propagate concurrently and interact with one another. Specifically, ECAFormer introduces a Dual Multi-Head Self-Attention mechanism (DMSA) that leverages visual and semantic features at different scales so that each can guide and complement the other. A cross-scale DMSA block is also introduced to capture residual connections, integrating cross-layer information to further enhance image details.

Experimental results show that ECAFormer achieves competitive performance on multiple benchmarks, with a nearly 3% improvement in PSNR over the second-best method, demonstrating the effectiveness of information interaction in low-light image enhancement.
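The summary above describes two feature streams that guide each other through attention. As a rough illustration of that general idea, the following NumPy sketch shows bidirectional multi-head cross attention, where each stream forms queries from its own tokens and keys and values from the other stream. This is not the authors' DMSA implementation, whose details are not given here. The function names (`cross_attention`, `dual_cross_attention`), the random projection weights, the token and channel sizes, and the residual addition are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v, num_heads):
    """Hypothetical helper: queries come from one stream, keys/values from the other.

    query_feats:   (n_q, dim) tokens of the stream being updated
    context_feats: (n_c, dim) tokens of the guiding stream
    """
    n_q, dim = query_feats.shape
    n_c = context_feats.shape[0]
    head_dim = dim // num_heads

    # Project, then split channels into heads: (heads, tokens, head_dim).
    q = (query_feats @ w_q).reshape(n_q, num_heads, head_dim).transpose(1, 0, 2)
    k = (context_feats @ w_k).reshape(n_c, num_heads, head_dim).transpose(1, 0, 2)
    v = (context_feats @ w_v).reshape(n_c, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product attention; each row of attn sums to 1 over context tokens.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim), axis=-1)
    out = attn @ v  # (heads, n_q, head_dim)

    # Merge heads back into the channel dimension.
    return out.transpose(1, 0, 2).reshape(n_q, dim), attn


def dual_cross_attention(visual, semantic, params, num_heads=4):
    """Assumed two-way interaction: each stream is updated using the other."""
    v_update, _ = cross_attention(visual, semantic, *params["v_from_s"], num_heads)
    s_update, _ = cross_attention(semantic, visual, *params["s_from_v"], num_heads)
    # Residual addition keeps each stream's original information (an assumption).
    return visual + v_update, semantic + s_update


# Usage with random stand-in features (sizes are arbitrary).
rng = np.random.default_rng(0)
dim, n_tokens = 32, 16
params = {
    name: tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    for name in ("v_from_s", "s_from_v")
}
visual = rng.normal(size=(n_tokens, dim))
semantic = rng.normal(size=(n_tokens, dim))
new_visual, new_semantic = dual_cross_attention(visual, semantic, params)
print(new_visual.shape, new_semantic.shape)
```

In this sketch the two directions use separate projection weights, so the visual-from-semantic and semantic-from-visual updates are learned independently. Whether ECAFormer shares weights, or how it combines scales, is not stated in the text above.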