Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement

Guanlin Li, Ke Zhang, Ting Wang, Ming Li, Bin Zhao, Xuelong Li
2024-09-25
Abstract: Despite the impressive advances of recent low-light image enhancement techniques, the scarcity of paired data has emerged as a significant obstacle to further progress. This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates unpaired data into model training. The mean-teacher technique is a prominent semi-supervised learning method that has been successfully adopted for both high-level and low-level vision tasks. However, two primary issues prevent the naive mean-teacher method from attaining optimal performance in low-light image enhancement. Firstly, a pixel-wise consistency loss is insufficient for transferring a realistic illumination distribution from the teacher to the student model, which results in color cast in the enhanced images. Secondly, cutting-edge image enhancement approaches do not cooperate effectively with the mean-teacher framework to restore detailed information in dark areas, because they tend to overlook structured information within local regions. To mitigate these issues, we first introduce a semantic-aware contrastive loss to faithfully transfer the illumination distribution, which helps produce enhanced images with natural colors. Then, we design a Mamba-based low-light image enhancement backbone that strengthens Mamba's ability to represent pixel relationships within local regions through a multi-scale feature learning scheme, facilitating the generation of images with rich textural details. Further, we propose a novel perceptual loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer textural details. The experimental results indicate that Semi-LLIE surpasses existing methods in both quantitative and qualitative metrics.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem that, in low-light image enhancement, the lack of paired data makes it difficult for existing methods to improve their performance further. Although low-light image enhancement techniques have made significant progress in recent years, it is very difficult to collect large amounts of paired images captured under real low-light and normal-light conditions. This limits supervised learning methods, which usually need a large amount of paired data for training. Moreover, synthetically generated low-light images differ considerably from real-world low-light images, so models trained on synthetic data perform poorly and generalize badly in practical applications. To address these problems, the paper proposes Semi-LLIE, a semi-supervised low-light image enhancement method built on the mean-teacher framework, which aims to improve the model's generalization to real-world scenes by incorporating unpaired data into training. The specific improvements are:
1. **Semantic-aware Contrastive Loss**: To transfer the real illumination distribution more effectively and reduce color cast, the paper introduces a semantic-aware contrastive loss. It uses intermediate representations of a large-scale vision-language model, the Recognize Anything Model (RAM), to evaluate the semantic similarity between the original low-light image and its enhanced version, which helps produce enhanced images with natural colors.
2. **Mamba-based Low-light Image Enhancement Backbone**: To better restore details in dark areas, the paper designs a new multi-scale state-space block (MSSB) that strengthens the Mamba model's ability to represent pixel relationships within local regions. Combined with a multi-scale feature learning scheme, this backbone generates images with rich textural details.
3. **RAM-based Perceptual Loss**: To further improve the textural details of the enhanced images, the paper proposes a new perceptual loss based on RAM. It uses intermediate features extracted from the last three stages of RAM's pre-trained image encoder to evaluate the perceptual similarity between two input images, helping generate more realistic textures.
With these innovations, Semi-LLIE outperforms existing unsupervised methods in both quantitative and qualitative evaluations, and in some cases even surpasses several influential supervised methods. It is particularly good at producing enhanced images with rich local details and natural colors, and it also improves performance on downstream object detection.
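The mean-teacher scheme that Semi-LLIE builds on can be illustrated with a deliberately tiny, pure-Python sketch. Everything in it is an assumption for illustration, not the paper's training code: the "model" is a single hypothetical brightness gain `w` (prediction = `w * x`), the consistency term is a plain squared difference, and `lr`, `lam` and `alpha` are arbitrary values. The student is trained on labeled pairs plus a consistency term that pulls its predictions on unlabeled inputs toward the teacher's, while the teacher's weight is an exponential moving average (EMA) of the student's and receives no gradient.

```python
# Toy mean-teacher loop (illustrative only, not the Semi-LLIE implementation).
# Hypothetical model: a single brightness gain w, so prediction = w * x.


def student_grad(w, w_teacher, labeled, unlabeled, lam):
    """Gradient of supervised MSE + lam * consistency MSE w.r.t. the student weight.

    The teacher prediction is treated as a constant target (stop-gradient).
    """
    sup = sum(2.0 * (w * x - y) * x for x, y in labeled) / len(labeled)
    cons = sum(2.0 * (w - w_teacher) * x * x for x in unlabeled) / len(unlabeled)
    return sup + lam * cons


def ema(w_teacher, w_student, alpha):
    """Teacher weight = exponential moving average of the student weight."""
    return alpha * w_teacher + (1.0 - alpha) * w_student


def train(labeled, unlabeled, steps=500, lr=1.0, lam=1.0, alpha=0.9, w0=0.5):
    w_student = w_teacher = w0
    for _ in range(steps):
        # Student: gradient step on supervised + consistency loss.
        w_student -= lr * student_grad(w_student, w_teacher, labeled, unlabeled, lam)
        # Teacher: EMA of the student; no gradient flows into it.
        w_teacher = ema(w_teacher, w_student, alpha)
    return w_student, w_teacher


# Labeled pairs follow a hypothetical ground-truth gain of 2.0.
labeled = [(x, 2.0 * x) for x in (0.1, 0.2, 0.3)]
unlabeled = [0.15, 0.25]
w_s, w_t = train(labeled, unlabeled)
print(round(w_s, 3), round(w_t, 3))
```

The teacher lags behind the student and smooths its updates, which gives the unlabeled data a stable target. The abstract's point is that, for low-light enhancement, a pixel-wise consistency term like `cons` on its own transfers illumination poorly, which motivates the additional contrastive and perceptual terms.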
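The semantic-aware contrastive loss can be pictured as an InfoNCE-style objective over feature vectors. This is a generic formulation chosen for illustration, not necessarily the paper's exact loss. The sketch assumes that the anchor is the student's enhanced output, the positive is the teacher's output, and the negatives include the low-light input, all already mapped to feature vectors by a frozen semantic encoder (RAM in the paper; here the vectors are simply given). The temperature `tau` and the example vectors are hypothetical.

```python
import math

# InfoNCE-style contrastive loss on precomputed feature vectors (illustrative sketch).
# Assumption: anchor = student output features, positive = teacher output features,
# negatives = features of degraded images such as the low-light input.


def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def contrastive_loss(anchor, positive, negatives, tau=0.1):
    """-log of the softmax probability of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


student_feat = [1.0, 0.0, 0.2]
teacher_feat = [0.9, 0.1, 0.2]   # hypothetical: semantically close to the student output
lowlight_feat = [0.0, 1.0, 0.0]  # hypothetical: semantically far from it
print(contrastive_loss(student_feat, teacher_feat, [lowlight_feat]))
```

Minimizing this loss pulls the student's features toward the teacher's and pushes them away from those of the low-light input. Comparing semantic features rather than raw pixels is what the paper relies on to transfer a realistic illumination distribution and avoid color cast.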
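The idea behind the MSSB, combining a state-space scan with multi-scale features, can be sketched in pure Python. This is a heavily simplified stand-in, not the paper's block. It uses a scalar selective state-space recurrence in the spirit of Mamba (the step size `delta` depends on the input), a single row-major raster scan of a 2D image, average pooling to build coarser scales, nearest-neighbour upsampling, and a plain average to fuse the branches. The parameters `A`, `B`, `C`, `w_delta` and the `scales` tuple are hypothetical.

```python
import math

# Simplified multi-scale selective-scan sketch (hypothetical stand-in for the MSSB).


def softplus(z):
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def selective_scan(x, A=1.0, B=1.0, C=1.0, w_delta=1.0):
    """Scalar selective SSM: the discretization step depends on each input."""
    h, out = 0.0, []
    for xt in x:
        delta = softplus(w_delta * xt)  # input-dependent step size
        a = math.exp(-delta * A)        # discretized decay, in (0, 1) for A > 0
        h = a * h + delta * B * xt      # state update
        out.append(C * h)
    return out


def avg_pool2d(img, k):
    H, W = len(img), len(img[0])
    out = []
    for i in range(0, H, k):
        row = []
        for j in range(0, W, k):
            block = [img[y][x] for y in range(i, min(i + k, H)) for x in range(j, min(j + k, W))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample2d(img, k, H, W):
    """Nearest-neighbour upsampling back to H x W."""
    return [[img[y // k][x // k] for x in range(W)] for y in range(H)]


def scan2d(img):
    W = len(img[0])
    flat = [v for row in img for v in row]  # row-major raster scan
    y = selective_scan(flat)
    return [y[r * W:(r + 1) * W] for r in range(len(img))]


def multi_scale_ssm(img, scales=(1, 2, 4)):
    H, W = len(img), len(img[0])
    branches = [upsample2d(scan2d(avg_pool2d(img, k)), k, H, W) for k in scales]
    # Fuse the scales by simple averaging (an assumption).
    return [[sum(b[y][x] for b in branches) / len(branches) for x in range(W)] for y in range(H)]


img = [[0.05 * (r + c) for c in range(8)] for r in range(8)]  # hypothetical dark 8x8 gradient
out = multi_scale_ssm(img)
print(len(out), len(out[0]))  # 8 8
```

In a row-major scan, vertically adjacent pixels are `W` steps apart, so their interaction decays through the recurrence. Pooling first lets one recurrent step summarize a whole k×k neighbourhood. This is one plausible reading of how multi-scale features help local modeling, not a description of the paper's exact design.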
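The RAM-based perceptual loss compares features taken from the last three stages of a pre-trained encoder. The sketch below shows only that multi-stage comparison. `toy_encoder` is a hypothetical stand-in for RAM's image encoder (each stage halves the resolution with average pooling and applies `tanh`), images are flattened 1D lists, and the per-stage mean L1 distance and equal `weights` are assumptions rather than the paper's settings.

```python
import math

# Multi-stage perceptual loss sketch. toy_encoder is a hypothetical stand-in for a
# pretrained encoder such as RAM's; it is NOT the real model.


def toy_encoder(x, num_stages=4):
    """Return one feature list per stage; each stage halves the length."""
    feats = []
    h = list(x)
    for _ in range(num_stages):
        pooled = [(h[i] + h[i + 1]) / 2 if i + 1 < len(h) else h[i] for i in range(0, len(h), 2)]
        h = [math.tanh(v) for v in pooled]  # simple per-stage nonlinearity
        feats.append(h)
    return feats


def perceptual_loss(x, y, encoder=toy_encoder, weights=(1.0, 1.0, 1.0)):
    """Weighted mean-L1 distance between the last three stages' features of x and y."""
    fx, fy = encoder(x), encoder(y)
    loss = 0.0
    for w, a, b in zip(weights, fx[-3:], fy[-3:]):
        loss += w * sum(abs(p - q) for p, q in zip(a, b)) / len(a)
    return loss


enhanced = [0.1 * (i % 4) for i in range(16)]  # hypothetical flattened images
reference = [0.1 * (i % 4) + 0.05 for i in range(16)]
print(perceptual_loss(enhanced, enhanced))       # 0.0 for identical inputs
print(perceptual_loss(enhanced, reference) > 0)  # True
```

Comparing only deeper stages emphasizes structure and texture over exact pixel values, which matches the stated goal of recovering realistic textures. In a real setup the encoder would typically be frozen so that only the enhancement network receives gradients.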