Abstract: Enhancing images in low-light scenes is a challenging but widely studied task in computer vision. Mainstream learning-based methods mainly acquire the enhancement model by learning the data distribution from specific scenes, causing poor adaptability (even failure) in real-world scenarios that have never been encountered before. The main obstacle lies in the modeling conundrum arising from distribution discrepancy across different scenes. To remedy this, we first explore relationships between diverse low-light scenes based on statistical analysis, i.e., the network parameters of the encoder trained on different data distributions are close. We introduce the bilevel paradigm to model the above latent correspondence from the perspective of hyperparameter optimization. A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality towards diverse scenes (i.e., freezing the encoder in the adaptation and testing phases). Further, we define a reinforced bilevel learning framework to provide a meta-initialization for the scene-specific decoder to further ameliorate visual quality. Moreover, to improve practicability, we establish a Retinex-induced architecture with adaptive denoising and apply our learning framework to acquire its parameters using two training losses, in supervised and unsupervised forms. Extensive experimental evaluations on multiple datasets verify our adaptability and competitive performance against existing state-of-the-art works. The code and datasets will be available at https://github.com/vis-opt-group/BL.
### What problem does this paper attempt to address?
This paper addresses the poor adaptability of low-light image enhancement methods. Existing learning-based methods mainly obtain enhancement models by learning data distributions from specific scenes, so they adapt poorly, and may even fail, on real-world scenes they have not encountered before. The main reason is the modeling difficulty caused by distribution differences between scenes. To overcome this, the authors first explore the relationships between different low-light scenes through statistical analysis and find that encoder parameters trained under different data distributions are close. Based on this finding, they introduce a bilevel learning framework that models this latent correspondence from the perspective of hyperparameter optimization, aiming to give the encoder generality across scenes so that its parameters can be frozen during the adaptation and testing stages. They also define a reinforced bilevel learning framework that provides a meta-initialization for scene-specific decoders to further improve visual quality. To improve practicality, they build a Retinex-based architecture with adaptive denoising and train it with both supervised and unsupervised losses. Extensive experimental evaluations on multiple datasets verify the method's effectiveness and competitiveness.
### Main contributions of the paper
1. Addresses, for the first time, the fast adaptation problem in low-light image enhancement from the perspective of hyperparameter optimization, improving adaptability to unseen scenes.
2. Proposes a bilevel paradigm to model the latent scene-independent correspondence, namely that encoder parameters trained under different data distributions are close.
3. Designs a new bilevel learning framework that learns a scene-independent encoder whose parameters are frozen when unseen scenes are encountered, which reduces training cost and supports fast adaptation.
4. Because the initialization of scene-specific decoders is uncertain, establishes a reinforced bilevel learning framework that provides a meta-initialization to further improve adaptation efficiency.
5. Constructs a new Retinex-based encoder-decoder architecture with an adaptive denoising mechanism, and defines two training modes, supervised and unsupervised.
### Method overview
- **Bilevel learning framework**: In a bilevel optimization model, the encoder parameters are treated as hyperparameters and the decoder parameters as ordinary parameters, which enables fast adaptation.
- **Reinforced bilevel learning framework**: To avoid training the decoder from scratch, a meta-initialization process is introduced to improve the decoder's adaptability to various scenes.
- **Network architecture**: A Retinex-induced architecture with an adaptive denoising mechanism is designed to handle unseen and challenging low-light scenes.
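
The adaptation idea in the overview above can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the model `u * x + v` stands in for an encoder-decoder, `make_scene` generates hypothetical scenes, and the meta-initialization is approximated by simply averaging the decoders adapted to the training scenes, which is a stand-in and not the paper's procedure.

```python
# Toy model: prediction = u * x + v, with u the frozen "encoder" and v the "decoder".

def decoder_grad(u, v, data):
    """Gradient of the mean squared error with respect to v only."""
    return sum(2.0 * (u * x + v - y) for x, y in data) / len(data)


def adapt_decoder(u, v_init, data, lr=0.1, steps=5):
    """Adapt the decoder to one scene; the encoder u is never updated."""
    v = v_init
    for _ in range(steps):
        v -= lr * decoder_grad(u, v, data)
    return v


def make_scene(offset, slope=2.0):
    """Hypothetical scene: the same slope everywhere, a scene-specific offset."""
    return [(float(x), slope * x + offset) for x in range(5)]


u_frozen = 2.0  # assumed to have been learned beforehand and then frozen
train_scenes = [make_scene(b) for b in (1.0, 1.2, 0.8)]

# Stand-in meta-initialization: average of the decoders adapted to each training scene.
adapted = [adapt_decoder(u_frozen, 0.0, s, steps=200) for s in train_scenes]
v_meta = sum(adapted) / len(adapted)

# Adapt to an unseen scene with only a few decoder steps.
new_scene = make_scene(1.1)
v_from_scratch = adapt_decoder(u_frozen, 0.0, new_scene)
v_from_meta = adapt_decoder(u_frozen, v_meta, new_scene)
```

With the same small step budget, `v_from_meta` ends up closer to the new scene's offset than `v_from_scratch`, which is the intuition behind giving the decoder a meta-initialization instead of starting it from scratch.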
### Experimental results
Experiments on multiple datasets indicate that the method achieves the best visual quality across different challenging scenes, demonstrating its effectiveness and competitiveness.
### Formula explanation
- **Bilevel optimization model**:
\[
\min_{u\in U}F(u, v; D_{\text{val}}),\quad\text{s.t.}\quad v\in S(u),\quad S(u):=\arg\min_v f(u, v; D_{\text{tr}})
\]
where \(u\) and \(v\) denote the hyperparameters and the parameters respectively, \(S(u)\) is the solution set of the lower-level problem, \(F(\cdot)\) and \(f(\cdot)\) are the upper-level and lower-level objective functions respectively, and \(D = D_{\text{tr}}\cup D_{\text{val}}\) indicates that the given dataset is divided into a training set and a validation set.
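
As a rough illustration of this formulation, the sketch below solves a toy bilevel problem in which \(u\) is a slope and \(v\) an offset of a linear model. The data, the function names, and the alternating scheme are assumptions for illustration only: the lower level is solved in closed form, and the upper level uses a first-order gradient step that ignores how \(v\) depends on \(u\).

```python
def lower_solution(u, d_tr):
    """S(u): closed-form argmin over v of mean((u*x + v - y)^2) on the training split."""
    return sum(y - u * x for x, y in d_tr) / len(d_tr)


def upper_loss(u, v, d_val):
    """F(u, v; D_val): mean squared error on the validation split."""
    return sum((u * x + v - y) ** 2 for x, y in d_val) / len(d_val)


def upper_grad_u(u, v, d_val):
    """Partial derivative of F with respect to u, with v held fixed (first-order)."""
    return sum(2.0 * (u * x + v - y) * x for x, y in d_val) / len(d_val)


def solve_bilevel(d_tr, d_val, u0=0.0, lr=0.02, steps=200):
    """Alternate between solving the lower level and stepping the upper level."""
    u = u0
    for _ in range(steps):
        v = lower_solution(u, d_tr)            # v in S(u)
        u -= lr * upper_grad_u(u, v, d_val)    # descend F in u
    return u, lower_solution(u, d_tr)


# Hypothetical data generated from y = 2x + 1, split into training and validation sets.
d_tr = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
d_val = [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)]
u_star, v_star = solve_bilevel(d_tr, d_val)
```

On this noise-free data the iteration recovers the generating slope and offset. Practical bilevel solvers typically also account for the dependence of \(v\) on \(u\), which this first-order sketch omits.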
- **Supervised loss**:
\[
L_{\text{su}}=\|z - z_{\text{gt}}\|^2
\]
where \(z\) is the reflectance map estimated by passing the input image through the encoder-decoder, and \(z_{\text{gt}}\) is the ground-truth image.
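
For concreteness, the supervised loss can be evaluated as below. This is a minimal sketch that assumes images are given as flattened lists of pixel values; the function name `supervised_loss` is hypothetical.

```python
def supervised_loss(z, z_gt):
    """Squared L2 distance between two equally sized, flattened images."""
    if len(z) != len(z_gt):
        raise ValueError("z and z_gt must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(z, z_gt))


# Example: only the middle pixel differs from the ground truth.
loss = supervised_loss([1.0, 2.0, 3.0], [1.0, 0.0, 3.0])
```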
- **Unsupervised loss**:
\[
L_{\text{uns}}=\lambda\|x - y\|^2+\sum_{i = 1}^N\sum_{j\in N(i)}w_{i,j}|x_i - x_j|