Abstract: Image restoration represents a fundamental challenge in low-level vision, focusing on reconstructing high-quality images from their degraded counterparts. With the rapid advancement of deep learning technologies, transformer-based methods with pyramid structures have advanced the field by capturing long-range cross-scale spatial interactions. Despite their popularity, the degradation of essential features during the upsampling process notably compromises restoration performance, resulting in suboptimal reconstruction outcomes. We introduce EchoIR, a UNet-like image restoration network with a bilateral learnable upsampling mechanism, to bridge this gap. Specifically, we propose the Echo-Upsampler, which optimizes the upsampling process by learning from the bilateral intermediate features of U-Net, the "Echo", aiming for a more refined restoration by minimizing the degradation during upsampling. To model image restoration and upsampling as hierarchically related tasks, we propose the Approximated Sequential Bi-level Optimization (AS-BLO), an advanced bi-level optimization model establishing a relationship between upsampling learning and image restoration tasks. Extensive experiments against state-of-the-art (SOTA) methods demonstrate that the proposed EchoIR surpasses existing methods, achieving SOTA performance in image restoration tasks.
What problem does this paper attempt to address?
This paper attempts to solve two main problems in the field of image restoration:
1. **Information loss during upsampling**: Existing methods often lose important image features when upsampling, which degrades the restoration result. The loss is especially pronounced for low-quality inputs and directly lowers the quality of the final reconstruction.
2. **Joint optimization of image restoration and upsampling**: Traditional image restoration models usually treat restoration and upsampling as independent tasks and do not model the relationship between them, which makes it difficult to reach the optimal restoration result.
To solve these problems, the authors propose the EchoIR model, which includes the following innovations:
- **Introducing the Echo-Upsampler**: By using the feature maps (called "echo") generated in the U-Net encoder stage, the Echo-Upsampler learns the upsampling process more effectively and reduces information loss during upsampling, which improves the quality of image restoration.
- **Proposing Approximated Sequential Bi-level Optimization (AS-BLO)**: AS-BLO is a new optimization strategy that transforms a complex bi-level optimization problem into a series of single-level optimization problems, so that the model can be solved efficiently by gradient descent. According to the paper, this both simplifies training and improves the overall performance of the model.
Through these innovations, the EchoIR model reports significant performance improvements on multiple image restoration tasks (such as rain removal, deblurring, and denoising), reaching state-of-the-art results.
### Key Formulas
- **Multi-Head Self-Attention Mechanism**:
\[
\text{Alpha}(Q, K)=\frac{Q\cdot K}{\sqrt{d}}
\]
\[
\text{Attention}(Q, K, V)=V\cdot\text{Softmax}(\text{Alpha}(Q, K))
\]
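To make the two attention formulas concrete, here is a minimal single-head sketch in plain Python. It assumes the conventional token layout, which the summary does not specify: `Q`, `K`, and `V` are lists of row vectors, the product `Q·K` is read as a dot product between each query row and each key row, and the softmax weights are applied to the rows of `V`. The function names are illustrative and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head attention on nested lists (assumed layout: one row per token)."""
    d = len(Q[0])
    # Alpha(Q, K) = Q.K / sqrt(d): one score per (query row, key row) pair.
    alpha = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d) for k_row in K]
        for q_row in Q
    ]
    # Softmax over the keys, then a weighted sum of the value rows.
    weights = [softmax(row) for row in alpha]
    n_cols = len(V[0])
    return [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(n_cols)]
        for w_row in weights
    ]
```

Because each row of softmax weights sums to 1, every output row is a convex combination of the rows of `V`. A multi-head version would run this on several projected copies of the input and concatenate the results.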
- **Channel Attention Mechanism**:
\[
p_w = \text{AdaptiveAvgPool}(F_{ts})
\]
\[
c_w=\text{Sigmoid}(\text{Mlp}(p_w))
\]
\[
F_{tc}=F_{ts}\otimes c_w
\]
\[
F_{ta}=F_{tc}\oplus F_t
\]
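The four channel-attention equations can be traced step by step in a small sketch. The following plain-Python version assumes that `F_ts` and `F_t` are feature maps stored as lists of channels (each channel an H x W nested list), that `AdaptiveAvgPool` reduces each channel to a single mean, and that `Mlp` is a bias-free two-layer network with a ReLU in between. The weight arguments `w1` and `w2` are hypothetical placeholders for learned parameters.

```python
import math


def channel_attention(f_ts, f_t, w1, w2):
    """Channel attention with a residual connection.

    f_ts, f_t: lists of C channels, each an H x W nested list.
    w1: hidden x C weights, w2: C x hidden weights (assumed two-layer MLP).
    """
    # p_w = AdaptiveAvgPool(F_ts): one mean value per channel.
    p_w = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in f_ts]
    # Mlp(p_w): linear -> ReLU -> linear (an assumed structure).
    hidden = [max(0.0, sum(w * p for w, p in zip(row, p_w))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    # c_w = Sigmoid(Mlp(p_w)): one gate in (0, 1) per channel.
    c_w = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # F_tc = F_ts (x) c_w, then F_ta = F_tc (+) F_t, fused into one pass.
    return [
        [[c * a + b for a, b in zip(row_s, row_t)] for row_s, row_t in zip(ch_s, ch_t)]
        for c, ch_s, ch_t in zip(c_w, f_ts, f_t)
    ]
```

With all-zero MLP weights every gate equals 0.5, so the output reduces to `0.5 * F_ts + F_t`, which is a convenient sanity check.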
- **Weight Calculation of Echo-Upsampler**:
\[
W(p, p')=f(p, p')\oplus g(F_{\text{ref}}[p], F_{\text{ref}}[p'])
\]
\[
F_{up}[p]=\frac{1}{\sum_{p'\in\Omega}W(p, p')}\sum_{p'\in\Omega}F_{down}[p']\cdot W(p, p')
\]
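The weighted, normalized sum above can be illustrated on a one-dimensional signal. This is a sketch under stated assumptions, not the paper's Echo-Upsampler: the learned functions `f` and `g` are replaced by fixed Gaussian kernels, `⊕` is read as addition, the neighborhood `Ω` is a window of low-resolution positions around `p`, and the reference feature at a low-resolution position is taken from the nearest high-resolution sample. All parameter names are hypothetical.

```python
import math


def echo_upsample_1d(f_down, f_ref, scale=2, radius=1, sigma_s=1.0, sigma_r=0.5):
    """Guided upsampling of a 1-D signal.

    f_down: low-resolution features, f_ref: high-resolution reference ("echo").
    Fixed Gaussians stand in for the learned kernels f and g (an assumption).
    """
    assert len(f_ref) == scale * len(f_down)
    out = []
    for p in range(len(f_ref)):
        center = p // scale  # low-resolution position closest to p
        lo = max(0, center - radius)
        hi = min(len(f_down), center + radius + 1)
        num, den = 0.0, 0.0
        for q in range(lo, hi):  # q plays the role of p' in Omega
            # f(p, p'): spatial closeness, measured in low-resolution units.
            spatial = math.exp(-((p / scale - q) ** 2) / (2 * sigma_s ** 2))
            # g(F_ref[p], F_ref[p']): similarity of the reference features.
            diff = f_ref[p] - f_ref[q * scale]
            similarity = math.exp(-(diff ** 2) / (2 * sigma_r ** 2))
            w = spatial + similarity  # W(p, p') = f (+) g
            num += w * f_down[q]
            den += w
        out.append(num / den)  # normalize by the sum of weights over Omega
    return out
```

Because the weights are positive and normalized, each output value is a convex combination of nearby low-resolution values, so a constant input stays constant and no output leaves the range of `f_down`.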
- **Transformation of Bi-level Optimization Problems**:
\[
\min_{\beta\in B}\min_{\omega\in\mathbb{R}^n}F(\beta, \omega; D_{\text{val}})
\]
\[
\text{s.t.}\quad\omega\in S(\beta):=\arg\min_{\omega}f(\beta, \omega; D_{\text{tr}})
\]
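To show how a bi-level problem of this form can be reduced to a sequence of single-level gradient steps, here is a toy sketch. It is a generic one-step unrolled alternating scheme, not the AS-BLO algorithm as specified in the paper. The objectives are assumptions chosen for illustration: the lower-level loss is `f(β, ω) = (ω - β)^2` and the upper-level loss is `F(ω) = (ω - 3)^2`, so the exact solution is `β = ω = 3`.

```python
def solve_toy_bilevel(steps=500, inner_lr=0.25, outer_lr=0.1):
    """Alternate one lower-level and one upper-level gradient step.

    Toy objectives (assumptions for illustration):
      lower level: f(beta, omega) = (omega - beta)^2
      upper level: F(omega)       = (omega - 3)^2
    """
    beta, omega = 0.0, 0.0
    for _ in range(steps):
        # Single-level step 1: one gradient step on f with respect to omega.
        omega_next = omega - inner_lr * 2.0 * (omega - beta)
        # Differentiating that step with respect to beta gives
        # d(omega_next)/d(beta) = 2 * inner_lr.
        d_omega_d_beta = 2.0 * inner_lr
        # Single-level step 2: chain rule through the unrolled step to get
        # an approximate gradient of F with respect to beta.
        hypergrad = 2.0 * (omega_next - 3.0) * d_omega_d_beta
        beta -= outer_lr * hypergrad
        omega = omega_next
    return beta, omega
```

Calling `solve_toy_bilevel()` drives both variables toward 3. In a real network the derivatives would come from automatic differentiation, with `β` and `ω` standing for the two groups of parameters being trained.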
Together, these components allow the EchoIR model to perform image restoration and upsampling while preserving high-quality features, which improves the restoration results.