Abstract: Outdoor images often suffer from severe degradation due to rain, haze, and noise, which impairs image quality and hinders high-level vision tasks. Current image restoration methods struggle to handle such complex degradations while remaining efficient. This paper introduces a novel image restoration architecture that combines multi-dimensional dynamic attention and self-attention within a U-Net framework. To leverage both the global modeling capability of transformers and the local modeling capability of convolutions, we use purely CNN-based blocks in the encoder-decoder and purely transformer-based blocks in the latent layer. Additionally, we design convolutional kernels with selected multi-dimensional dynamic attention to efficiently handle diverse degraded inputs. A transformer block with transposed self-attention further enhances global feature extraction while maintaining efficiency. Extensive experiments demonstrate that our method achieves a better balance between performance and computational complexity across five image restoration tasks (deraining, deblurring, denoising, dehazing, and enhancement), as well as superior performance on high-level vision tasks. The source code will be available at https://github.com/House-yuyu/MDDA-former.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to handle complex degradations in image restoration while maintaining high efficiency. Specifically, the paper investigates how to achieve a good balance between performance and computational complexity in general-purpose image restoration by combining a multi-dimensional dynamic attention mechanism with a Transformer. These problems include:

1. **Complex Degradation Processing**: Outdoor images are often severely degraded by rain, haze, and noise, which lowers image quality and hurts subsequent high-level tasks. Current image restoration methods struggle to handle these complex degradations, especially while remaining efficient.
2. **Balance between Performance and Computational Complexity**: Existing image restoration methods either deliver limited restoration quality or are too computationally expensive for wide practical use. A new architecture is therefore needed that improves performance while reducing computational complexity.
3. **Multi-task Processing Ability**: The paper also addresses how a single architecture can effectively handle multiple degradation types, such as deraining, deblurring, denoising, dehazing, and enhancement.

### Main Contributions of the Paper

1. **Proposed a New Image Restoration Architecture, MDDA-former**: This architecture exploits the structural differences between the scales of the U-Net by using CNN-based modules in the encoder-decoder and Transformer blocks in the latent layer, achieving a good balance between performance and efficiency.
2. **Designed the Multi-Dimensional Dynamic Attention Block (MDAB)**: This block learns complementary dynamic attention along three dimensions of the convolution kernel (spatial, channel, and filter) at an acceptable computational cost, thereby extracting rich local context information.
3. **Proposed an Effective Transformer Block (ETB)**: This block captures global context information through a transposed self-attention mechanism and depthwise convolutions with linear complexity, while keeping the parameter count and FLOPs low.
4. **Experimental Verification**: Extensive experiments on 18 benchmark datasets show that the proposed method achieves a better trade-off between performance and complexity on five image restoration tasks (deraining, deblurring, denoising, dehazing, and enhancement), and also performs well on high-level vision tasks.

### Formula Presentation

- **Multi-Dimensional Dynamic Convolution (MDConv)**:

\[ Y = W_d \ast X \]
\[ W_d = W \odot \alpha_s \odot \alpha_c \odot \alpha_f \]
\[ \alpha_s, \alpha_c, \alpha_f = \pi(X) \]

where \(X\in\mathbb{R}^{h\times w\times C_{\text{in}}}\) is the input, \(Y\in\mathbb{R}^{h\times w\times C_{\text{out}}}\) is the output, \(W\) is the regular (static) convolution kernel, \(\alpha_s\in\mathbb{R}^{k\times k}\), \(\alpha_c\in\mathbb{R}^{C_{\text{in}}}\), and \(\alpha_f\in\mathbb{R}^{C_{\text{out}}}\) are the attention weights along the spatial, channel, and filter dimensions, respectively (each broadcast over the remaining kernel dimensions), \(\pi(\cdot)\) is the attention function that predicts them from \(X\), and \(\odot\) and \(\ast\) denote element-wise multiplication and convolution, respectively.

- **Effective Transformer Block (ETB)**:

\[ Q, K, V = f_{dw}^{3\times3}\big(f_1^{1\times1}(\text{LN}(X_e))\big) \]
\[ \hat{Q}, \hat{K}, \hat{V} = R(Q, K, V) \]
\[ \text{FTSA} = \text{SoftMax}(\hat{K}\otimes\hat{Q}/\alpha) \]
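The MDConv equations above can be illustrated with a minimal pure-Python sketch. It assumes channels-last nested lists, stride 1, "same" zero padding and a single input image. It also assumes a hypothetical attention function `pi_attention` that applies global average pooling, then one linear map per dimension, then a sigmoid. The paper's actual \(\pi\), its layer sizes, and how the MDAB wraps MDConv are not specified here. The matrices `A_s`, `A_c` and `A_f` are stand-ins for learnable weights, and all function names are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(a, g):
    # a: [rows][len(g)] -> [rows]
    return [sum(aij * gj for aij, gj in zip(row, g)) for row in a]


def conv2d_same(x, w):
    """Stride-1, zero-padded 2D convolution (cross-correlation, as in DL libraries).
    x: [H][W][C_in], w: [k][k][C_in][C_out] -> [H][W][C_out]."""
    height, width, cin = len(x), len(x[0]), len(x[0][0])
    k, cout = len(w), len(w[0][0][0])
    pad = k // 2
    y = [[[0.0] * cout for _ in range(width)] for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for u in range(k):
                for v in range(k):
                    ii, jj = i + u - pad, j + v - pad
                    if 0 <= ii < height and 0 <= jj < width:
                        for c in range(cin):
                            for f in range(cout):
                                y[i][j][f] += x[ii][jj][c] * w[u][v][c][f]
    return y


def pi_attention(x, params, k):
    """HYPOTHETICAL pi(X): global average pooling -> one linear map per dimension -> sigmoid."""
    height, width, cin = len(x), len(x[0]), len(x[0][0])
    g = [sum(x[i][j][c] for i in range(height) for j in range(width)) / (height * width)
         for c in range(cin)]
    s = [sigmoid(z) for z in matvec(params["A_s"], g)]        # k*k spatial weights
    alpha_s = [s[u * k:(u + 1) * k] for u in range(k)]        # reshape to k x k
    alpha_c = [sigmoid(z) for z in matvec(params["A_c"], g)]  # C_in channel weights
    alpha_f = [sigmoid(z) for z in matvec(params["A_f"], g)]  # C_out filter weights
    return alpha_s, alpha_c, alpha_f


def mdconv(x, w, params):
    """Y = W_d conv X, where W_d = W * alpha_s * alpha_c * alpha_f (elementwise,
    each attention broadcast over the other kernel axes)."""
    k, cin, cout = len(w), len(w[0][0]), len(w[0][0][0])
    alpha_s, alpha_c, alpha_f = pi_attention(x, params, k)
    w_d = [[[[w[u][v][c][f] * alpha_s[u][v] * alpha_c[c] * alpha_f[f]
              for f in range(cout)] for c in range(cin)] for v in range(k)] for u in range(k)]
    return conv2d_same(x, w_d)


# Usage: random input, a static kernel and stand-in attention weights
random.seed(0)
height, width, cin, cout, k = 5, 5, 3, 4, 3


def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


x = [[[random.uniform(-1.0, 1.0) for _ in range(cin)] for _ in range(width)] for _ in range(height)]
w = [[rand_matrix(cin, cout) for _ in range(k)] for _ in range(k)]
params = {"A_s": rand_matrix(k * k, cin), "A_c": rand_matrix(cin, cin), "A_f": rand_matrix(cout, cin)}
y = mdconv(x, w, params)
print(len(y), len(y[0]), len(y[0][0]))  # 5 5 4
```

Because the attention only rescales the static kernel, MDConv costs one ordinary convolution plus the small \(\pi\) branch. There is a convenient sanity check: if all attention-map weights are zero, every sigmoid returns 0.5. Then \(W_d = 0.125\,W\), and the output is exactly one eighth of the static convolution.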

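The transposed self-attention inside the ETB can be sketched as follows. The sketch assumes it follows the channel-wise form popularized by Restormer, in which the attention map is \(C\times C\) rather than \(HW\times HW\) because the pixel axis is contracted. The sketch leaves out several parts of the block: LN, the \(1\times1\) and \(3\times3\) depthwise projections, multi-head splitting, and any normalization of \(\hat{Q}\) and \(\hat{K}\). Here `alpha` is a plain constant, which is an assumption, and the function name is hypothetical.

```python
import math
import random


def softmax(row):
    m = max(row)                        # subtract the max for numerical stability
    e = [math.exp(z - m) for z in row]
    s = sum(e)
    return [t / s for t in e]


def transposed_attention(q, k, v, alpha=1.0):
    """Channel-wise ("transposed") self-attention on flattened features.

    q, k, v: [N][C] lists with N = H * W pixels (already projected).
    Returns (out [N][C], attn [C][C]).
    """
    n, c = len(q), len(q[0])
    # C x C logits by contracting over the N pixels: O(N * C^2), linear in N
    logits = [[sum(q[p][i] * k[p][j] for p in range(n)) / alpha for j in range(c)]
              for i in range(c)]
    attn = [softmax(row) for row in logits]  # each row sums to 1
    # Output channel i at each pixel is a convex mix of that pixel's value channels
    out = [[sum(attn[i][j] * v[p][j] for j in range(c)) for i in range(c)]
           for p in range(n)]
    return out, attn


random.seed(0)
H, W, C = 8, 8, 4
N = H * W


def rand_feat():
    return [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(N)]


q, k, v = rand_feat(), rand_feat(), rand_feat()
out, attn = transposed_attention(q, k, v, alpha=math.sqrt(N))
print(len(attn), len(attn[0]))  # 4 4
print(len(out), len(out[0]))    # 64 4
```

The size of the attention map does not grow with resolution. For an illustrative 256 × 256 feature map with C = 48 (numbers not taken from the paper), a spatial attention map would have 65,536 × 65,536 ≈ 4.3 × 10^9 entries. The transposed map has only 48 × 48 = 2,304 entries. This is why such a block scales linearly with the number of pixels.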
Joint multi-dimensional dynamic attention and transformer for general image restoration
