Abstract: Image restoration and spectral reconstruction are longstanding computer vision tasks. Currently, CNN-transformer hybrid models provide state-of-the-art performance for these tasks. The key common ingredient in the architectural designs of these models is Channel-wise Self-Attention (CSA). We first show that CSA is an overall low-rank operation. Then, we propose an instance-Guided Low-rank Multi-Head self-attention (GLMHA) to replace the CSA for a considerable computational gain while closely retaining the original model performance. Unique to the proposed GLMHA is its ability to provide computational gain for both short and long input sequences. In particular, the gain is in terms of both Floating Point Operations (FLOPs) and parameter count reduction. This is in contrast to the existing popular computational complexity reduction techniques, e.g., Linformer, Performer, and Reformer, for which FLOPs overpower the efficient design tricks for the shorter input sequences. Moreover, parameter reduction remains unaccounted for in the existing methods. We perform an extensive evaluation for the tasks of spectral reconstruction from RGB images, spectral reconstruction from snapshot compressive imaging, motion deblurring, and image deraining by enhancing the best-performing models with our GLMHA. Our results show up to a 7.7 Giga FLOPs reduction with 370K fewer parameters required to closely retain the original performance of the best-performing models that employ CSA.

What problem does this paper attempt to address?

The paper targets the computational cost of image restoration and spectral reconstruction models. The current state-of-the-art CNN-Transformer hybrid models perform very well on these tasks, but their core component, Channel-wise Self-Attention (CSA), still requires substantial computation. The authors show that CSA is essentially a low-rank operation and, on that basis, propose instance-Guided Low-rank Multi-Head self-Attention (GLMHA), which reduces both the amount of computation and the number of parameters while closely retaining the performance of the original model.

### Main problems and solutions

1. **High computational complexity**:
   - **Current situation**: In image restoration and spectral reconstruction, the existing CSA mechanism is effective but computationally expensive, especially for long input sequences.
   - **Solution**: GLMHA generates low-rank Key and Value embeddings from the input feature map, which lowers the computational complexity. The saving holds for both short and long input sequences, and it covers both floating-point operations (FLOPs) and parameter count.
2. **Limitations of existing methods**:
   - **Current situation**: Existing complexity reduction methods (such as Linformer, Performer, and Reformer) give little or no FLOPs saving on short input sequences, and they do not account for parameter reduction.
   - **Solution**: GLMHA generates calibration vectors in an instance-guided manner to refine the low-rank embeddings, which lets it remain efficient on short sequences and also reduce the number of parameters.
3. **Maintaining model performance**:
   - **Current situation**: Reducing computational complexity without degrading model performance is a challenge.
   - **Solution**: GLMHA introduces a lightweight calibration network so that the performance of the original model is retained as closely as possible while computation and parameter count go down.

### Formula explanation

- **Formula for CSA**:
  \[ Q = W_Q X, \quad K = W_K X, \quad V = W_V X \]
  \[ Z = \text{Softmax}\left(\frac{Q \cdot K^\top}{\beta}\right) \cdot V + X \]
  where \(X\) and \(Z\) are the input and output features of the self-attention layer respectively, \(W_Q, W_K, W_V\) are the weight matrices used to compute the Query, Key, and Value projections, and \(\beta\) is a learnable scaling parameter.
- **Formula for GLMHA**:
  \[ Q = W_Q X, \quad A = \varphi_{\text{calibrate}}(Q) \]
  \[ X' = X + (X \odot \alpha A) \]
  \[ K = W_K X', \quad V = W_V X' \]
  \[ Z = \text{Softmax}\left(\frac{Q \cdot K^\top}{\beta}\right) \cdot V + X \]
  where \(A\) is the weighting vector produced by the calibration network \(\varphi_{\text{calibrate}}\), \(\odot\) denotes element-wise multiplication, and \(\alpha\) is a hyperparameter that controls the influence of the weighting vector.

With these changes, GLMHA reduces computational complexity and parameter count while closely retaining model performance, and it applies to tasks such as image restoration and spectral reconstruction.
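The CSA and GLMHA formulas given in this summary can be sketched in NumPy as follows. This is a minimal single-head illustration, not the authors' implementation: the feature layout (`X` with shape channels × pixels), the internals of `calibrate` (average pooling over pixels, one linear map `W_c`, and a sigmoid), and the default values of `alpha` and `beta` are all assumptions made here for illustration. The sketch follows the formulas only as stated above, so it does not reproduce the paper's low-rank reduction or multi-head splitting.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def csa(X, W_Q, W_K, W_V, beta=1.0):
    """Channel-wise self-attention: Z = Softmax(Q K^T / beta) V + X.

    X has shape (C, N): C channels, N flattened pixels (assumed layout).
    The attention map is (C, C), i.e. attention across channels.
    """
    Q, K, V = W_Q @ X, W_K @ X, W_V @ X
    attn = softmax((Q @ K.T) / beta, axis=-1)  # (C, C)
    return attn @ V + X                        # residual connection


def calibrate(Q, W_c):
    """Hypothetical calibration network (an assumption, not from the paper):
    pool Q over pixels, apply one linear map, squash with a sigmoid."""
    pooled = Q.mean(axis=1, keepdims=True)          # (C, 1)
    return 1.0 / (1.0 + np.exp(-(W_c @ pooled)))    # (C, 1), values in (0, 1)


def glmha_sketch(X, W_Q, W_K, W_V, W_c, alpha=0.1, beta=1.0):
    """Follows the summary's GLMHA formulas:
    A = calibrate(Q); X' = X + X * (alpha * A); K, V are computed from X'."""
    Q = W_Q @ X
    A = calibrate(Q, W_c)
    X_cal = X + X * (alpha * A)                # per-channel re-weighting
    K, V = W_K @ X_cal, W_V @ X_cal
    attn = softmax((Q @ K.T) / beta, axis=-1)  # (C, C)
    return attn @ V + X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, N = 8, 64
    X = rng.standard_normal((C, N))
    W_Q, W_K, W_V, W_c = (0.1 * rng.standard_normal((C, C)) for _ in range(4))
    print(csa(X, W_Q, W_K, W_V).shape)
    print(glmha_sketch(X, W_Q, W_K, W_V, W_c).shape)
```

In this sketch, setting `alpha=0` makes `X'` equal to `X`, so `glmha_sketch` returns exactly the same output as `csa`; the calibration vector only changes how the Key and Value embeddings are formed.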

GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction

Improved Restoration Algorithm for Weakly Blurred and Strongly Noisy Image

Lightweight Multi-Attention Fusion Network for Image Super-Resolution

An Efficient Transformer For Demosaicing Via Compressed Multi-Branch Attention Mechanism

Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Mask-Guided Spatial–Spectral MLP Network for High-Resolution Hyperspectral Image Reconstruction

Fusiform multi-scale pixel self-attention network for hyperspectral images reconstruction from a single RGB image

An Effective Hyperspectral Image Classification Network Based on Multi-Head Self-Attention and Spectral-Coordinate Attention

GAMA-IR: Global Additive Multidimensional Averaging for Fast Image Restoration

CascadedGaze: Efficiency in Global Context Extraction for Image Restoration

Single Stage Adaptive Multi-Attention Network for Image Restoration

Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images

LLMRA: Multi-modal Large Language Model based Restoration Assistant

Multi-Scale Representation Learning for Image Restoration with State-Space Model

Image Restoration Via Deep Memory-Based Latent Attention Network

Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration

Hybrid Spectral Denoising Transformer with Guided Attention

CSA: A Channel-Separated Attention Module for Enhancing MRI Reconstruction

Efficient Concertormer for Image Deblurring and Beyond

Global Learnable Attention for Single Image Super-Resolution