GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction

Zaid Ilyas, Naveed Akhtar, David Suter, Syed Zulqarnain Gilani
2024-10-01
Abstract: Image restoration and spectral reconstruction are longstanding computer vision tasks. Currently, CNN-transformer hybrid models provide state-of-the-art performance for these tasks. The key common ingredient in the architectural designs of these models is Channel-wise Self-Attention (CSA). We first show that CSA is an overall low-rank operation. Then, we propose an instance-Guided Low-rank Multi-Head self-attention (GLMHA) to replace the CSA for a considerable computational gain while closely retaining the original model performance. Unique to the proposed GLMHA is its ability to provide computational gain for both short and long input sequences. In particular, the gain is in terms of both Floating Point Operations (FLOPs) and parameter count reduction. This is in contrast to the existing popular computational complexity reduction techniques, e.g., Linformer, Performer, and Reformer, for which FLOPs overpower the efficient design tricks for shorter input sequences. Moreover, parameter reduction remains unaccounted for in the existing methods. We perform an extensive evaluation for the tasks of spectral reconstruction from RGB images, spectral reconstruction from snapshot compressive imaging, motion deblurring, and image deraining by enhancing the best-performing models with our GLMHA. Our results show up to a 7.7 Giga FLOPs reduction with 370K fewer parameters required to closely retain the original performance of the best-performing models that employ CSA.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the computational cost of image restoration and spectral reconstruction. State-of-the-art CNN-Transformer hybrid models perform well on these tasks, but their core component, Channel-wise Self-Attention (CSA), still requires substantial computational resources. The authors show that CSA is essentially a low-rank operation and, building on this, propose Guided Low-rank Multi-Head Self-Attention (GLMHA), which reduces both the amount of computation and the number of parameters while closely retaining the performance of the original model.

### Main problems and solutions

1. **High computational complexity**:
   - **Current situation**: In image restoration and spectral reconstruction, the existing CSA mechanism is effective but has a relatively high computational complexity, especially when processing long sequences.
   - **Solution**: GLMHA generates low-rank Key and Value embeddings from the input feature map, which reduces the computational complexity. The gain holds for both short and long input sequences, and it covers both floating-point operations (FLOPs) and parameter count.
2. **Limitations of existing methods**:
   - **Current situation**: Existing complexity reduction methods (such as Linformer, Performer, and Reformer) do not pay off on short sequences, where their FLOPs overhead outweighs the efficiency tricks, and they do not account for parameter reduction.
   - **Solution**: GLMHA generates calibration vectors in an instance-guided manner to improve the low-rank embeddings, which lets it achieve good results on short sequences while also reducing the number of parameters.
3. **Maintaining model performance**:
   - **Current situation**: Reducing computational complexity without degrading model performance is a challenge.
   - **Solution**: GLMHA introduces a lightweight calibration network so that the performance of the original model is retained as closely as possible while computation and parameter count are reduced.

### Formula explanation

- **Formula for CSA**:

  \[ Q = W_Q X, \quad K = W_K X, \quad V = W_V X \]

  \[ Z = \text{Softmax}\left(\frac{Q \cdot K^\top}{\beta}\right) \cdot V + X \]

  where \(X\) and \(Z\) are the input and output features of the self-attention layer respectively, \(W_Q, W_K, W_V\) are the weight matrices used to calculate the Query, Key, and Value projections, and \(\beta\) is a learnable scaling parameter.

- **Formula for GLMHA**:

  \[ Q = W_Q X, \quad A = \varphi_{\text{calibrate}}(Q) \]

  \[ X' = X + (X \odot \alpha A) \]

  \[ K = W_K X', \quad V = W_V X' \]

  \[ Z = \text{Softmax}\left(\frac{Q \cdot K^\top}{\beta}\right) \cdot V + X \]

  where \(A\) is the weighting vector generated by the calibration network \(\varphi_{\text{calibrate}}\), \(X'\) is the calibrated input from which the Key and Value are computed, and \(\alpha\) is a hyperparameter that controls the influence of the weighting vector.

With this design, GLMHA can significantly reduce computational complexity and the number of parameters while maintaining model performance, and it is suitable for tasks such as image restoration and spectral reconstruction.
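
The two sets of formulas above can be compared in a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: it uses a single head, treats \(X\) as a `(C, N)` matrix of `C` channels by `N` spatial positions, and assumes the low-rank Key and Value come from projections of shape `(r, C)` with `r < C`. The `calibrate` function (average pooling followed by a linear map and a sigmoid) is a hypothetical stand-in for \(\varphi_{\text{calibrate}}\), and all function and variable names are invented for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def csa(X, W_q, W_k, W_v, beta=1.0):
    # Channel-wise self-attention: the attention map is (C, C).
    Q, K, V = W_q @ X, W_k @ X, W_v @ X
    attn = softmax(Q @ K.T / beta, axis=-1)
    return attn @ V + X

def calibrate(Q, W_c):
    # Hypothetical calibration network (an assumption, not from the paper):
    # pool Q over spatial positions, then a linear map and a sigmoid.
    pooled = Q.mean(axis=1, keepdims=True)          # (C, 1)
    return 1.0 / (1.0 + np.exp(-(W_c @ pooled)))    # (C, 1), values in (0, 1)

def glmha(X, W_q, W_k_low, W_v_low, W_c, alpha=0.1, beta=1.0):
    Q = W_q @ X                                     # (C, N)
    A = calibrate(Q, W_c)                           # (C, 1) weighting vector
    X_cal = X + X * (alpha * A)                     # X' = X + (X ⊙ αA)
    K = W_k_low @ X_cal                             # (r, N), assumed low-rank Key
    V = W_v_low @ X_cal                             # (r, N), assumed low-rank Value
    attn = softmax(Q @ K.T / beta, axis=-1)         # (C, r) instead of (C, C)
    return attn @ V + X                             # (C, N)

rng = np.random.default_rng(0)
C, N, r = 8, 16, 2
X = rng.standard_normal((C, N))
W_q, W_k, W_v, W_c = (rng.standard_normal((C, C)) * 0.1 for _ in range(4))
W_k_low, W_v_low = (rng.standard_normal((r, C)) * 0.1 for _ in range(2))

Z_csa = csa(X, W_q, W_k, W_v)
Z_glmha = glmha(X, W_q, W_k_low, W_v_low, W_c)
print(Z_csa.shape, Z_glmha.shape)
print("Key/Value projection parameters:", W_k.size + W_v.size,
      "vs", W_k_low.size + W_v_low.size)
```

Under these assumptions, the Key and Value projections shrink from `2*C*C` to `2*r*C` parameters and the attention map shrinks from `(C, C)` to `(C, r)`, which is the kind of saving the summary describes. The calibration weights `W_c` add parameters of their own in this sketch; a lighter calibration network would be needed for a net reduction in practice.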