SSIR: Spatial Shuffle Multi-Head Self-Attention for Single Image Super-Resolution

Liangliang Zhao,Junyu Gao,Donghu Deng,Xuelong Li
DOI: https://doi.org/10.1016/j.patcog.2023.110195
IF: 8
2024-01-01
Pattern Recognition
Abstract:Benefiting from the development of deep convolutional neural networks, CNN-based single-image super-resolution methods have achieved remarkable reconstruction results. However, the limited perceptual field of the convolutional kernel and the use of static weights in the inference process limit the performance of CNN-based methods. Recently, a few vision transformer-based image super-resolution methods have achieved excellent performance compared to CNN-based methods. These methods contain many parameters and require vast amounts of GPU memory for training. In this paper, we propose a spatial shuffle multi-head self-attention for single-image super-resolution that can significantly model long-range pixel dependencies without additional computational consumption. A local perception module is also proposed to combine convolutional neural networks’ local connectivity and translational invariance. Reconstruction results on five popular benchmarks show that the proposed method outperforms existing methods in both reconstruction accuracy and visual performance. The proposed method matches the performance of transformed-based methods but requires an inferior number of transformer blocks, which reduces the number of parameters by 40%, GPU memory by 30%, and inference time by 30% compared to transformer-based methods.
What problem does this paper attempt to address?