Cognitively-Inspired Multi-Scale Spectral-Spatial Transformer for Hyperspectral Image Super-Resolution

Qin Xu, Shiji Liu, Jinpei Liu, Bin Luo
DOI: https://doi.org/10.1007/s12559-023-10210-y
IF: 4.89
2023-01-01
Cognitive Computation
Abstract: Hyperspectral image (HSI) super-resolution (SR) without auxiliary high-resolution images is a challenging task in computer vision. Most existing methods rely on deep convolutional neural networks with kernels of fixed geometry, which cannot model long-range dependencies and do not conform to human visual cognition. To address this issue, we propose a cognitively-inspired multi-scale spectral-spatial transformer for HSI SR. To reduce the high storage and computation burden, an overlapped band grouping strategy is adopted, exploiting the high similarity between neighboring spectral bands of an HSI. Considering the different textures and details that appear in HSIs, and inspired by the human cognitive mechanism, we develop multi-scale spatial and spectral transformer blocks that efficiently and effectively learn spatial and spectral feature representations at different scales, as well as the long-range dependencies among features. Finally, to fuse the feature information of neighboring groups, a 2D convolution mixed with a 3D separable convolution is designed, which fully explores the complementarity and continuity of spatial and spectral information. Extensive experiments on three benchmark datasets demonstrate that the proposed method yields state-of-the-art results at different scales. The effectiveness of the proposed method is further verified through visualization of the data in the spatial and spectral dimensions and through ablation experiments. The code and models are publicly available at https://github.com/liushiji666/MMSSTN. The experimental results confirm the effectiveness of the proposed method, which largely overcomes the weakness of convolution in modeling long-range dependencies. The method models long-range dependencies in both spatial and spectral features and efficiently mines complementary information between bands, thereby enhancing the model's perceptual ability.
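The overlapped band grouping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `group_size` and `overlap` parameters, and the rule for covering the trailing bands are all assumptions made for illustration, since the abstract gives no specific values or boundary handling.

```python
def overlapped_band_groups(num_bands, group_size, overlap):
    """Return (start, end) band-index ranges; neighboring groups share `overlap` bands.

    Hypothetical helper: group_size and overlap are illustrative parameters,
    not values taken from the paper.
    """
    if not 0 <= overlap < group_size:
        raise ValueError("overlap must satisfy 0 <= overlap < group_size")
    if group_size >= num_bands:
        return [(0, num_bands)]
    stride = group_size - overlap
    starts = list(range(0, num_bands - group_size + 1, stride))
    # Assumption: if the regular stride leaves trailing bands uncovered,
    # add one last group aligned to the end of the spectrum.
    if starts[-1] + group_size < num_bands:
        starts.append(num_bands - group_size)
    return [(s, s + group_size) for s in starts]


def split_into_groups(bands, group_size, overlap):
    """Slice a sequence of bands (spectral axis first) into overlapping groups."""
    ranges = overlapped_band_groups(len(bands), group_size, overlap)
    return [bands[s:e] for s, e in ranges]


if __name__ == "__main__":
    # Example with 31 bands, groups of 8, and 2 shared bands between neighbors.
    for start, end in overlapped_band_groups(31, 8, 2):
        print(start, end)
```

Each group can then be processed separately, which keeps the per-step memory cost bounded by the group size instead of the full number of bands, while the shared bands preserve spectral continuity between neighboring groups. The same slicing applies to an array whose first axis is the spectral axis.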