A multimodal hyper-fusion transformer for remote sensing image classification

Mengru Ma, Wenping Ma, Licheng Jiao, Xu Liu, Lingling Li, Zhixi Feng, Fang Liu, Shuyuan Yang
DOI: https://doi.org/10.1016/j.inffus.2023.03.005
IF: 18.6
2023-03-10
Information Fusion
Abstract: Multispectral (MS) and panchromatic (PAN) images carry complementary and synergistic spatial and spectral information, and how to make the best use of their respective advantages has become a hot research topic. This paper proposes a selectable Transformer and Gist CNN network (STGC-Net). It designs a subspace similar recombination module (SSR-Module), based on non-negative matrix factorization (NMF) and the self-attention mechanism, for feature decomposition. This module reduces redundant information in the multi-modal data and extracts both the singular features of each modality and the features common to both. Because the MS and PAN images have different strengths, a selectable self-attention spectral feature extraction module (S3FE-Module) and a multi-stream Gist spatial feature extraction module (MGSFE-Module) are proposed for their respective singular features. The former refines the Transformer's input and simultaneously characterizes the sequence information between channels of the MS image. The latter introduces the positional relationship between local features while extracting spatial features from the PAN image, thereby improving scene classification accuracy. Experimental results indicate that the proposed method outperforms the methods it is compared against. The code for this paper is provided at: https://github.com/ru-willow/ST-GC-Net
computer science, artificial intelligence, theory & methods
What problem does this paper attempt to address?