Deep Symmetric Fusion Transformer for Multimodal Remote Sensing Data Classification

Honghao Chang, Haixia Bi, Fan Li, Chen Xu, Jocelyn Chanussot, Danfeng Hong
DOI: https://doi.org/10.1109/tgrs.2024.3476975
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In recent years, multimodal remote sensing data classification (MMRSC) has attracted growing attention because it delineates the Earth’s surface more comprehensively and accurately than its single-modal counterpart. However, capturing and integrating local and global features from single-modal data remains challenging. Moreover, how to fully mine and exploit the interactions between different modalities is still an open issue. To this end, we propose a novel dual-branch transformer-based framework named deep symmetric fusion transformer (DSymFuser). Within the framework, each branch contains a stack of local-global mixture (LGM) blocks that extract hierarchical and discriminative single-modal features. In each LGM block, a local-global feature mixer with learnable weights adaptively aggregates the local and global features extracted by a CNN-transformer network. Furthermore, we design a symmetric fusion transformer (SFT) that follows each LGM block. The SFT mines cross-modal correlations in a symmetric manner, exploiting the complementary cues underlying heterogeneous modalities. The hierarchical arrangement of the LGM and SFT blocks enables multilevel feature extraction and fusion, further improving the completeness and descriptiveness of the learned features. We conducted extensive ablation studies and comparative experiments on three benchmark datasets, and the experimental results validated the effectiveness and superiority of the proposed method. The source code of the proposed method will be publicly available at https://github.com/HaixiaBi1982/DSymFuser.
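The two mechanisms named in the abstract, a local-global mixer with learnable weights and a symmetric cross-modal fusion step, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax-normalized two-weight mixer, the single-head attention without learned projections, and the residual connections are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mix_local_global(local_feat, global_feat, logits):
    # Hypothetical mixer: `logits` stands in for two learnable scalars;
    # softmax turns them into blending weights that sum to 1.
    w = softmax(np.asarray(logits, dtype=float))
    return w[0] * local_feat + w[1] * global_feat

def cross_attention(query_tokens, context_tokens):
    # Single-head scaled dot-product attention with no learned
    # projections (a simplification for illustration).
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context_tokens

def symmetric_fusion(tokens_a, tokens_b):
    # Each modality queries the other; the residual keeps its own features.
    fused_a = tokens_a + cross_attention(tokens_a, tokens_b)
    fused_b = tokens_b + cross_attention(tokens_b, tokens_a)
    return fused_a, fused_b

rng = np.random.default_rng(0)
n_tokens, dim = 4, 8

# Stand-ins for the local (CNN-like) and global (transformer-like)
# features of two modalities, here simply random arrays.
a_local, a_global = rng.normal(size=(2, n_tokens, dim))
b_local, b_global = rng.normal(size=(2, n_tokens, dim))

tokens_a = mix_local_global(a_local, a_global, logits=[0.0, 0.0])
tokens_b = mix_local_global(b_local, b_global, logits=[0.0, 0.0])
fused_a, fused_b = symmetric_fusion(tokens_a, tokens_b)
print(fused_a.shape, fused_b.shape)
```

In this sketch, "symmetric" means only that both branches apply the same operation with the roles of the two modalities exchanged, so swapping the inputs swaps the outputs. Stacking several mix-then-fuse stages would mimic the multilevel arrangement the abstract describes, although the actual block design should be taken from the paper and its released code.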