Transformer-induced graph reasoning for multimodal semantic segmentation in remote sensing
Qibin He, Xian Sun, Wenhui Diao, Zhiyuan Yan, Dongshuo Yin, Kun Fu
DOI: https://doi.org/10.1016/j.isprsjprs.2022.08.010
IF: 12.7
2022-11-01
ISPRS Journal of Photogrammetry and Remote Sensing
Abstract: As a large amount of earth observation data is now available on a global scale, it has become possible to apply multimodal semantic segmentation to remote sensing scene analysis. However, the diversity of objects in large-scale scenes and the cross-modal gap between different images remain challenging in practical applications. To address these problems, we propose a Transformer-Induced Hierarchical Graph Network (GraFNet) for multimodal semantic segmentation in remote sensing scenes, which introduces a new modeling paradigm to facilitate the exploration of potential intra- and inter-modal relations. Unlike existing methods, GraFNet parses multimodal remote sensing images into semantic topological graphs and exploits the structural information of land cover categories to learn joint representations. Specifically, an attentive heterogeneous information aggregation mechanism parses diverse objects in remote sensing scenes into semantic entities and captures modality-specific object–object interaction patterns in a topology-aware environment. In addition, modality hierarchical dependency modeling encodes the interactive representations of cross-modal objects and distinguishes the contribution of each modality to improve cross-modal compatibility. Extensive experiments on several multimodal remote sensing datasets demonstrate that GraFNet outperforms state-of-the-art approaches, achieving F1/mIoU scores of 91.1%/82.4% on the ISPRS Vaihingen dataset, 93.4%/88.4% on the ISPRS Potsdam dataset, and 91.8%/84.0% on the MSAW dataset.
imaging science & photographic technology; remote sensing; geography, physical; geosciences, multidisciplinary
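The abstract reports results as F1/mIoU pairs. As a reading aid, the sketch below shows how these two segmentation metrics are commonly computed from a per-pixel confusion matrix, assuming both are macro-averaged over classes. The function names `confusion_matrix` and `f1_and_miou` are hypothetical helpers, not code from the paper, and protocol details such as which classes are included in the average are not stated in this abstract and are an assumption here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix (rows = ground truth, cols = prediction)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, pred) pair as a single index, then count occurrences.
    idx = num_classes * y_true + y_pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def f1_and_miou(cm):
    """Return (mean F1, mIoU), macro-averaged over classes (an assumed averaging scheme)."""
    cm = cm.astype(np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but labeled as another class
    fn = cm.sum(axis=1) - tp  # labeled as class c but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        f1 = 2 * tp / (2 * tp + fp + fn)   # equivalent to 2PR / (P + R)
        iou = tp / (tp + fp + fn)          # intersection over union per class
    # Classes absent from both labels and predictions give 0/0 = nan and are skipped.
    return np.nanmean(f1), np.nanmean(iou)

# Toy example with 3 classes and 6 pixels (illustrative data only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
f1, miou = f1_and_miou(cm)
print(f"F1={f1:.3f} mIoU={miou:.3f}")  # prints "F1=0.656 mIoU=0.500"
```

Because F1 = 2TP/(2TP+FP+FN) and IoU = TP/(TP+FP+FN) share the same error terms, F1 is always at least as large as IoU for each class, which is consistent with every F1 value in the abstract exceeding its paired mIoU.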