Spatial-specific Transformer with Involution for Semantic Segmentation of High-Resolution Remote Sensing Images

Xinjia Wu, Jing Zhang, Wensheng Li, Jiafeng Li, Li Zhuo, Jie Zhang
DOI: https://doi.org/10.1080/01431161.2023.2179897
IF: 3.531
2023-01-01
International Journal of Remote Sensing
Abstract: High-resolution remote sensing images (HR-RSIs) exhibit strong dependencies between geospatial objects and their background. Given the complex spatial structure and multiscale objects in HR-RSIs, how fully spatial information is exploited directly determines the quality of semantic segmentation. In this paper, we present a Spatial-specific Transformer with involution for semantic segmentation of HR-RSIs. First, we integrate a spatial-specific involution branch with a self-attention branch to form a Spatial-specific Transformer backbone, which produces multilevel features carrying both global and spatial information without additional parameters. Then, we introduce multiscale feature representation with large window attention into the Swin Transformer to capture multiscale contextual information. Finally, we add a geospatial feature supplement branch to the semantic segmentation decoder to mitigate the loss of semantic information caused by down-sampling the multiscale features of geospatial objects. Experimental results demonstrate that our method achieves competitive semantic segmentation performance, with 87.61% and 80.08% mIoU on the Potsdam and Vaihingen datasets, respectively.
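For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection over Union (mIoU) is conventionally computed from a confusion matrix. The function names `confusion_matrix` and `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper, and the abstract does not state the exact evaluation protocol (for example, which classes are included in the average).

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class); rows are ground truth."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # true class c, predicted as another class
        fp = sum(row[c] for row in cm) - tp   # predicted c, true class is another
        denom = tp + fp + fn
        if denom > 0:                         # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy example: flattened per-pixel labels for a two-class problem.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
score = mean_iou(cm)
```

In this toy case class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. Real evaluations accumulate the confusion matrix over every pixel of every test image before averaging.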