ESMS-Net: Enhancing Semantic-Mask Segmentation Network with Pyramid Atrousformer for Remote Sensing Image

Jiamin Liu, Ziyi Wang, Fulin Luo, Tan Guo, Feng Yang, Xinbo Gao
DOI: https://doi.org/10.1109/tgrs.2024.3504733
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Transformers have gained widespread adoption in remote sensing image (RSI) segmentation. However, RSIs contain densely overlapping terrain and significant shadows, making it challenging to segment the blended terrain boundaries that constitute the hard classes. Currently, most transformer-based methods construct self-attention with a sliding window, which restricts the feature receptive field and makes it harder to handle intersecting and overlapping objects. Additionally, they rarely focus specifically on the representation of these hard-to-segment objects. To overcome these challenges, we propose a novel Enhancing Semantic Mask Segmentation Network (ESMS-Net) framework comprising a local-global joint encoder, an auxiliary enhanced encoder, and a multi-scale dense decoder. In the local-global joint encoder, we construct a Pyramid Pooling AtrousFormer (PPAFormer) that performs self-attention with a pyramid-structured atrous sliding window, which enlarges the receptive field and improves global representation. Meanwhile, we construct the Dual-Feature Fusion Module (DFFM) and Multi-level Feature Weighted Fusion (MFWF) in the multi-scale dense decoder to reduce information loss and facilitate the interaction of deep semantic information. For the auxiliary enhanced encoder, we develop a semantic mask based on the predicted results to retain the hard segmentation classes, and then use the same structure as the first two stages of the local-global joint encoder to learn the hard regions again. Extensive experiments demonstrate that the proposed ESMS-Net achieves significant improvements in segmentation performance over state-of-the-art methods on the ISPRS Vaihingen and Potsdam datasets. The code will be available at https://github.com/Wzysaber/ESMS-Net.
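Two ideas in the abstract can be illustrated with a minimal NumPy sketch: how an atrous (dilated) window widens the receptive field without adding sampled positions, and how a semantic mask built from a predicted label map can retain only the hard classes. This is not the authors' implementation. The function names, the window size, the dilation rates (1, 2, 3), and the choice of which class counts as "hard" are all assumptions made for illustration.

```python
import numpy as np


def atrous_window_span(window_size: int, dilation: int) -> int:
    """Pixels covered along one axis by a window whose taps are `dilation` apart."""
    return dilation * (window_size - 1) + 1


def atrous_window_offsets(window_size: int, dilation: int) -> np.ndarray:
    """Offsets, measured from the first tap, of the positions sampled along one axis."""
    return np.arange(window_size) * dilation


def hard_class_mask(pred: np.ndarray, hard_classes) -> np.ndarray:
    """Boolean (H, W) mask that is True where the predicted label is a hard class."""
    return np.isin(pred, list(hard_classes))


def keep_hard_regions(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out every pixel of an (H, W, C) image that lies outside the mask."""
    return image * mask[..., None]


# Hypothetical pyramid of dilation rates for a 7-tap window (assumed values).
spans = [atrous_window_span(7, d) for d in (1, 2, 3)]

# Hypothetical 2x2 prediction in which class 1 is treated as the hard class.
pred = np.array([[0, 1], [2, 1]])
mask = hard_class_mask(pred, {1})
masked = keep_hard_regions(np.ones((2, 2, 3)), mask)
```

The number of sampled positions stays at 7 for every dilation rate, while the covered span grows with the rate. The masked image could then be passed to a second encoder, as the abstract describes for the auxiliary enhanced encoder.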