Multi-scale attention fusion network for semantic segmentation of remote sensing images

Zhiqiang Wen, Hongxu Huang, Shuai Liu
School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
DOI: https://doi.org/10.1080/01431161.2023.2290999
IF: 3.531
2023-12-12
International Journal of Remote Sensing
Abstract: Convolutional neural networks have proven effective for segmenting high-resolution remote sensing images (HRSI). However, segmentation models built on the common encoder-decoder structure still face two problems on HRSI: 1) fusing high-level and low-level feature maps directly in the decoder tends to mask spatial detail features; 2) although self-attention has been used to capture long-range feature dependencies, its computation and memory costs restrict its practical use. To address these two problems, this paper proposes a new HRSI segmentation model named MLWNet. First, a maximum pooling module improves the quality of the feature map, providing a receptive field over the whole map and rich global semantic information. Then, based on a new linear-complexity self-attention mechanism, we design a multi-scale linear self-attention module to extract correlations between contexts. Finally, weighted feature fusion helps the feature map restore spatial details and refines the segmentation results. On the ISPRS Potsdam and ISPRS Vaihingen HRSI datasets, MLWNet achieves mIoU of 78.19% and 71.61%, respectively, outperforming other mainstream segmentation models while using only 17.423 M parameters. With its high accuracy and small parameter count, the model can provide decision information for real-time use of remote sensing images.
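For readers unfamiliar with the mIoU figure quoted in the abstract, the following is a minimal sketch of how mean intersection over union is commonly computed from per-pixel labels. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, and skipping classes that appear in neither the ground truth nor the prediction is an assumption, since the abstract does not state the evaluation protocol (for example, whether any class is excluded).

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Assumption: classes with no ground-truth and no predicted pixels
    are skipped rather than counted as IoU 0.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened per-pixel labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In this toy example the per-class IoU values are 1/3, 2/3 and 1/2, so their mean is 0.5. In practice the confusion matrix would be accumulated over all test image tiles before the per-class ratios are taken.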
imaging science & photographic technology,remote sensing
What problem does this paper attempt to address?