MFCA-Net: a deep learning method for semantic segmentation of remote sensing images

Xiujuan Li, Junhuai Li
DOI: https://doi.org/10.1038/s41598-024-56211-1
IF: 4.6
2024-03-10
Scientific Reports
Abstract: Semantic segmentation of remote sensing images (RSI) is an important research direction in remote sensing technology. This paper proposes a multi-feature fusion and channel attention network, MFCA-Net, which aims to improve the segmentation accuracy of remote sensing images and the recognition of small target objects. The architecture is built on an encoder–decoder structure. The encoder includes an improved MobileNet V2 (IMV2) and a multi-feature dense fusion (MFDF) module. In IMV2, the attention mechanism is introduced twice to enhance feature extraction, and MFDF is designed to obtain denser feature sampling points and larger receptive fields. In the decoder, three branches of shallow features from the backbone network are fused with deep features, and upsampling is performed to achieve pixel-level classification. Comparative experiments against six state-of-the-art methods show that the segmentation accuracy of the proposed network is significantly improved and that small target objects are recognized more reliably. For example, the proposed MFCA-Net achieves an MIoU improvement of about 3.65–23.55% on the Vaihingen dataset.
multidisciplinary sciences
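The abstract reports its gains in MIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch of the standard definition in plain Python. The function name `mean_iou` and the toy label maps are hypothetical, and the paper's exact evaluation protocol (for example, which classes are ignored) is not stated in this summary.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes that occur in either map."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                   # true c, predicted as another class
        fp = sum(row[c] for row in confusion) - tp    # another class, predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy label maps flattened to 1-D (hypothetical data, 3 classes).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(truth, pred, 3))
```

Because every class contributes equally to the average, a rare class that is segmented poorly lowers MIoU as much as a common one, which is why the metric is sensitive to small-object performance.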
What problem does this paper attempt to address?
The paper attempts to address several key challenges in the semantic segmentation of remote sensing images (RSI):

1. **Improving segmentation accuracy**: Existing semantic segmentation methods have low segmentation accuracy when dealing with remote sensing images, especially in the recognition of small objects. This is mainly due to the significant class imbalance in remote sensing images, where the number of samples for certain classes is much higher than for others.
2. **Enhancing the recognition performance of small objects**: Small objects in remote sensing images (such as vehicles and trees) are often difficult to recognize accurately, leading to suboptimal segmentation results.
3. **Adapting to the needs of large-scale training data**: Traditional semantic segmentation methods require manual parameter settings and perform poorly when handling large amounts of semantic information. Although deep learning methods have improved on this in some respects, they still face challenges with large-scale training data.

To address these issues, the paper proposes a new deep learning method, the Multi-Feature Fusion and Channel Attention Network (MFCA-Net), which aims to improve the segmentation accuracy and the recognition of small objects in remote sensing images by introducing a Multi-Feature Dense Fusion (MFDF) module and an Improved MobileNet V2 (IMV2). Specifically, MFCA-Net pursues these goals through the following approaches:

- **Encoder-decoder structure**: An encoder-decoder structure is adopted, where the encoder includes the Improved MobileNet V2 and the Multi-Feature Dense Fusion module, and the decoder achieves pixel-level classification by fusing shallow and deep features.
- **Introduction of attention mechanisms**: Attention mechanisms are introduced in both shallow and deep feature maps to enhance feature extraction capabilities.
- **Multi-Feature Dense Fusion**: The MFDF module is designed to obtain broader contextual information and denser feature sampling points through convolution operations with different dilation rates and adaptive average pooling, which is intended to mitigate class imbalance and small-object recognition issues.

According to the paper, experimental results on multiple datasets show that MFCA-Net significantly outperforms existing advanced methods in segmentation accuracy and small-object recognition.
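To make the idea of combining several dilation rates with a pooled context branch concrete, the following is a minimal 1-D sketch in plain Python. It is not the paper's MFDF module: the function names, the dilation rates, the fixed all-ones kernel, and the simple averaging used to fuse branches are all assumptions for illustration, and the dense connections, learned weights, and 2-D feature maps of the real network are omitted.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D convolution with `dilation` steps between taps.

    Assumes an odd-length kernel so the output stays aligned with the input.
    """
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def multi_rate_fusion(signal, kernel, dilations):
    """Run parallel dilated branches plus a global-average branch, then average them.

    Averaging is a stand-in for a learned fusion layer (an assumption, not the paper's design).
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    global_mean = sum(signal) / len(signal)
    branches.append([global_mean] * len(signal))  # pooled context broadcast to every position
    return [sum(values) / len(branches) for values in zip(*branches)]


# A single impulse shows which input positions each branch can "see".
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
for d in (1, 2, 4):  # hypothetical dilation rates
    print(d, receptive_field(3, d), dilated_conv1d(impulse, [1.0, 1.0, 1.0], d))
print(multi_rate_fusion(impulse, [1.0, 1.0, 1.0], [1, 2, 4]))
```

With a 3-tap kernel, dilations of 1, 2, and 4 span 3, 5, and 9 input positions respectively, without adding any weights. Running the branches in parallel samples both nearby and distant context, and the pooled branch adds a summary of the whole input.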