Abstract: Semantic segmentation of remote sensing images (RSI) is an important research direction in remote sensing technology. This paper proposes MFCA-Net, a multi-feature fusion and channel attention network designed to improve the segmentation accuracy of remote sensing images and the recognition of small target objects. The architecture follows an encoder–decoder structure. The encoder consists of an improved MobileNet V2 (IMV2) and a multi-feature dense fusion (MFDF) module. IMV2 introduces the attention mechanism in two places to strengthen feature extraction, and MFDF is designed to obtain denser feature sampling points and larger receptive fields. In the decoder, shallow features from three branches of the backbone network are fused with the deep features, and the result is upsampled to achieve pixel-level classification. Comparative experiments against six state-of-the-art methods show that the proposed network significantly improves segmentation accuracy and recognizes small target objects better. For example, MFCA-Net achieves an MIoU improvement of about 3.65–23.55% on the Vaihingen dataset.
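The headline result is reported as MIoU (mean intersection over union). A minimal pure-Python sketch of the standard definition follows: per-class IoU is TP / (TP + FP + FN), read off a confusion matrix, then averaged over classes. The paper's exact evaluation protocol (for example, whether the clutter class is excluded, or how classes absent from a tile are handled) is not stated here, so skipping absent classes is an assumption of this sketch, and the function names are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Pixel counts: rows are ground-truth classes, columns are predicted classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("label map and prediction must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN for a class absent from both maps."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        ious.append(tp / union if union else float("nan"))
    return ious


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Unweighted mean over classes, so rare classes count as much as common ones."""
    ious = per_class_iou(confusion_matrix(y_true, y_pred, num_classes))
    present = [v for v in ious if v == v]  # NaN != NaN, so this skips absent classes
    return sum(present) / len(present)


# Toy example: six pixels, three classes, flattened label maps.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(truth, pred, 3))  # 0.5
```

Because every class is weighted equally, a model that misses small objects such as cars is penalized even when those pixels are a small fraction of the image, which makes MIoU a reasonable yardstick for the small-object claims.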

What problem does this paper attempt to address?

The paper attempts to address several key challenges in the semantic segmentation of remote sensing images (RSI):

1. **Improving segmentation accuracy**: Existing semantic segmentation methods achieve low accuracy on remote sensing images, especially for small objects. This is largely attributed to severe class imbalance: some classes have far more samples than others.
2. **Enhancing the recognition of small objects**: Small objects in remote sensing images (such as vehicles and trees) are often hard to recognize accurately, which degrades the segmentation results.
3. **Adapting to large-scale training data**: Traditional semantic segmentation methods rely on manually set parameters and perform poorly when handling large amounts of semantic information. Deep learning methods have improved on some of these aspects but still face challenges with large-scale training data.

To address these issues, the paper proposes a new deep learning method, the Multi-Feature Fusion and Channel Attention Network (MFCA-Net), which introduces a Multi-Feature Dense Fusion (MFDF) module and an Improved MobileNet V2 (IMV2) backbone to improve segmentation accuracy and small-object recognition. Specifically, MFCA-Net pursues these goals through the following approaches:

- **Encoder-decoder structure**: The encoder consists of the Improved MobileNet V2 and the Multi-Feature Dense Fusion module; the decoder achieves pixel-level classification by fusing shallow and deep features.
- **Introduction of attention mechanisms**: Attention mechanisms are applied to both shallow and deep feature maps to strengthen feature extraction.
- **Multi-Feature Dense Fusion**: The MFDF module combines convolutions with different dilation rates and adaptive average pooling to capture broader contextual information and denser feature sampling points, targeting the class imbalance and small-object recognition problems.

Experimental results on multiple datasets show that MFCA-Net significantly outperforms existing advanced methods in both segmentation accuracy and small-object recognition.
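The MFDF description rests on one geometric fact: a dilated convolution with kernel size k and dilation rate d spans k + (k − 1)(d − 1) pixels per axis while still reading only k of them. The sketch below works in 1-D (one axis) and compares dilated branches applied side by side (ASPP-style) with branches stacked one after another, which is the longest path through a densely connected, DenseASPP-style stack. The actual dilation rates and wiring of MFDF are not given here, so the rates 6 and 12 and the dense-stacking interpretation are assumptions; the adaptive average pooling branch is not modeled (a global 1×1 pooled branch would see the whole feature map by construction).

```python
from typing import Iterable, List


def effective_kernel(k: int, d: int) -> int:
    """Span (in pixels, per axis) of a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def sampling_offsets(k: int, d: int) -> List[int]:
    """1-D offsets, relative to the output pixel, that a dilated kernel reads (odd k)."""
    half = k // 2
    return [d * i for i in range(-half, half + 1)]


def parallel_offsets(k: int, dilations: Iterable[int]) -> List[int]:
    """Branches applied side by side to the same input: union of their offsets."""
    points = set()
    for d in dilations:
        points.update(sampling_offsets(k, d))
    return sorted(points)


def cascaded_offsets(k: int, dilations: Iterable[int]) -> List[int]:
    """Branches stacked so each reads the previous output: offsets add up,
    so more input pixels influence each output pixel."""
    points = {0}
    for d in dilations:
        points = {p + s for p in points for s in sampling_offsets(k, d)}
    return sorted(points)


def receptive_field(offsets: List[int]) -> int:
    """Width of the input window covered by a set of 1-D offsets."""
    return max(offsets) - min(offsets) + 1


rates = [6, 12]  # hypothetical dilation rates, not taken from the paper
print(effective_kernel(3, 6))                       # 13
print(parallel_offsets(3, rates))                   # [-12, -6, 0, 6, 12]
print(cascaded_offsets(3, rates))                   # [-18, -12, -6, 0, 6, 12, 18]
print(receptive_field(cascaded_offsets(3, rates)))  # 37
```

With the same two 3×3 kernels, stacking reaches a 37-pixel window sampled at 7 positions per axis, versus a 25-pixel window at 5 positions for the side-by-side layout, which illustrates how a dense fusion design can yield both larger receptive fields and denser sampling points.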
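The attention bullet says attention is applied to both shallow and deep feature maps, and the network's name points to channel attention, but the exact block used in IMV2 is not described here. As a reference point, the sketch below implements squeeze-and-excitation (SE) style channel attention, a widely used channel attention design substituted here for illustration; it is not confirmed as the block MFCA-Net uses. Feature maps are nested Python lists, and the weights are hypothetical toy values.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W map per channel


def global_avg_pool(feature: List[Matrix]) -> List[float]:
    """Squeeze: average each channel's H x W map to a single number."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]


def dense(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer: y[j] = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feature: List[Matrix], w1: Matrix, b1: List[float],
                      w2: Matrix, b2: List[float]):
    """SE-style attention: squeeze, excite (FC -> ReLU -> FC -> sigmoid), rescale."""
    s = global_avg_pool(feature)
    hidden = [max(0.0, v) for v in dense(s, w1, b1)]        # reduction layer + ReLU
    weights = [sigmoid(v) for v in dense(hidden, w2, b2)]   # one weight in (0, 1) per channel
    scaled = [[[weights[c] * px for px in row] for row in ch]
              for c, ch in enumerate(feature)]
    return scaled, weights


feature = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0: strong, uniform activation
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1: inactive
]
# Hypothetical weights: 2 channels reduced to 1 hidden unit, then expanded back to 2.
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
scaled, weights = channel_attention(feature, w1, b1, w2, b2)
print([round(w, 4) for w in weights])  # [0.7311, 0.2689]
```

The reduction layer (C channels squeezed to fewer hidden units) keeps the added cost small, which matters when the block is inserted into a lightweight backbone such as MobileNet V2.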

MFCA-Net: a deep learning method for semantic segmentation of remote sensing images
