Abstract: A growing number of convolutional neural networks (CNNs) focus on capturing contextual features (con. feat) to improve performance in semantic segmentation. However, high-level con. feat are biased towards encoding large objects, disregard spatial details, and have limited capacity to discriminate between easily confused classes (e.g., trees and grass). We therefore incorporate low-level features (low. feat) and class-specific discriminative features (dis. feat) to boost performance further: low. feat help the model recover spatial information, and dis. feat reduce class confusion during segmentation. To this end, we propose MFNet, a novel deep multi-feature learning framework for the semantic segmentation of very-high-resolution remote sensing images (VHR RSIs). MFNet adopts a multi-feature learning mechanism to learn more complete features, including con. feat, low. feat, and dis. feat. Specifically, in addition to a widely used context aggregation module for capturing con. feat, we append two branches for learning low. feat and dis. feat. One branch learns low. feat at a shallow layer of the backbone network through local contrast processing, while the other groups con. feat and then optimizes each class individually to generate dis. feat with better inter-class discriminative capability. Extensive quantitative and qualitative evaluations demonstrate that MFNet outperforms most state-of-the-art models on the ISPRS Vaihingen and Potsdam datasets. In particular, thanks to multi-feature learning, our model achieves an overall accuracy of 91.91% on the Potsdam test set with VGG16 as the backbone, performing favorably against advanced models that use ResNet101.
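To make the idea of local contrast processing concrete, the following is a minimal sketch under a stated assumption: local contrast is taken here to mean each value minus the mean of its surrounding window. The abstract does not specify the operator used in MFNet, so the function name `local_contrast`, the `radius` parameter, and the border handling are all illustrative choices, not the authors' implementation.

```python
def local_contrast(feature_map, radius=1):
    """Return each value minus the mean of its local window.

    Assumption for illustration only: the window is a square of side
    2 * radius + 1, clipped at the borders of the map.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            # Gather the neighbourhood, clipping the window at the borders.
            neighbours = [
                feature_map[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(feature_map[r][c] - sum(neighbours) / len(neighbours))
        result.append(row)
    return result


# A uniform region has no local contrast; a vertical edge responds
# only in the columns next to the boundary.
flat = [[2.0] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
flat_out = local_contrast(flat)
edge_out = local_contrast(edge)
```

Under this assumption the response vanishes inside homogeneous regions and is non-zero only near boundaries, which is consistent with the stated role of low. feat in recovering spatial detail.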

Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning