Abstract:Deep learning (DL)-based approaches are notable for their ability to establish feature associations without relying on physical constraints, unlike traditional strategies that are complex and dependent on expert experience. However, three main challenges hinder the versatility of semantic segmentation models. First, the targets in these images are dense and exist at varying spatial scales, which imposes higher demands on the model for accurate segmentation across scales. Second, the segmentation of small targets in the images is often overlooked, leading to a compromise between fine segmentation and model efficiency. Lastly, the data-intensive nature of remote sensing images and the resource-intensive operations of large-scale networks impose significant communication and computation burdens on edge devices, which may not have sufficient resources to handle them effectively. To address these challenges, this paper proposes a lightweight semantic segmentation method for remote sensing images to achieve high-precision segmentation for multi-scale targets while maintaining low computational complexity. The main components include: (1) embedding the inverted residual block structure to minimize the number of model parameters and computational costs; (2) introducing the parallel irregular space pyramid pooling module to efficiently aggregate multi-scale contextual information for fine-grained recognition of small targets; and (3) embedding transfer learning into the encoder-decoder structure to speed up the convergence rate and improve multi-scale feature fusion capability, thereby reducing semantic information loss. The proposed lightweight method has been extensively tested on real-world high-resolution remote sensing datasets. It achieved PA, MPA, MIoU, and FWIoU scores of 87.90%, 75.76%, 66.29%, and 78.81% on the Vaihingen dataset; 87.03%, 85.31%, 74.85%, and 77.54% on the Potsdam dataset; and 95.37%, 83.33%, 75.70%, and 91.31% on the Aeroscapes dataset. Compared to other popular semantic segmentation models, the proposed method achieved the highest values in all four evaluation indicators, demonstrating its effectiveness and superiority.

A ViT-Based Multiscale Feature Fusion Approach for Remote Sensing Image Segmentation

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

Remote Sensing Image Semantic Segmentation Method Based on a Deep Convolutional Neural Network and Multiscale Feature Fusion

A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation

A Transformer-based Multi-Modal Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Elevation Information-Guided Multimodal Fusion Robust Framework for Remote Sensing Image Segmentation

Encoder- and Decoder-Based Networks Using Multiscale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation

Dual-Domain Fusion Network Based on Wavelet Frequency Decomposition and Fuzzy Spatial Constraint for Remote Sensing Image Segmentation

Multi-Scale Feature Fusion Based on PVTv2 for Deep Hash Remote Sensing Image Retrieval

A Crossmodal Multiscale Fusion Network for Semantic Segmentation of Remote Sensing Data

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

MFVNet: a deep adaptive fusion network with multiple field-of-views for remote sensing image semantic segmentation

MFEAFN: Multi-scale feature enhanced adaptive fusion network for image semantic segmentation

DESformer: A Dual-branch Encoding Strategy for Semantic Segmentation of Very-High-Resolution Remote Sensing Images Based on Feature Interaction and Multi-Scale Context Fusion

Advancing high-resolution remote sensing: a compact and powerful approach to semantic segmentation

Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation

CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation

PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

Multi-View Feature Fusion and Rich Information Refinement Network for Semantic Segmentation of Remote Sensing Images

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

MsVFE and V-SIAM: Attention-based multi-scale feature interaction and fusion for outdoor LiDAR semantic segmentation