Abstract: Deep learning (DL)-based approaches are notable for their ability to establish feature associations without relying on physical constraints, unlike traditional strategies, which are complex and depend on expert experience. However, three main challenges limit the versatility of semantic segmentation models for remote sensing images. First, the targets in these images are dense and appear at varying spatial scales, which places higher demands on the model for accurate segmentation across scales. Second, the segmentation of small targets is often overlooked, forcing a compromise between fine-grained segmentation and model efficiency. Third, the data-intensive nature of remote sensing images and the resource-intensive operations of large-scale networks impose significant communication and computation burdens on edge devices, which may not have sufficient resources to handle them. To address these challenges, this paper proposes a lightweight semantic segmentation method for remote sensing images that achieves high-precision segmentation of multi-scale targets while maintaining low computational complexity. The main components are: (1) embedding the inverted residual block structure to minimize the number of model parameters and the computational cost; (2) introducing the parallel irregular space pyramid pooling module to efficiently aggregate multi-scale contextual information for fine-grained recognition of small targets; and (3) embedding transfer learning into the encoder-decoder structure to speed up convergence and improve multi-scale feature fusion, thereby reducing semantic information loss. The proposed method was tested on real-world high-resolution remote sensing datasets. It achieved pixel accuracy (PA), mean pixel accuracy (MPA), mean intersection over union (MIoU), and frequency-weighted intersection over union (FWIoU) scores of 87.90%, 75.76%, 66.29%, and 78.81% on the Vaihingen dataset; 87.03%, 85.31%, 74.85%, and 77.54% on the Potsdam dataset; and 95.37%, 83.33%, 75.70%, and 91.31% on the Aeroscapes dataset. Compared with other popular semantic segmentation models, the proposed method achieved the highest values on all four evaluation metrics, demonstrating its effectiveness.
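The four metrics reported above (PA, MPA, MIoU, FWIoU) are commonly derived from a per-class confusion matrix. The following is a minimal sketch of those standard definitions, not the paper's evaluation code: the function name `segmentation_metrics`, the convention that rows are ground-truth classes and columns are predicted classes, and the choice to skip classes that never occur are all assumptions made for illustration.

```python
def segmentation_metrics(cm):
    """Compute PA, MPA, MIoU and FWIoU from a square confusion matrix.

    Assumed convention (not stated in the source): cm[i][j] is the number
    of pixels whose ground-truth class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]                       # correctly labelled pixels
    gt = [sum(cm[i]) for i in range(n)]                       # ground-truth pixels per class
    pred = [sum(cm[r][j] for r in range(n)) for j in range(n)]  # predicted pixels per class

    # PA: fraction of all pixels that are labelled correctly.
    pa = sum(diag) / total

    # MPA: per-class recall, averaged over classes present in the ground truth.
    accs = [diag[i] / gt[i] for i in range(n) if gt[i] > 0]
    mpa = sum(accs) / len(accs)

    # IoU per class: intersection / (ground truth + prediction - intersection).
    # Classes with an empty union are marked None and left out of the averages.
    ious = []
    for i in range(n):
        union = gt[i] + pred[i] - diag[i]
        ious.append(diag[i] / union if union > 0 else None)
    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid)

    # FWIoU: IoU weighted by each class's share of ground-truth pixels.
    fwiou = sum((gt[i] / total) * ious[i]
                for i in range(n) if ious[i] is not None)

    return {"PA": pa, "MPA": mpa, "MIoU": miou, "FWIoU": fwiou}


if __name__ == "__main__":
    # Toy two-class example with 10 pixels in total.
    print(segmentation_metrics([[3, 1], [1, 5]]))
```

Because FWIoU weights each class by its pixel frequency, it can sit well above MIoU when a few large classes dominate the scene, which is consistent with the gap between the two scores reported for the Vaihingen and Aeroscapes datasets.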
