Abstract: The application of deep neural networks to the semantic segmentation of remote sensing images is a significant research area within the intelligent interpretation of remote sensing data. Semantic segmentation of remote sensing images has great practical value in urban planning, disaster assessment, the estimation of carbon sinks, and other related fields. As remote sensing technology advances, the spatial resolution of remote sensing images continues to increase. Higher resolution brings challenges such as large variations in the scale of ground objects, redundant information, and irregular object shapes. Current methods leverage Transformers to capture global long-range dependencies. However, Transformers introduce higher computational complexity and are prone to losing local details. In this paper, we propose UNeXt (UNet+ConvNeXt+Transformer), a real-time semantic segmentation model tailored for high-resolution remote sensing images. To achieve efficient segmentation, UNeXt uses the lightweight ConvNeXt-T as the encoder and a lightweight decoder, Transnext, which combines a Transformer and a CNN (Convolutional Neural Network) to capture global information while avoiding the loss of local details. Furthermore, to make more effective use of spatial and channel information, we propose an SCFB (SC Feature Fuse Block) that reduces computational complexity while enhancing the model's recognition of complex scenes. A series of ablation experiments and comprehensive comparative experiments demonstrate that our method not only runs faster than state-of-the-art (SOTA) lightweight models but also achieves higher accuracy. Specifically, our proposed UNeXt achieves mIoU scores of 85.2% and 82.9% on the Vaihingen and Gaofen5 (GID5) datasets, respectively, while maintaining 97 fps for 512 × 512 inputs on a single NVIDIA RTX 4090 GPU, outperforming other SOTA methods.
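The accuracy figures above are reported as mIoU (mean intersection over union). As a reading aid, the following minimal sketch shows how mIoU is commonly computed from a confusion matrix over flattened label maps. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes that appear in neither the ground truth nor the
    prediction are skipped instead of being counted as zero.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical classes and six pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark protocols may differ in which classes are included in the average, so the sketch should not be read as the exact procedure behind the reported numbers.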

Global and edge enhanced transformer for semantic segmentation of remote sensing

Local-enhanced multi-scale aggregation swin transformer for semantic segmentation of high-resolution remote sensing images

Efficient Transformer for Remote Sensing Image Segmentation

GLE-net: global-local information enhancement for semantic segmentation of remote sensing images

Class-Guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery

DESformer: A Dual-branch Encoding Strategy for Semantic Segmentation of Very-High-Resolution Remote Sensing Images Based on Feature Interaction and Multi-Scale Context Fusion

Local-Global Feature Capture and Boundary Information Refinement Swin Transformer Segmentor for Remote Sensing Images

Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation

A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing

A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation

TMNet: A Two-Branch Multi-Scale Semantic Segmentation Network for Remote Sensing Images

A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

SSDT: Scale-Separation Semantic Decoupled Transformer for Semantic Segmentation of Remote Sensing Images

Encoding Contextual Information by Interlacing Transformer and Convolution for Remote Sensing Imagery Semantic Segmentation

RemoteNet: Remote Sensing Image Segmentation Network based on Global-Local Information

Locality-Enhanced Transformer for Semantic Segmentation of High-Resolution Remote Sensing Images

UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Dual-Path Feature Aware Network for Remote Sensing Image Semantic Segmentation