CMPF-UNet: a ConvNeXt multi-scale pyramid fusion U-shaped network for multi-category segmentation of remote sensing images

Ning Li, Xiaopeng Yu, Miao Yu
DOI: https://doi.org/10.1080/10106049.2024.2311217
IF: 3.45
2024-02-16
Geocarto International
Abstract: Most U-shaped convolutional neural network (CNN) methods suffer from insufficient feature extraction and fail to fully utilize global/multi-scale context information, which makes it difficult to distinguish similar objects and shadow-occluded objects in remote sensing images. This article proposes a ConvNeXt multi-scale pyramid fusion U-shaped network (CMPF-UNet). We first propose a novel backbone network based on ConvNeXt to enhance image feature extraction, and use ConvNeXt bottleneck blocks to reconstruct the decoder. Furthermore, a scale-aware pyramid fusion (SAPF) module and a Residual Atrous Spatial Pyramid Pooling (RASPP) module are proposed to dynamically fuse the rich multi-scale context information in high-level features. Finally, multiple Global Pyramid Guidance (GPG) modules are embedded in the network to provide the decoder with global context information at different levels by reconstructing the skip-connections. Experiments on the Vaihingen and Potsdam datasets indicate that the proposed CMPF-UNet achieves more accurate segmentation results.
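The RASPP idea in the abstract (parallel atrous convolutions at several dilation rates, combined with a residual connection) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give kernel sizes, dilation rates, or the fusion rule, so the 1-D signal, the rates (1, 2, 4), and the averaging of branches (rather than, say, concatenation followed by a 1×1 convolution) are assumptions chosen for brevity, and all function names are hypothetical.

```python
def dilated_conv1d(x, kernel, rate):
    """'Same'-size 1-D dilated cross-correlation with zero padding (odd kernel assumed)."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel with dilation `rate`: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)


def residual_atrous_pyramid(x, kernel, rates=(1, 2, 4)):
    """Hypothetical RASPP-like block: parallel dilated branches, averaged, plus identity."""
    branches = [dilated_conv1d(x, kernel, r) for r in rates]
    fused = [sum(vals) / len(rates) for vals in zip(*branches)]
    return [xi + fi for xi, fi in zip(x, fused)]  # residual connection keeps the input


if __name__ == "__main__":
    impulse = [0.0] * 11
    impulse[5] = 1.0
    print(residual_atrous_pyramid(impulse, [1.0, 1.0, 1.0]))
    print([effective_kernel_size(3, r) for r in (1, 2, 4)])
```

With an impulse input, the rate-2 and rate-4 branches respond farther from the center than the rate-1 branch, which is the larger receptive field atrous convolution buys without adding weights, while the residual term passes the original signal through unchanged.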
Geosciences, Multidisciplinary; Environmental Sciences; Remote Sensing; Imaging Science & Photographic Technology
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses several key issues in the semantic segmentation of high-resolution remote sensing images, in particular the difficulty of distinguishing ground-object categories that have similar textures or are obscured by shadows. Specifically:

1. **Insufficient Feature Extraction**:
   - Most methods based on U-shaped convolutional neural networks (CNNs) have limited feature-extraction capability and do not fully exploit global/multi-scale contextual information, which makes it difficult to distinguish similar objects and shadow-occluded objects in remote sensing images.
2. **Improving Segmentation Performance**:
   - A new ConvNeXt Multi-scale Pyramid Fusion U-shaped Network (CMPF-UNet) is proposed to improve segmentation accuracy on high-resolution remote sensing images. It uses a ConvNeXt-based backbone to strengthen feature extraction and ConvNeXt bottleneck blocks to reconstruct the decoder.
3. **Multi-scale Contextual Information Fusion**:
   - Multi-scale contextual information is fused by three modules: the Global Pyramid Guidance (GPG) modules, which reconstruct the skip-connections to supply the decoder with global context at different levels, and the improved Scale-Aware Pyramid Fusion (SAPF) module and the Residual Atrous Spatial Pyramid Pooling (RASPP) module, which dynamically fuse multi-scale context in the high-level features.
4. **Addressing Limitations of Traditional Methods**:
   - Traditional image segmentation methods rely on hand-crafted features and cannot achieve high precision and full automation. CNN-based deep learning methods improve on this, but they still have shortcomings such as the loss of edge and contour details.

Through these improvements, CMPF-UNet aims to perform better in scenes with similar materials and shadow occlusions. Experimental results show that the method achieves highly competitive performance on the Vaihingen and Potsdam datasets.
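The phrase "dynamically fuse" can be made concrete with a small sketch of attention-weighted fusion across scale branches: at every spatial position, the branch outputs are combined with input-dependent weights that sum to one. This is an assumption about how an SAPF-style module might behave, not a description of the paper's module; the function names, the flattened 1-D feature maps, and the directly supplied score maps (which a real network would compute with learned layers) are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scale_aware_fusion(branches, scores):
    """Fuse S same-sized feature maps with per-position softmax weights.

    branches: S lists of N feature values, one list per scale branch.
    scores:   S lists of N raw attention scores. In a real network these would
              come from learned layers; here they are passed in directly.
    """
    if len(branches) != len(scores):
        raise ValueError("need one score map per branch")
    fused = []
    for pos in range(len(branches[0])):
        weights = softmax([s[pos] for s in scores])  # weights over branches sum to 1
        fused.append(sum(w * b[pos] for w, b in zip(weights, branches)))
    return fused


if __name__ == "__main__":
    small_rf = [1.0, 1.0]   # e.g. output of a small-dilation branch
    large_rf = [3.0, 3.0]   # e.g. output of a large-dilation branch
    scores = [[0.0, 0.0], [0.0, 5.0]]
    print(scale_aware_fusion([small_rf, large_rf], scores))
```

Because the softmax makes each fused value a convex combination of the branch outputs, every position can lean toward a small- or large-receptive-field branch according to local content; in the demo, the first position weights both branches equally, while the second leans strongly toward the large-dilation branch.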