Abstract:Semantic segmentation in high-resolution remote-sensing (RS) images is a fundamental task for RS-based urban understanding and planning. However, various types of artificial objects in urban areas make this task quite challenging. Recently, the use of Deep Convolutional Neural Networks (DCNNs) with multiscale information fusion has demonstrated great potential in enhancing performance. Technically, however, existing fusions are usually implemented by summing or concatenating feature maps in a straightforward way. Seldom do works consider the spatial importance for global-to-local context-information aggregation. This paper proposes a Learnable-Gated CNN (L-GCNN) to address this issue. Methodologically, the Taylor expression of the information-entropy function is first parameterized to design the gate function, which is employed to generate pixelwise weights for coarse-to-fine refinement in the L-GCNN. Accordingly, a Parameterized Gate Module (PGM) was designed to achieve this goal. Then, the single PGM and its densely connected extension were embedded into different levels of the encoder in the L-GCNN to help identify the discriminative feature maps at different scales. With the above designs, the L-GCNN is finally organized as a self-cascaded end-to-end architecture that is able to sequentially aggregate context information for fine segmentation. The proposed model was evaluated on two public challenging benchmarks, the ISPRS 2Dsemantic segmentation challenge Potsdam dataset and the Massachusetts building dataset. The experiment results demonstrate that the proposed method exhibited significant improvement compared with several related segmentation networks, including the FCN, SegNet, RefineNet, PSPNet, DeepLab and GSN.For example, on the Potsdam dataset, our method achieved a 93.65% F1 score and 88.06% IoU score for the segmentation of tiny cars in high-resolution RS images. As a conclusion, the proposed model showed potential for object segmentation from the RS images of buildings, impervious surfaces, low vegetation, trees and cars in urban settings, which largely varies in size and have confusing appearances.

Global Multi-Attention UResNeXt for Semantic Segmentation of High-Resolution Remote Sensing Images

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

Semantic segmentation of remote sensing images combined with attention mechanism and feature enhancement U-Net

Multiattention Network for Semantic Segmentation of Fine-Resolution Remote Sensing Images

Optimizing Spatial Relationships in GCN to Improve the Classification Accuracy of Remote Sensing Images

Multi-U-Net: Residual Module under Multisensory Field and Attention Mechanism Based Optimized U-Net for VHR Image Semantic Segmentation

Gated Convolutional Neural Network for Semantic Segmentation in High-Resolution Images

Dual Path Attention Net For Remote Sensing Semantic Image Segmentation

A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images

DMAU-Net: An Attention-Based Multiscale Max-Pooling Dense Network for the Semantic Segmentation in VHR Remote-Sensing Images

CTMU-Net: an Improved U-Net for Semantic Segmentation of Remote-Sensing Images Based on the Combined Attention Mechanism

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

Graph Attention Guidance Network with Knowledge Distillation for Semantic Segmentation of Remote Sensing Images.

MAENet: Multiple Attention Encoder–Decoder Network for Farmland Segmentation of Remote Sensing Images

Learnable Gated Convolutional Neural Network for Semantic Segmentation in Remote-Sensing Images

MQANet: Multi-Task Quadruple Attention Network of Multi-Object Semantic Segmentation from Remote Sensing Images

DARSegNet: A Real-Time Semantic Segmentation Method Based on Dual Attention Fusion Module and Encoder-Decoder Network

UAM-Net: an Attention-Based Multi-level Feature Fusion UNet for Remote Sensing Image Segmentation.

UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Classification of Very High-Resolution Remote Sensing Imagery Using a Fully Convolutional Network with Global and Local Context Information Enhancements.

Realtime Global Attention Network for Semantic Segmentation