Abstract: Urban building segmentation is a prevalent research domain for very high resolution (VHR) remote sensing; however, the varied appearance of buildings and the complicated backgrounds in VHR remote sensing imagery make accurate semantic segmentation of urban buildings a challenge in relevant applications. Following the basic architecture of U-Net, an end-to-end deep convolutional neural network (denoted as DeepResUnet) was proposed, which can effectively perform urban building segmentation at pixel scale from VHR imagery and generate accurate segmentation results. The method contains two sub-networks: one is a cascade down-sampling network for extracting feature maps of buildings from the VHR image, and the other is an up-sampling network for reconstructing those extracted feature maps back to the same size as the input VHR image. The deep residual learning approach was adopted to facilitate training and to alleviate the degradation problem that often occurs in the model training process. The proposed DeepResUnet was tested on aerial images with a spatial resolution of 0.075 m and was compared, under exactly the same conditions, with six other state-of-the-art networks: FCN-8s, SegNet, DeconvNet, U-Net, ResUNet and DeepUNet. Results of extensive experiments indicated that the proposed DeepResUnet outperformed the other six networks in semantic segmentation of urban buildings in both visual and quantitative evaluation, especially in labeling irregularly shaped and small buildings with higher accuracy and completeness. Compared with U-Net, the F1 score, Kappa coefficient and overall accuracy of DeepResUnet improved by 3.52%, 4.67% and 1.72%, respectively. Moreover, the proposed DeepResUnet required far fewer parameters than U-Net, highlighting its significant improvement among U-Net applications. Nevertheless, the inference time of DeepResUnet is slightly longer than that of U-Net, which leaves room for further improvement.
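
The three quantitative measures named in the abstract (F1 score, Kappa coefficient and overall accuracy) can all be derived from a binary confusion matrix of building versus background pixels. The sketch below is only an illustration using the standard textbook definitions; the function name `binary_segmentation_metrics` and the toy label lists are assumptions made here, not the authors' evaluation code or data.

```python
def binary_segmentation_metrics(pred, truth):
    """Compute F1, Cohen's kappa and overall accuracy for binary masks.

    pred, truth: equal-length flat sequences of 0 (background) / 1 (building).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("pred and truth must be non-empty and equal in length")

    # Accumulate the four confusion-matrix cells pixel by pixel.
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Overall accuracy: fraction of pixels labeled correctly.
    overall_accuracy = (tp + tn) / n

    # Chance agreement from the row/column marginals of the confusion matrix.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = ((overall_accuracy - expected) / (1 - expected)
             if expected != 1 else 0.0)

    return {"f1": f1, "kappa": kappa, "overall_accuracy": overall_accuracy}


# Toy example with made-up labels (not data from the paper).
toy_pred = [1, 1, 0, 0, 1, 0, 0, 0]
toy_truth = [1, 0, 0, 0, 1, 1, 0, 0]
toy_metrics = binary_segmentation_metrics(toy_pred, toy_truth)
```

Kappa discounts the agreement expected by chance, so on imagery dominated by background pixels it tends to be a stricter measure than overall accuracy.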

Semantic scene segmentation for indoor autonomous vision systems: leveraging an enhanced and efficient U-NET architecture

EHANet: Efficient Hybrid Attention Network Towards Real-time Semantic Segmentation

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

In Defense Of Multi-Source Omni-Supervised Efficient Convnet For Robust Semantic Segmentation In Heterogeneous Unseen Domains

ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

Ghost-UNet: An Asymmetric Encoder-Decoder Architecture for Semantic Segmentation From Scratch

Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation

Efficient Multi-scale Network for Semantic Segmentation of fine-Resolution Remotely Sensed Images

Semantic segmentation of urban environments: Leveraging U-Net deep learning model for cityscape image analysis

U-Net Ensemble for Enhanced Semantic Segmentation in Remote Sensing Imagery

EBUNet: a fast and accurate semantic segmentation network with lightweight efficient bottleneck unit

ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data

DEUFormer: High‐precision semantic segmentation for urban remote sensing images

Interactive Efficient Multi-Task Network for RGB-D Semantic Segmentation

EAD-Net: Efficiently Asymmetric Network for Semantic Labeling of High-Resolution Remote Sensing Images with Dynamic Routing Mechanism

Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network

Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation

Semantic Segmentation for Urban-Scene Images

Research on Efficient Asymmetric Attention Module for Real-Time Semantic Segmentation Networks in Urban Scenes

UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Building and Road Segmentation Using EffUNet and Transfer Learning Approach