Abstract: Building footprint extraction is a basic task in mapping, image understanding, computer vision, and related fields. Accurately and efficiently extracting building footprints from a wide range of remote sensed imagery remains a challenge due to the complex structures, variety of scales, and diverse appearances of buildings. Existing convolutional neural network (CNN)-based building extraction methods are criticized for their inability to detect tiny buildings because the spatial information of CNN feature maps is lost during repeated pooling operations. In addition, large buildings still have inaccurate segmentation edges. Moreover, features extracted by a CNN are always partially restricted by the size of the receptive field, so large-scale buildings with low texture are often extracted as discontinuous regions that contain holes. To alleviate these problems, recent research works introduce multiscale strategies to extract buildings at different scales. However, the higher-resolution features are generally extracted from shallow layers, which capture insufficient semantic information for tiny buildings. This article proposes a novel multiple attending path neural network (MAP-Net) for accurately extracting multiscale building footprints and precise boundaries. Unlike existing multiscale feature extraction strategies, MAP-Net learns spatial localization-preserved multiscale features through a multiparallel path in which each stage is gradually generated to extract high-level semantic features with fixed resolution. Then, an attention module adaptively squeezes the channel-wise features extracted from each path for optimized multiscale fusion, and a pyramid spatial pooling module captures global dependence for refining discontinuous building footprints.
Experimental results show that, compared with the latest HRNetv2, our method improves the F1-score by 0.88%, 0.93%, and 0.45% and the intersection over union (IoU) score by 1.53%, 1.50%, and 0.82% on the Urban 3-D, Deep Globe, and WHU data sets, respectively, without increasing computational complexity. Specifically, without pretraining or postprocessing, MAP-Net outperforms the multiscale aggregation fully convolutional network (MA-FCN), a state-of-the-art (SOTA) algorithm that uses postprocessing and model voting strategies, on the WHU data set. The TensorFlow implementation is available at https://github.com/lehaifeng/MAPNet.
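
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how the F1-score and IoU can be computed for a binary building mask. It is not taken from the MAP-Net repository: the function name `f1_and_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration only.

```python
def f1_and_iou(pred, truth):
    """Return (F1, IoU) for two equally sized binary masks.

    Masks are nested lists of 0/1 values, where 1 marks a building pixel.
    This is a hypothetical helper, not part of the MAP-Net code.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # building pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as building
            elif t and not p:
                fn += 1  # building pixel missed
    f1_denominator = 2 * tp + fp + fn
    iou_denominator = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    f1 = 2 * tp / f1_denominator if f1_denominator else 1.0
    iou = tp / iou_denominator if iou_denominator else 1.0
    return f1, iou


if __name__ == "__main__":
    prediction = [[1, 1, 0],
                  [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    f1, iou = f1_and_iou(prediction, ground_truth)
    print(f"F1 = {f1:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two metrics are related by F1 = 2·IoU / (1 + IoU), so the IoU is never larger than the F1-score.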

MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery

Multi-scale attention integrated hierarchical networks for high-resolution building footprint extraction

MSRF-Net: Multiscale Receptive Field Network for Building Detection From Remote Sensing Images

ME-FCN: A Multi-Scale Feature-Enhanced Fully Convolutional Network for Building Footprint Extraction

HD-Net: High-resolution decoupled network for building footprint extraction via deeply supervised body and boundary decomposition

Extracting Buildings from Remote Sensing Images Using a Multitask Encoder-Decoder Network with Boundary Refinement

Cross-level and multiscale CNN-Transformer network for automatic building extraction from remote sensing imagery

Multi-Scale Attention Network for Building Extraction from High-Resolution Remote Sensing Images

A Cascaded Network With Coupled High-Low Frequency Features for Building Extraction

Multi-Scale Feature Fusion Attention Network for Building Extraction in Remote Sensing Images

Multiscale probability map guided index pooling with attention-based learning for road and building segmentation

A coarse-to-fine boundary refinement network for building footprint extraction from remote sensing imagery

Multi-Scale Feature Map Aggregation and Supervised Domain Adaptation of Fully Convolutional Networks for Urban Building Footprint Extraction

ACMFNet: Attention-Based Cross-Modal Fusion Network for Building Extraction of Remote Sensing Images

Multiregion Scale-Aware Network for Building Extraction From High-Resolution Remote Sensing Images

Asymmetric Network Combining CNN and Transformer for Building Extraction from Remote Sensing Images

Multi-Level Perceptual Network for Urban Building Extraction from High-Resolution Remote Sensing Images

MSFTrans: a multi-task frequency-spatial learning transformer for building extraction from high spatial resolution remote sensing images

CSA-Net: Complex Scenarios Adaptive Network for Building Extraction for Remote Sensing Images

Automatic building footprint extraction from very high-resolution imagery using deep learning techniques

MS-CNN: multiscale recognition of building rooftops from high spatial resolution remote sensing imagery