Extracting Building Footprint From Remote Sensing Images by an Enhanced Vision Transformer Network
Hua Zhang, Hu Dou, Zelang Miao, Nanshan Zheng, Ming Hao, Wenzhong Shi
DOI: https://doi.org/10.1109/tgrs.2024.3421651
IF: 8.2
2024-07-13
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Automatic extraction of building footprints from images is a vital means of obtaining building footprint data. However, because buildings vary in appearance and scale and have intricate structures, the task remains challenging. Recently, the vision transformer (ViT) has shown significant promise in semantic segmentation, thanks to its ability to capture long-range dependencies. This article employs the ViT for extracting building footprints. Yet using a ViT often runs into two limitations: high computational cost and insufficient preservation of local details during feature extraction. To address these challenges, a network based on an enhanced ViT (EViT) is proposed. In this network, a convolutional neural network (CNN)-based branch is introduced to extract comprehensive spatial details. Another branch, consisting of several multiscale enhanced ViT (EV) blocks, is developed to capture global dependencies. Subsequently, a multiscale and enhanced boundary feature extraction block fuses the global dependencies and local details and enhances boundary features, thereby yielding multiscale global-local contextual information with enhanced boundary features. Specifically, we present a window-based cascaded multihead self-attention (W-CMSA) mechanism, characterized by linear complexity with respect to the window size, which not only reduces computational costs but also enhances attention diversity. EViT was evaluated against other state-of-the-art (SOTA) approaches on three benchmark datasets. The results show that EViT performs well in extracting building footprints and surpasses the SOTA approaches, achieving 82.45%, 91.76%, and 77.14% IoU on the SpaceNet, WHU, and Massachusetts datasets, respectively. The implementation of EViT is available at https://github.com/dh609/EViT.
engineering, electrical & electronic; imaging science & photographic technology; remote sensing; geochemistry & geophysics
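
The abstract reports results as IoU (intersection over union) between predicted and reference building masks. The snippet below is a minimal sketch of that metric for a single pair of binary masks. It is not taken from the EViT repository: the function name `binary_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, and the paper's own evaluation protocol (for example, how scores are aggregated over a dataset) may differ.

```python
def binary_iou(pred, truth):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (hypothetical representation;
    a real pipeline would more likely use arrays or tensors).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")

    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)  # pixel is building in both masks
            union += int(p or t)          # pixel is building in either mask

    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(binary_iou(pred, truth))  # 2 shared pixels / 4 pixels in the union = 0.5
```

In this reading, a reported IoU of 91.76% would mean the overlap between predicted and reference building pixels covers about 92% of their combined area.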