Multiscale Feature Learning by Transformer for Building Extraction From Satellite Images

Xin Chen, Chunping Qiu, Wenyue Guo, Anzhu Yu, Xiaochong Tong, Michael Schmitt
DOI: https://doi.org/10.1109/lgrs.2022.3142279
IF: 5.343
2022-01-01
IEEE Geoscience and Remote Sensing Letters
Abstract: Extracting buildings from very high-resolution satellite images is a challenging yet important task for applications such as urban monitoring. Multiscale feature learning has shown promise for extracting buildings accurately. This study exploits a powerful multiscale feature learning module, the hierarchical vision transformer using shifted windows (Swin), as the backbone of a building extraction network. To this end, we first designed a general structure for building extraction, consisting of a backbone that extracts multiscale features and a head network that fuses and refines them. We then integrated Swin into this structure as the backbone and applied channel-wise and spatial-wise enhancement in the head network. Experimental results show that our method improves both the F1-score and intersection over union (IoU) compared with the multiple attending path neural network (MAP-Net), the current state-of-the-art (SOTA) algorithm for building extraction from remote sensing images. Our study thus confirms the potential of Swin transformers as backbones for semantic segmentation of satellite images.
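The abstract reports gains in F1-score and IoU over MAP-Net. As an illustration of what these two metrics measure, here is a minimal sketch for binary building masks, assuming the building class is the positive class and pixel counts are pooled over the whole evaluation set. The abstract does not state the authors' exact evaluation protocol, and the function names `confusion_counts` and `f1_and_iou` are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical mask type: a 2-D binary mask where 1 = building, 0 = background.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for the building class."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("masks must have the same shape")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # building predicted and present
            elif p and not t:
                fp += 1  # building predicted but absent
            elif t and not p:
                fn += 1  # building present but missed
    return tp, fp, fn


def f1_and_iou(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Pixel-level F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    if tp + fp + fn == 0:
        # No building pixels in either mask: scored as perfect agreement
        # (an assumed convention, not taken from the paper).
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


if __name__ == "__main__":
    truth = [[1, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
    pred = [[1, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    f1, iou = f1_and_iou(pred, truth)
    print(f"F1 = {f1:.3f}, IoU = {iou:.3f}")  # F1 = 0.667, IoU = 0.500
```

When both metrics are computed from the same pooled counts, IoU = F1 / (2 - F1), so an improvement in one implies an improvement in the other. If scores are instead averaged per image, this exact relation no longer holds.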
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics