STransU2Net: Transformer based hybrid model for building segmentation in detailed satellite imagery

Guangjie Liu,Kuo Diao,Jinlong Zhu,Qi Wang,Meng Li
DOI: https://doi.org/10.1371/journal.pone.0299732
IF: 3.7
2024-09-12
PLoS ONE
Abstract:As essential components of human society, buildings serve a multitude of functions and significance. Convolutional Neural Network (CNN) has made remarkable progress in the task of building extraction from detailed satellite imagery, owing to the potent capability to capture local information. However, CNN performs suboptimal in extracting larger buildings. Conversely, Transformer has excelled in capturing global information through self-attention mechanisms but are less effective in capturing local information compared to CNN, resulting in suboptimal performance in extracting smaller buildings. Therefore, we have designed the hybrid model STransU2Net, which combines meticulously designed Transformer and CNN to extract buildings of various sizes. In detail, we designed a Bottleneck Pooling Block (BPB) to replace the conventional Max Pooling layer during the downsampling phase, aiming to enhance the extraction of edge information. Furthermore, we devised the Channel And Spatial Attention Block (CSAB) to enhance the target location information during the encoding and decoding stages. Additionally, we added a Swin Transformer Block (STB) at the skip connection location to enhance the model's global modeling ability. Finally, we empirically assessed the performance of STransU2Net on both the Aerial imagery and Satellite II datasets, The IoU achieved state-of-the-art results with 91.04% and 59.09%, respectively, outperforming other models.
What problem does this paper attempt to address?