DWin-HRFormer: A High-Resolution Transformer Model With Directional Windows for Semantic Segmentation of Urban Construction Land.

Zhen Zhang, Xin Huang, Jiayi Li
DOI: https://doi.org/10.1109/TGRS.2023.3241366
2023-01-01
Abstract: This article proposes a deep neural network for semantic segmentation of high-resolution remote sensing images, aimed at urban construction land classification. The network follows the high-resolution network (HRNet) architecture. Specifically, directional self-attention is applied on the paths of different resolutions to correct the directional bias that strip-window attention introduces during model learning, while also reducing computational complexity, so that the model improves in both accuracy and speed. At the end of the network, a distributed alignment module with spatial information trains additional learnable parameters that adjust the biased decision boundaries through a two-stage learning strategy, alleviating the accuracy degradation caused by unbalanced training data. We compared the proposed method with current state-of-the-art (SOTA) semantic segmentation methods on the Luojia-fine-grained land cover (FGLC) dataset and the Wuhan Dense Labeling Dataset (WHDLD), where it obtained the best performance. Ablation experiments verified the effectiveness of each component of the network.
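The idea of adjusting biased decision boundaries with extra learnable parameters can be illustrated with a small sketch. This is not the authors' module: it assumes a plain per-class affine adjustment of the logits for a single pixel and leaves out the spatial information mentioned in the abstract. The function names `calibrate` and `argmax` and all numeric values are hypothetical, chosen only to show how a second-stage bias can move a pixel from a frequent class to a rarer one.

```python
def calibrate(logits, scale, bias):
    """Apply a per-class affine adjustment (scale * logit + bias) to one pixel's logits."""
    return [s * z + b for z, s, b in zip(logits, scale, bias)]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical first-stage logits for one pixel over three classes.
# Class 0 stands in for a frequent class, class 1 for a rare one.
logits = [2.0, 1.8, 0.5]

# Hypothetical second-stage parameters. In a two-stage setup these would be
# learned while the first-stage network stays frozen; here they are set by hand.
scale = [1.0, 1.0, 1.0]
bias = [0.0, 0.5, 0.0]

before = argmax(logits)                          # prediction from raw logits
after = argmax(calibrate(logits, scale, bias))   # prediction after adjustment
print(before, after)
```

With these illustrative numbers, the raw logits favor the frequent class, and the small bias on the rare class is enough to shift the decision.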