DDCTNet: A Deformable and Dynamic Cross-Transformer Network for Road Extraction From High-Resolution Remote Sensing Images
Lipeng Gao, Yiqing Zhou, Jiangtao Tian, Wenjing Cai
DOI: https://doi.org/10.1109/tgrs.2024.3404044
IF: 8.2
2024-06-04
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Driven by advances in deep learning, road extraction from high-resolution remote sensing images has gained significant attention. However, existing methods remain limited in both evaluation metrics and practical application scenarios. To address these limitations, we propose a deformable and dynamic cross-transformer network (DDCTNet) with three key innovations. First, a deformable and dynamic cross-transformer (DDCT) attention module enhances the recovery of data and structural information during feature map upsampling by passing rich semantic information from the encoding stage to the decoding stage along the spatial and channel dimensions, respectively. This improves the quality of upsampling while preserving the inherent characteristics of roads. Second, a cross-scale strip-pooling axial attention (CSSA) module between discontinuous encoding stages alleviates the information loss caused by downsampling and highlights the linear characteristics of roads by leveraging rich semantic information from the previous stage. It accounts for linear road features in complex scenes while also reducing computational complexity. Finally, an auxiliary head (AuxHead) fuses the outputs of the last three decoding modules to improve the model's generalization performance and convergence speed. In extensive experiments on three benchmark datasets, we compare DDCTNet with other classic road extraction models. The results show a noticeable improvement of 1%–5% across various evaluation metrics on the three datasets. In addition, the visualized results demonstrate that DDCTNet represents real road scenes more accurately, including distinguishing regions with high foreground-background similarity and handling road occlusion.
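The abstract's remark that strip pooling highlights the linear character of roads can be illustrated with a minimal NumPy sketch of generic strip pooling. This is not the paper's CSSA module: the function name `strip_pool_gate`, the sigmoid gating, and the absence of learned layers and cross-scale inputs are all assumptions made for illustration.

```python
import numpy as np


def strip_pool_gate(feature_map):
    """Gate a (C, H, W) feature map with row-wise and column-wise strip averages.

    Hypothetical illustration of generic strip pooling, not the CSSA module.
    """
    # Horizontal strips: average each row over the width -> shape (C, H, 1)
    row_context = feature_map.mean(axis=2, keepdims=True)
    # Vertical strips: average each column over the height -> shape (C, 1, W)
    col_context = feature_map.mean(axis=1, keepdims=True)
    # Broadcasting the two strip summaries back to (C, H, W) gives every pixel
    # a summary of its whole row and its whole column.
    context = row_context + col_context
    # Sigmoid gate (an assumption; a real module would apply learned layers first)
    gate = 1.0 / (1.0 + np.exp(-context))
    return feature_map * gate


# Toy input: one channel with a horizontal "road" on row 1.
toy = np.zeros((1, 4, 4))
toy[0, 1, :] = 1.0
gated = strip_pool_gate(toy)
```

Because each pooling window spans an entire row or column, a long thin structure such as a road dominates its strip average, whereas a square pooling window of similar area would mix in mostly background. Each pixel also reads only two strip summaries rather than attending to every other pixel, which is the usual intuition for the lower cost of strip-based and axial designs.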
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics