Road Extraction by Multiscale Deformable Transformer from Remote Sensing Images

Peng-Cheng Hu, Si-Bao Chen, Li-Li Huang, Gui-Zhou Wang, Jin Tang, Bin Luo
DOI: https://doi.org/10.1109/lgrs.2023.3299985
IF: 5.343
2023-01-01
IEEE Geoscience and Remote Sensing Letters
Abstract: Research on road extraction from high-resolution remote sensing images has progressed rapidly in recent years. However, because road types are diverse and road context is complex, extracting a complete and accurate road network remains challenging. Many convolutional neural networks (CNNs) based on encoder-decoder structures have proven effective for this task, while the transformer's self-attention mechanism models global feature dependencies more powerfully than CNNs. In this letter, we propose a multiscale deformable transformer network (MDTNet), based on an encoder-decoder structure, to extract road networks from remote sensing images. The core of MDTNet is our proposed multiscale deformable self-attention (MDSA) mechanism, which captures more comprehensive features than conventional self-attention. In addition, unlike other objects, roads are not confined to compact blocks of the image; they are interwoven throughout it in a long, linear fashion, so information about some road segments may be overlooked. To reduce residual errors in road segmentation, MDSA incorporates a deformable design on feature maps, which enhances the salience of road features relative to their surroundings. Extensive experiments on several public remote sensing road datasets show that MDTNet achieves higher segmentation accuracy [F1 score and intersection over union (IoU)] and connectivity accuracy [average path length similarity (APLS)], which verifies the effectiveness of our approach.
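For readers unfamiliar with the two segmentation metrics named in the abstract, the following is a minimal sketch of how pixel-wise F1 score and IoU are commonly computed for binary road masks. It is not taken from the letter: the function name `f1_and_iou`, the toy masks, and the convention of returning 0.0 when a denominator is zero are all assumptions made for illustration. APLS is not sketched here, since it requires extracting and comparing road graphs.

```python
def f1_and_iou(pred, target):
    """Pixel-wise F1 and IoU for binary masks (1 = road, 0 = background).

    `pred` and `target` are equally sized nested lists of 0/1 values.
    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # road pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as road
            elif t and not p:
                fn += 1  # road pixel missed
    # Assumed convention: report 0.0 when the denominator is zero.
    f1_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    iou = tp / iou_den if iou_den else 0.0
    return f1, iou


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
f1, iou = f1_and_iou(pred, target)
```

On these toy masks, F1 is 2·2 / (2·2 + 1 + 1) = 2/3 and IoU is 2 / (2 + 1 + 1) = 0.5. IoU is never larger than F1 on the same prediction, which is why papers often report both.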