Road Extraction From Remote Sensing Images via Channel Attention and Multilayer Axial Transformer

Qingliang Meng, Daoxiang Zhou, Xiaokai Zhang, Zhigang Yang, Zehua Chen
DOI: https://doi.org/10.1109/lgrs.2024.3379502
IF: 5.343
2024-04-09
IEEE Geoscience and Remote Sensing Letters
Abstract: Remote sensing images contain many objects that resemble road structures, making it difficult to distinguish roads from the background. Moreover, road extraction is affected by factors such as lighting conditions, noise, and occlusions, which lead to incomplete and discontinuous results. Learning discriminative road features from remote sensing images is therefore a highly challenging task. In this letter, a novel road extraction model for remote sensing images is proposed, built on a U-Net-like encoder-decoder architecture. An axial Transformer module (ATM) is designed to learn global road features in the deepest layer with linear computational complexity with respect to image size. A multilayer attention fusion module (MLAF) is also presented to fuse multiple layers of Transformer features, yielding more comprehensive and richer semantic information. In the skip connection, a channel attention module (CAM) is designed to weight the feature maps along the channel dimension, with the goal of improving feature representation. Extensive experiments are conducted on the DeepGlobe and Massachusetts road datasets. Compared with other methods, the proposed method extracts roads from remote sensing images with higher accuracy and lower computational cost, e.g., achieving an intersection over union (IoU) of 81.71% (a 1.02% improvement) and a 22.38% reduction in convergence time relative to the latest TransRoadNet on the Massachusetts road dataset. Ablation experiments also demonstrate the effectiveness of the designed model.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
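
The abstract reports its headline result as an intersection over union (IoU) score. As a reminder of what that metric measures, here is a minimal sketch of IoU for binary road masks. It assumes masks are given as nested lists of 0/1 values; the function name `iou` and the toy masks are illustrative and not taken from the paper, whose exact evaluation protocol (thresholding, per-image versus dataset-level averaging) is not stated in the abstract.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as road in both masks
    union = 0         # pixels marked as road in at least one mask
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks (hypothetical data): 1 = road pixel, 0 = background.
pred = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
target = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

print(iou(pred, target))  # 3 shared road pixels / 5 in the union = 0.6
```

A score of 81.71% therefore means that the overlap between predicted and reference road pixels is about 82% of their combined area, under whatever aggregation the authors used.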