Abstract: Urban road extraction is important for applications in urban planning and transportation. High-resolution imagery (HRI) has been one of the most popular data sources for extracting roads with high efficiency and low cost. However, roads in HRI are easily obscured by buildings, trees, and other landscape features, which leads to discontinuities in the extracted roads. Although current road extraction techniques based on multimodal data fusion improve on single-modal methods by incorporating additional information, most existing fusion methods neither fully exploit the features of the different modalities nor consider prior knowledge of roads. To address these problems, a dual encoder-based cross-modal complementary fusion network (DECCFNet) is proposed in this article. The proposed network takes full advantage of the rich feature information contained in HRI and of the immunity of LiDAR data to shadows. By effectively fusing the complementary information from HRI and LiDAR data, DECCFNet improves IoU by at least 2.94% and 2.8% on the two datasets, respectively, compared with methods that use only a single data modality. The proposed DECCFNet mainly contains two modules: 1) cross-modal feature fusion (CMFF) module: in the dual encoder part, CMFF fuses the deep features of the different modalities along the channel and spatial dimensions, while a multiscale fusion strategy extracts contextual information; 2) multi-direction strip convolution (MDSC) module: because roads are narrow and continuous, applying classical square convolution kernels directly to road features may introduce irrelevant pixels into the computation and blur the extraction results. To mitigate this issue, MDSC builds on square convolution by applying strip convolutions to the road features along multiple directions, so that the network focuses more on road-specific features. In comparisons with several deep-learning multimodal data fusion networks on the two road datasets, the proposed network achieves the best road extraction results.
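The motivation for strip convolution given above can be illustrated with a small sketch. This is not the paper's MDSC module: it uses fixed averaging weights instead of learned kernels, and the four directions, the strip length of 3, and all function names are assumptions chosen for illustration. It only shows why a strip aligned with a narrow road picks up fewer irrelevant pixels than a square window.

```python
# Toy illustration of strip vs. square aggregation on a narrow road.
# Assumptions (not from the paper): four directions, strip length 3,
# uniform averaging weights, zero padding at the borders.

# (d_row, d_col) step for each hypothetical strip direction.
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),
    "anti_diagonal": (1, -1),
}


def strip_mean(image, row, col, direction, length=3):
    """Average `length` pixels centered at (row, col) along one direction."""
    d_row, d_col = DIRECTIONS[direction]
    half = length // 2
    total = 0.0
    for step in range(-half, half + 1):
        r, c = row + step * d_row, col + step * d_col
        # Pixels outside the image contribute zero (zero padding).
        if 0 <= r < len(image) and 0 <= c < len(image[0]):
            total += image[r][c]
    return total / length


def square_mean(image, row, col, size=3):
    """Average a size x size window centered at (row, col)."""
    half = size // 2
    total = 0.0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += image[r][c]
    return total / (size * size)


def multi_direction_response(image, row, col, length=3):
    """Strip responses along every direction at a single pixel."""
    return {name: strip_mean(image, row, col, name, length) for name in DIRECTIONS}


# 5x5 toy map with a one-pixel-wide horizontal road on row 2.
road = [[1.0 if r == 2 else 0.0 for c in range(5)] for r in range(5)]
responses = multi_direction_response(road, 2, 2)
square = square_mean(road, 2, 2)
```

At the road pixel `(2, 2)`, the horizontal strip covers only road pixels, while the square window and the other three strips each mix in two-thirds background. In a learned network the per-direction responses would come from trainable kernels and be combined by later layers; here they are simply returned as a dictionary.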
