Abstract: Recent advances in environmental perception for autonomous vehicles have been driven by deep learning-based approaches. However, effective traffic target detection in complex environments remains challenging. This paper presents a novel dual-modal instance segmentation deep neural network (DM-ISDNN) that fuses camera and LIDAR data to detect targets efficiently in complex environments. Because LIDAR point cloud data are sparse, we propose a weight assignment function that assigns different weight coefficients to different feature pyramid convolutional layers of the LIDAR sub-network. We compare and analyze early-, middle-, and late-stage fusion architectures in depth. Considering both detection accuracy and detection speed, we select the middle-stage fusion architecture with the weight assignment mechanism, which performs best. This work is significant for exploring the best feature fusion scheme for multi-modal neural networks. In addition, we apply a mask distribution function to improve the quality of the predicted masks. A dual-modal traffic object instance segmentation dataset is established using 7,481 camera and LIDAR data pairs from the KITTI dataset, with 79,118 manually annotated instance masks. To the best of our knowledge, no existing instance annotation for the KITTI dataset matches this quality and volume. A novel dual-modal dataset, composed of 14,652 camera and LIDAR data pairs, is collected with our own autonomous vehicle under different environmental conditions in real driving scenarios; a total of 62,579 instance masks are obtained for it using a semi-automatic annotation method. This dataset can be used to validate the detection performance of instance segmentation networks under complex environmental conditions. Experimental results on the dual-modal KITTI Benchmark demonstrate that DM-ISDNN with middle-stage data fusion and the weight assignment mechanism outperforms single- and dual-modal networks that use other data fusion strategies, which validates the robustness and effectiveness of the proposed method. Moreover, compared with state-of-the-art instance segmentation networks, our method shows much better detection performance, in terms of AP and F1 score, on the dual-modal dataset collected under complex environmental conditions, which further validates the superiority of our method.
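
As a rough illustration of the weight assignment idea described in the abstract, the following Python sketch fuses camera and LIDAR feature pyramids level by level, scaling the LIDAR features at each level by its own coefficient. The abstract does not give the form of the weight assignment function or the fusion operator, so the geometric-decay schedule in `level_weights`, the additive rule in `fuse_level`, and all function names here are assumptions made for illustration, not the DM-ISDNN implementation.

```python
from typing import List

# A single-channel H x W feature map, kept tiny for illustration.
FeatureMap = List[List[float]]


def level_weights(num_levels: int, decay: float = 0.5) -> List[float]:
    """Hypothetical weight schedule: geometric decay, normalised to sum to 1.

    This is NOT the paper's weight assignment function (the abstract does not
    specify it); it only illustrates "one coefficient per pyramid level".
    """
    raw = [decay ** i for i in range(num_levels)]
    total = sum(raw)
    return [r / total for r in raw]


def fuse_level(camera: FeatureMap, lidar: FeatureMap, weight: float) -> FeatureMap:
    """Assumed fusion rule for one pyramid level: camera + weight * lidar."""
    same_rows = len(camera) == len(lidar)
    if not same_rows or any(len(c) != len(l) for c, l in zip(camera, lidar)):
        raise ValueError("camera and LIDAR maps must share a shape at each level")
    return [
        [c + weight * l for c, l in zip(c_row, l_row)]
        for c_row, l_row in zip(camera, lidar)
    ]


def fuse_pyramids(
    camera_pyramid: List[FeatureMap],
    lidar_pyramid: List[FeatureMap],
    weights: List[float],
) -> List[FeatureMap]:
    """Fuse two feature pyramids level by level (middle-stage style fusion)."""
    if not (len(camera_pyramid) == len(lidar_pyramid) == len(weights)):
        raise ValueError("pyramids and weights must have the same number of levels")
    return [
        fuse_level(c, l, w)
        for c, l, w in zip(camera_pyramid, lidar_pyramid, weights)
    ]


if __name__ == "__main__":
    # Two pyramid levels with toy values; real feature maps are multi-channel.
    camera_pyramid = [[[1.0, 2.0], [3.0, 4.0]], [[5.0]]]
    lidar_pyramid = [[[0.0, 4.0], [0.0, 0.0]], [[2.0]]]
    weights = level_weights(len(camera_pyramid))
    print(fuse_pyramids(camera_pyramid, lidar_pyramid, weights))
```

In this sketch the sparse LIDAR map contributes only where it has non-zero responses, and the per-level coefficient controls how strongly those responses affect the fused features; a learned or data-dependent weighting could be substituted for `level_weights` without changing the rest of the code.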

Robust Dual-Modal Image Quality Assessment Aware Deep Learning Network for Traffic Targets Detection of Autonomous Vehicles.

NLFNet: Non-Local Fusion Towards Generalized Multimodal Semantic Segmentation Across RGB-Depth, Polarization, and Thermal Images

Millimeter-Wave Radar and Vision Fusion Target Detection Algorithm Based on an Extended Network

Deep Dual-Modal Traffic Objects Instance Segmentation Method Using Camera and LIDAR Data for Autonomous Driving.

Multiattention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving

Multispectral Deep Neural Network Fusion Method for Low-Light Object Detection

3D Object Detection under Urban Road Traffic Scenarios Based on Dual-Layer Voxel Features Fusion Augmentation

Multi-Stage Residual Fusion Network for LIDAR-Camera Road Detection

Late sensor fusion approach with a designed multi-segmentation network

Sensor-Fused Nighttime System for Enhanced Pedestrian Detection in ADAS and Autonomous Vehicles

A Multi-Scale Feature Fusion Based Lightweight Vehicle Target Detection Network on Aerial Optical Images

Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Multi-View Adaptive Fusion Network for 3D Object Detection

Multi-Modal Object Detection Method Based on Dual-Branch Asymmetric Attention Backbone and Feature Fusion Pyramid Network

A Quality Index Metric and Method for Online Self-Assessment of Autonomous Vehicles Sensory Perception

DHFNet: Decoupled Hierarchical Fusion Network for RGB-T dense prediction tasks

3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving

DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation

Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection

Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework