Abstract: In this paper, we propose MMAF-Net, a 3D object detection method based on multi-view, multi-stage adaptive fusion of RGB images and LiDAR point cloud data. It is an end-to-end architecture that combines features from RGB images, the reflection-intensity front view of the point cloud, and the bird's eye view of the point cloud. It adopts a multi-stage fusion approach of "data-level fusion + feature-level fusion" to fully exploit the strengths of multimodal information. The method addresses key challenges in current 3D object detection methods for autonomous driving, including insufficient feature extraction from multimodal data, rudimentary fusion techniques, and sensitivity to distance and occlusion. To ensure comprehensive integration of multimodal information, we present a series of targeted fusion methods. First, we propose a novel input form that encodes dense point cloud reflectivity information into the image to enhance its representational power. Second, we design the Region Attention Adaptive Fusion module, which uses an attention mechanism to guide the network in adaptively adjusting the importance of different features. Finally, we extend the 2D DIOU (Distance Intersection over Union) loss function to 3D and develop a joint regression loss based on 3D_DIOU and SmoothL1 to optimize the similarity between detected and ground truth boxes. Experimental results on the KITTI dataset demonstrate that MMAF-Net handles heavily occluded and crowded scenes effectively while maintaining real-time performance, and that it improves detection accuracy for small, difficult objects that are occluded or far away.
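The abstract does not give the exact form of the 3D_DIOU term or how it is combined with SmoothL1, so the following is only a minimal sketch of the general idea under stated assumptions: boxes are axis-aligned and written as (cx, cy, cz, dx, dy, dz) with positive sizes, yaw is ignored, and the names `diou_3d`, `smooth_l1`, `joint_loss`, and the `weight` parameter are hypothetical. It follows the standard DIoU definition (IoU minus the squared centre distance divided by the squared diagonal of the smallest enclosing box), lifted from 2D to 3D.

```python
from typing import Sequence

# Assumed box layout: (cx, cy, cz, dx, dy, dz), axis-aligned, no yaw angle.
Box = Sequence[float]


def diou_3d(a: Box, b: Box) -> float:
    """Distance-IoU of two axis-aligned 3D boxes; ranges from -1 to 1."""
    inter = 1.0            # intersection volume, accumulated per axis
    center_dist_sq = 0.0   # squared distance between box centres
    diag_sq = 0.0          # squared diagonal of the smallest enclosing box
    for i in range(3):
        a_lo, a_hi = a[i] - a[i + 3] / 2, a[i] + a[i + 3] / 2
        b_lo, b_hi = b[i] - b[i + 3] / 2, b[i] + b[i + 3] / 2
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        center_dist_sq += (a[i] - b[i]) ** 2
        diag_sq += (max(a_hi, b_hi) - min(a_lo, b_lo)) ** 2
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    iou = inter / (vol_a + vol_b - inter)
    return iou - center_dist_sq / diag_sq


def smooth_l1(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Mean Smooth L1 distance over the box parameters."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def joint_loss(pred: Box, target: Box, weight: float = 1.0) -> float:
    """Hypothetical joint loss: Smooth L1 plus a weighted (1 - DIoU) term."""
    return smooth_l1(pred, target) + weight * (1.0 - diou_3d(pred, target))
```

Unlike plain IoU, the DIoU term still varies with the centre distance when the boxes do not overlap, which is the usual motivation for pairing it with a coordinate-wise loss such as Smooth L1.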

Multi-view 3D Object Detection Network for Autonomous Driving

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

Multi-View Adaptive Fusion Network for 3D Object Detection

Monocular 3-D Vehicle Detection Using a Cascade Network for Autonomous Driving

Deep multi-scale and multi-modal fusion for 3D object detection

End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Multiattention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving

MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving

Multi-Modal 3D Object Detection by Box Matching

3D Object Detection for Point Cloud in Virtual Driving Environment

Channelwise and Spatially Guided Multimodal Feature Fusion Network for 3-D Object Detection in Autonomous Vehicles

Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving

Ground-aware Monocular 3D Object Detection for Autonomous Driving

3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

Multi-sensor fusion 3D object detection for autonomous driving

Enhancing 3D object detection through multi-modal fusion for cooperative perception