Abstract: 3D object detection is becoming indispensable for environmental perception in autonomous driving. Light detection and ranging (LiDAR) point clouds often fail to distinguish objects with similar structures and are sparse for distant or small objects, which leads to false and missed detections. To address these issues, LiDAR is often fused with cameras because images provide rich textural information. However, current fusion methods suffer from inefficient data representation and inaccurate alignment of heterogeneous features, leading to poor precision and low efficiency. To this end, we propose a plug-and-play module termed range-image fusion (RI-Fusion) to achieve an effective fusion of LiDAR and camera data, designed to be easily integrated into existing mainstream LiDAR-based algorithms. Specifically, we design an image and point cloud alignment method that converts a point cloud into a compact range-view representation (a range image) through a spherical coordinate transformation. The range image is then fused with the corresponding camera image using an attention mechanism. The original range image is concatenated with the fused features to retain point cloud information, and the results are projected back onto the spatial point cloud. Finally, the feature-enhanced point cloud can be input into a LiDAR-based 3D object detector. Validation experiments on the KITTI 3D object detection benchmark showed that our proposed fusion method significantly enhanced multiple mainstream LiDAR-based 3D object detectors, namely PointPillars, SECOND, and Part-A², improving the 3D mAP (mean Average Precision) by 3.61%, 2.98%, and 1.27%, respectively, especially for small objects such as pedestrians and cyclists.
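
The spherical coordinate transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 64 x 512 image size, the vertical field of view of +2.0 to -24.8 degrees (roughly that of a 64-beam sensor), and the row and column conventions are all assumptions made for illustration. The sketch maps each point to a range-image pixel, keeps the nearest return per pixel, and shows how per-pixel features could be gathered back onto the points. The attention-based fusion with the camera image is not shown.

```python
import numpy as np


def point_cloud_to_range_image(points, height=64, width=512,
                               fov_up_deg=2.0, fov_down_deg=-24.8):
    """Project (N, 3) x, y, z points onto a (height, width) range image.

    All default parameters are illustrative assumptions, not values from
    the paper. Returns the range image plus each point's (row, col) pixel.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)       # range of each point
    r_safe = np.maximum(r, 1e-6)                    # avoid division by zero

    azimuth = np.arctan2(y, x)                      # horizontal angle in [-pi, pi]
    elevation = np.arcsin(z / r_safe)               # vertical angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)

    # Normalise both angles to [0, 1]; row 0 is the top of the field of view.
    u = 0.5 * (1.0 - azimuth / np.pi)
    v = (fov_up - elevation) / (fov_up - fov_down)

    cols = np.clip(np.floor(u * width), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v * height), 0, height - 1).astype(np.int64)

    # Keep the nearest return when several points fall into the same pixel.
    range_image = np.full((height, width), np.inf, dtype=np.float64)
    np.minimum.at(range_image, (rows, cols), r)
    range_image[np.isinf(range_image)] = 0.0        # empty pixels become 0

    return range_image, rows, cols


def gather_pixel_features(feature_image, rows, cols):
    """Read per-pixel features back onto the points that produced them."""
    return feature_image[rows, cols]


if __name__ == "__main__":
    pts = np.array([[10.0, 0.0, 0.0],
                    [20.0, 0.0, 0.0],
                    [0.0, 5.0, -1.0]])
    img, rows, cols = point_cloud_to_range_image(pts)
    per_point = gather_pixel_features(img, rows, cols)
    print(img.shape, rows, cols, per_point)
```

Because the pixel indices are returned alongside the image, any feature map computed on the range image (for example, after fusion with camera features) can be indexed with the same `rows` and `cols` to attach features to the original points before they are passed to a LiDAR-based detector.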

Contrastive Late Fusion for 3D Object Detection

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Cascaded Cross-Modality Fusion Network for 3D Object Detection

FGFusion: Fine-Grained Lidar-Camera Fusion for 3D Object Detection

LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection

Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera

AFMCT: adaptive fusion module based on cross-modal transformer block for 3D object detection

FS-Net: LiDAR-Camera Fusion With Matched Scale for 3D Object Detection in Autonomous Driving

RangeLVDet: Boosting 3D Object Detection in LIDAR With Range Image and RGB Image

MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation

ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection

Enhancing 3D object detection through multi-modal fusion for cooperative perception

Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection

Three-Dimensional Point Cloud Object Detection Based on Feature Fusion and Enhancement

RI-Fusion: 3D Object Detection Using Enhanced Point Features With Range-Image Fusion for Autonomous Driving

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection