Abstract: Object detection forms the foundation of safe autonomous vehicle (AV) operation. LiDAR and cameras are both widely used for detection, yet each has its own advantages and drawbacks. In 3-D object recognition, LiDAR struggles with occluded obstacles and with objects at long range. Cameras, on the other hand, are strongly affected by variations in lighting and weather, and they cannot provide precise depth information. Multisensor fusion is therefore frequently employed to improve both the accuracy and the robustness of object detection. End-to-end fusion suffers from feature misalignment and suboptimal training strategies, whereas sequential fusion architectures fail to fully exploit high-density images to enhance point cloud data, especially where the point cloud is sparse at extended ranges. To address these challenges, we present a dense sequential fusion (DSF) framework designed to fuse camera and LiDAR data. The primary goal is to improve the accuracy and robustness of 3-D object detection, particularly for distant objects. First, we develop a model that augments foreground points, specifically targeting the sparse points associated with far-range objects. Second, we apply a foreground point refinement technique that filters out the long-tail points generated from images. This refinement improves the object’s distinctiveness, especially when edge points are abundant, while supplying high-resolution raw and pseudo foreground points. Finally, voxel-based LiDAR 3-D detection methods detect objects from the high-resolution raw and pseudo point clouds. Experiments were conducted on the KITTI dataset. The results show that the proposed method improves 3-D mAP by 2.59% compared with PointPillars and improves average precision (AP) for car hard-level detection by 1.27% compared with SECOND. Furthermore, it improves the bird’s eye view (BEV) AP for far-range car detection by more than 10%.
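
The abstract does not specify how the pseudo foreground points are generated or how long-tail points are filtered, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a dense depth map and a binary foreground mask are already available for the image, lifts the masked pixels to 3-D with a pinhole camera model, and then drops points whose depth lies far from the median depth of the object. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, the threshold `max_dev`, and the toy data are all hypothetical.

```python
from statistics import median


def backproject_foreground(depth, mask, fx, fy, cx, cy):
    """Lift masked pixels of a dense depth map to 3-D pseudo points (camera frame).

    Assumes a pinhole camera; depth values are in metres, 0 means "no depth".
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, is_fg) in enumerate(zip(depth_row, mask_row)):
            if is_fg and z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


def drop_long_tail(points, max_dev=1.0):
    """Keep points whose depth is within max_dev metres of the median depth.

    A simple stand-in for a refinement step: pixels on an object's edge often
    receive background depth and form a "tail" behind the object.
    """
    if not points:
        return []
    z_med = median(p[2] for p in points)
    return [p for p in points if abs(p[2] - z_med) <= max_dev]


# Toy 3x3 patch: an object at roughly 40 m, with one edge pixel bleeding to 55 m.
depth = [
    [40.0, 40.2, 55.0],
    [39.9, 40.1, 40.0],
    [0.0, 40.3, 40.1],
]
mask = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 1, 1],
]

pseudo = backproject_foreground(depth, mask, fx=700.0, fy=700.0, cx=1.0, cy=1.0)
refined = drop_long_tail(pseudo, max_dev=1.0)
print(len(pseudo), len(refined))
```

In a sequential pipeline of this kind, the refined pseudo points would be merged with the raw LiDAR foreground points before being passed to a voxel-based detector; a median-depth filter is only one of many possible refinement rules.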

FSFNet: Foreground Score-Aware Fusion for 3-D Object Detector under Unfavorable Conditions

SGFNet: Segmentation Guided Fusion Network for 3D Object Detection

ℱ3-Net: Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion

FS-Net: LiDAR-Camera Fusion With Matched Scale for 3D Object Detection in Autonomous Driving

Cascaded Cross-Modality Fusion Network for 3D Object Detection

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

LiDAR-Camera Cross Fusion Network Towards 3D Object Detection in Self-Driving

MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion

TSFF: a two-stage fusion framework for 3D object detection

Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

ACF-Net: Asymmetric Cascade Fusion for 3D Detection with LiDAR Point Clouds and Images

MSGFusion: Muti-scale Semantic Guided LiDAR-Camera Fusion for 3D Object Detection

MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection

MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

Dense Frustum-Aware Fusion for 3D Object Detection in Perception Systems

TSF: Two-Stage Sequential Fusion for 3D Object Detection

Dense Sequential Fusion: Point Cloud Enhancement Using Foreground Mask Guidance for Multimodal 3-D Object Detection

Towards Efficient Multi-Modal 3D Object Detection: Homogeneous Sparse Fuse Network

FSFM: A Feature Square Tower Fusion Module for Multimodal Object Detection