Abstract: Object detection forms the foundation of safe autonomous vehicle (AV) operation. LiDAR and cameras are both widely used sensors, yet each has its own advantages and drawbacks. For instance, LiDAR sensors struggle with obstacle occlusion and long-range objects in 3-D object recognition. Cameras, on the other hand, are significantly affected by variations in lighting and weather conditions, and they struggle to provide precise depth information. Hence, multisensor fusion is frequently employed to enhance both the accuracy and robustness of object detection. End-to-end fusion suffers from feature misalignment and suboptimal training strategies, while sequential fusion architectures fail to fully exploit high-density images to enhance point cloud data, especially when information is sparse at extended ranges. To address these challenges, we present a dense sequential fusion (DSF) framework designed to fuse camera and LiDAR sensor data. The primary goal is to enhance the accuracy and robustness of 3-D object detection, particularly for distant objects. First, we develop a model for augmenting foreground points, specifically targeting the sparse points associated with far-range objects. Second, we implement a foreground point refinement technique to filter the long-tail points generated from images. This refinement improves object distinctiveness, especially when edge points are abundant, while supplying high-resolution raw and pseudo foreground points. Finally, voxel-based LiDAR 3-D detection methods detect 3-D objects from the high-resolution raw and pseudo point clouds. Experiments were conducted on the KITTI dataset. The results show that the proposed method improves 3-D mAP by 2.59% over PointPillars and average precision (AP) for hard-level car detection by 1.27% over SECOND. Furthermore, it improves the bird’s eye view (BEV) AP for far-range car detection by more than 10%.
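The three stages described above (foreground point augmentation, long-tail refinement, voxel-based detection input) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a dense depth map and a binary foreground mask are already available from the image branch. The function names `backproject_foreground`, `filter_long_tail`, and `voxelize` are hypothetical, and the median-based depth filter is only a simple stand-in for the refinement technique, whose details the abstract does not give.

```python
import numpy as np


def backproject_foreground(depth, mask, K):
    """Lift foreground pixels with valid depth to 3-D pseudo points (camera frame)."""
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)            # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]     # pinhole model: x = (u - cx) * z / fx
    y = (v - K[1, 2]) * z / K[1, 1]     # pinhole model: y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def filter_long_tail(points, max_dev=3.0, min_scale=0.1):
    """Drop points whose depth lies far from the object's median depth.

    Assumption: long-tail points show up as depth outliers at object edges.
    The spread is estimated with the median absolute deviation (MAD),
    floored at min_scale metres so a perfectly flat object is not over-pruned.
    """
    z = points[:, 2]
    med = np.median(z)
    scale = max(1.4826 * np.median(np.abs(z - med)), min_scale)
    return points[np.abs(z - med) <= max_dev * scale]


def voxelize(points, voxel_size=0.2):
    """Return the integer grid indices of the occupied voxels."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(idx, axis=0)


# Toy example: a 4x4-pixel object at 40 m with one edge pixel bleeding to 75 m.
K = np.array([[700.0, 0.0, 4.0], [0.0, 700.0, 4.0], [0.0, 0.0, 1.0]])
depth = np.zeros((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
depth[2:6, 2:6] = 40.0
depth[2, 2] = 75.0

raw = np.array([[0.0, 0.0, 40.0]])        # a single sparse LiDAR return
pseudo = backproject_foreground(depth, mask, K)
refined = filter_long_tail(pseudo)
dense = np.vstack([raw, refined])         # raw + pseudo foreground points
voxels = voxelize(dense)
print(len(pseudo), len(refined), voxels.shape)
```

In this toy case the single edge pixel at 75 m is discarded, and the remaining pseudo points densify the one raw LiDAR return before voxelization. A real pipeline would also need the camera-to-LiDAR extrinsic transform, which is omitted here.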

Dense Sequential Fusion: Point Cloud Enhancement Using Foreground Mask Guidance for Multimodal 3-D Object Detection

MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving

Region-proposal Convolutional Network-driven Point Cloud Voxelization and Over-segmentation for 3D Object Detection

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection