Abstract: Three-dimensional (3D) object detection is a pivotal research topic in computer vision that aims to identify and locate objects in three-dimensional space. It is widely applied in fields such as geoscience, autonomous driving, and drone navigation. The rapid development of deep learning has led to significant advances in 3D object detection. However, as applications grow more complex, 3D object detection faces challenges such as data imbalance and limited network effectiveness. Specifically, our experiments revealed a notable discrepancy in LiDAR reflection intensity within a point cloud scene: intensities are stronger for nearby points and weaker for distant ones. We also observed a substantial disparity between the number of foreground points and the number of background points. In 3D object detection, foreground points matter more than background points, yet both are usually downsampled without discrimination in subsequent processing. To tackle these challenges, we work from both the data and the network perspectives, designing a feature alignment filtering algorithm and a two-stage 3D object detection network. First, to achieve feature alignment, we introduce a correction equation that decouples intensity from distance and eliminates the distance-induced attenuation of intensity. Then, a background point filtering algorithm that uses the aligned data is designed to alleviate the data imbalance. In addition, because the accuracy of semantic segmentation plays a crucial role in 3D object detection, we propose a two-stage deep learning network that integrates spatial and spectral information, in which a feature fusion branch is designed and embedded in the semantic segmentation backbone.
Experiments on the KITTI dataset show that the proposed method achieves the following average precision (AP_R40) values for the easy, moderate, and hard difficulty levels, respectively: car (IoU 0.7): 89.23%, 80.14%, and 77.89%; pedestrian (IoU 0.5): 52.32%, 45.47%, and 38.78%; and cyclist (IoU 0.5): 76.41%, 61.92%, and 56.39%. By emphasizing both data quality optimization and efficient network architecture, the proposed method achieves performance comparable to that of other state-of-the-art methods.
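
The abstract does not state the correction equation or the filtering rule, so the following is only a minimal sketch of the general idea of aligning intensity across distance and then filtering on the aligned value. It assumes that intensity decays with the inverse square of range, which is an illustrative model and not necessarily the one used by the authors. The function names, the reference range `ref_range`, and the threshold `min_intensity` are all hypothetical.

```python
import math


def correct_intensity(points, ref_range=10.0, exponent=2.0):
    """Rescale each point's intensity to its expected value at ref_range.

    points: iterable of (x, y, z, intensity) tuples.
    Assumption: intensity decays as 1 / range**exponent. This is an
    illustrative attenuation model, not the paper's correction equation.
    """
    corrected = []
    for x, y, z, intensity in points:
        r = math.sqrt(x * x + y * y + z * z)
        scale = (r / ref_range) ** exponent  # undo the assumed attenuation
        corrected.append((x, y, z, intensity * scale))
    return corrected


def filter_background(points, min_intensity=0.2):
    """Keep points whose aligned intensity reaches a threshold.

    Hypothetical rule: a plain threshold stands in for the background
    point filtering algorithm mentioned in the abstract.
    """
    return [p for p in points if p[3] >= min_intensity]


# Two returns from equally reflective surfaces at 10 m and 20 m,
# plus one weak return at 20 m.
scene = [
    (10.0, 0.0, 0.0, 0.40),
    (20.0, 0.0, 0.0, 0.10),
    (0.0, 20.0, 0.0, 0.01),
]
aligned = correct_intensity(scene)
kept = filter_background(aligned)
```

Under this assumed model, the two surfaces with the same reflectivity end up with the same aligned intensity regardless of range, so a single threshold treats near and far points consistently; the weak return is discarded.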

An End-to-End Deep Learning Network for 3D Object Detection From RGB-D Data Based on Hough Voting

3D-SSD: Learning Hierarchical Features from RGB-D Images for Amodal 3D Object Detection

Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information

Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

Deep Hough Voting for 3D Object Detection in Point Clouds

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

6DoF-3D: Efficient and accurate 3D object detection using six degrees-of-freedom for autonomous driving

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

Equal Emphasis on Data and Network: A Two-Stage 3D Point Cloud Object Detection Algorithm with Feature Alignment

Stereo R-CNN based 3D Object Detection for Autonomous Driving

Multi-view 3D Object Detection Network for Autonomous Driving

An Efficient 3D Object Detection Method Based on Fast Guided Anchor Stereo RCNN

Research on 3D Point Cloud Object Detection Algorithm for Autonomous Driving

Ground-aware Monocular 3D Object Detection for Autonomous Driving

Deep learning based 3D target detection for indoor scenes

HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds

FP-RCNN: A Real-Time 3D Target Detection Model based on Multiple Foreground Point Sampling for Autonomous Driving