Three-Dimensional Point Cloud Object Detection Based on Feature Fusion and Enhancement

Yangyang Li, Zejun Ou, Guangyuan Liu, Zichen Yang, Yanqiao Chen, Ronghua Shang, Licheng Jiao

DOI: https://doi.org/10.3390/rs16061045

IF: 5

2024-03-16

Remote Sensing

Abstract: With the continuous emergence and development of 3D sensors in recent years, it has become increasingly convenient to collect point cloud data for 3D object detection tasks, for example in autonomous driving. Existing methods, however, have two problems that cannot be ignored: (1) The bird's eye view (BEV) is widely used in 3D object detection; however, the BEV usually compresses dimensions by combining height, dimension, and channels, which makes feature extraction during feature fusion more difficult. (2) Light detection and ranging (LiDAR) has a large effective scanning depth, so the scanned sector becomes sparse at long range and the point cloud is unevenly distributed. As a result, the neighboring points around the key points of interest carry few features. This paper proposes the following solutions: (1) A multi-scale feature fusion scheme for the BEV, composed of feature maps at different levels built with Deep Layer Aggregation (DLA) and a feature fusion module. (2) A point completion network that improves the prediction results by completing the feature points inside the candidate boxes in the second stage, thereby strengthening their position features. In addition, supervised contrastive learning is applied to enhance the segmentation results, improving discrimination between foreground and background. Experiments show that these additions achieve improvements of 2.7%, 2.4%, and 2.5% on the KITTI easy, moderate, and hard tasks, respectively. Further ablation experiments show that each addition yields a promising improvement over the baseline.
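
The BEV compression mentioned in problem (1) can be illustrated with a minimal sketch. It assumes one common convention, in which a voxel feature volume of shape (channels, height bins, rows, columns) is flattened into a 2D map by folding the height axis into the channel axis. The abstract does not specify the exact layout the authors use, so the function name `to_bev` and the tensor sizes below are hypothetical.

```python
import numpy as np

def to_bev(voxel_features: np.ndarray) -> np.ndarray:
    """Fold the height axis into the channel axis: (C, D, H, W) -> (C * D, H, W).

    Assumed layout (not taken from the paper): C feature channels,
    D height bins, and an H x W grid on the ground plane.
    """
    c, d, h, w = voxel_features.shape
    return voxel_features.reshape(c * d, h, w)

# Hypothetical sizes: 4 channels, 2 height bins, a 3 x 5 ground-plane grid.
voxels = np.arange(4 * 2 * 3 * 5, dtype=np.float32).reshape(4, 2, 3, 5)
bev = to_bev(voxels)
print(bev.shape)  # (8, 3, 5)
```

After the reshape, height information is no longer a separate axis; it is interleaved with the feature channels, which is one way to see why later 2D layers may find spatial and semantic features harder to separate.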

environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary

What problem does this paper attempt to address?

This paper attempts to address two key challenges in 3D point cloud object detection:

1. **Difficulties in Feature Fusion with Bird's Eye View (BEV) Methods**:
   - BEV methods typically compress dimensions by merging height, width, and channels, which complicates the feature extraction process. Specifically, this compression makes it difficult to extract spatial and semantic features during feature fusion.
2. **Sparse Distribution of LiDAR Data in Distant Areas**:
   - The effective scanning depth of LiDAR is relatively large, leading to sparse point cloud data in distant areas, which affects the distribution of neighboring point features around key points. This sparsity resembles the problem of weak features for small objects (such as pedestrians) in the point cloud.

To address these challenges, the paper proposes the following solutions:

1. **Multi-Scale Feature Fusion**:
   - A multi-scale feature fusion method based on Deep Layer Aggregation (DLA) is proposed, combined with a feature fusion module to handle BEV data.
2. **Point Completion Network (PCN)**:
   - In the second stage, a Point Completion Network is used to supplement the feature points within the candidate boxes, thereby enhancing their positional features.
3. **Supervised Contrastive Learning (SCL)**:
   - Supervised contrastive learning is applied to enhance segmentation results and improve the distinction between foreground and background.

Experimental results show that these new methods improve performance on the KITTI dataset by 2.7%, 2.4%, and 2.5% for the easy, moderate, and hard tasks, respectively. Further ablation experiments indicate that each method improves on the baseline model.
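
To make the supervised contrastive learning idea concrete, here is a minimal NumPy sketch of a standard supervised contrastive loss applied to per-point features with foreground/background labels. It is an illustration under assumptions, not the paper's implementation: the function name, the temperature value, and the toy features are hypothetical, and the loss form is the widely used one in which each anchor is pulled toward all other samples sharing its label.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of feature vectors.

    features: (N, D) array of non-zero feature vectors (assumed).
    labels:   (N,) array, e.g. 1 = foreground point, 0 = background point.
    Assumes at least one label occurs more than once in the batch.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(labels)

    # L2-normalize so the dot product is a cosine similarity.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # shift for numerical stability

    # Each anchor is contrasted against every sample except itself.
    not_self = ~np.eye(n, dtype=bool)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Positives are the other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive partner

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())

# Toy point features: two clusters in 2D (hypothetical values).
point_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned_labels = np.array([1, 1, 0, 0])   # labels agree with the clusters
shuffled_labels = np.array([1, 0, 1, 0])  # labels cut across the clusters

loss_aligned = supervised_contrastive_loss(point_features, aligned_labels)
loss_shuffled = supervised_contrastive_loss(point_features, shuffled_labels)
print(loss_aligned < loss_shuffled)  # True
```

The loss is low when same-label points already cluster together and high when they do not, so minimizing it pushes foreground and background features apart, which is the effect the summary attributes to SCL.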

Dense Sequential Fusion: Point Cloud Enhancement Using Foreground Mask Guidance for Multimodal 3-D Object Detection

3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

Cascade fusion of multi-modal and multi-source feature fusion by the attention for three-dimensional object detection

Research on 3D Point Cloud Object Detection Algorithm for Autonomous Driving

Deep multi-scale and multi-modal fusion for 3D object detection

Spatial Information Enhancement with Multi-Scale Feature Aggregation for Long-Range Object and Small Reflective Area Object Detection from Point Cloud

Equal Emphasis on Data and Network: A Two-Stage 3D Point Cloud Object Detection Algorithm with Feature Alignment

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

MS23D: A 3D object detection method using multi-scale semantic feature points to construct 3D feature layer

PCDR-DFF: multi-modal 3D object detection based on point cloud diversity representation and dual feature fusion

PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module

3D Object Detection Based on Attention and Multi-Scale Feature Fusion

Multi-Scale Feature Fusion Point Cloud Object Detection Based on Original Point Cloud and Projection

Multi-scale Feature Fusion with Point Pyramid for 3D Object Detection

Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection

Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection

Enhancing 3D object detection through multi-modal fusion for cooperative perception

3D object detection based on fusion of image and point cloud in autonomous driving traffic scenarios

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion

PPF-Det: Point-Pixel Fusion for Multi-Modal 3D Object Detection