Abstract: In autonomous driving and robotics, 3D object detection is a difficult but important task. To improve detection accuracy, LiDAR sensors, which collect the 3D point cloud of a scene, are continually being improved. However, the collected 3D points are sparse and unevenly distributed across the scene, which reduces the accuracy of 3D object detectors in both object localization and identification. Although the corresponding high-resolution camera images can serve as supplemental information, a poor fusion strategy can yield lower accuracy than a LiDAR-only detector. To improve the classification, localization, and boundary location of 3D objects, this paper proposes DASANet, a two-stage detector with density-and-sparsity feature aggregation. In the first stage, dense pseudo point clouds are generated from camera images and used to obtain the initial proposals. In the second stage, two novel feature aggregation modules fuse LiDAR point information with pseudo point information, refining the semantic and detailed representation of the feature maps. To supplement the semantic information of the highest-scale LiDAR features for object localization and classification, a triple differential information supplement (TDIS) module extracts the LiDAR-pseudo differential features and enhances them in the spatial, channel, and global dimensions. To enrich the detailed information of the LiDAR features for object boundary location, a Siamese three-dimension coordinate attention (STCA) module extracts stable LiDAR and pseudo point cloud features with a Siamese encoder and fuses them using three-dimension coordinate attention. Experiments on the KITTI Vision Benchmark Suite demonstrate the improved performance of DASANet with regard to the localization and boundary location of objects. Ablation studies demonstrate the effectiveness of the TDIS and STCA modules.
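The idea of supplementing LiDAR features with a LiDAR-pseudo difference can be illustrated with a toy sketch. This is not the TDIS module from the paper: the abstract gives no layer details, so the code below assumes a plain element-wise difference followed by a parameter-free sigmoid gate per channel, and it omits the spatial and global enhancement branches entirely. The function names `differential_features`, `channel_gate`, and `supplement` are hypothetical.

```python
import math

def differential_features(lidar, pseudo):
    """Element-wise difference (pseudo - lidar) of two [N][C] feature maps."""
    return [[p - l for l, p in zip(lrow, prow)]
            for lrow, prow in zip(lidar, pseudo)]

def channel_gate(diff):
    """One sigmoid weight per channel, from the mean difference over all N locations.

    Assumption: a real module would learn this mapping; here it is parameter-free.
    """
    n = len(diff)
    c = len(diff[0])
    means = [sum(row[k] for row in diff) / n for k in range(c)]
    return [1.0 / (1.0 + math.exp(-m)) for m in means]

def supplement(lidar, pseudo):
    """Add the channel-gated difference back onto the LiDAR features."""
    diff = differential_features(lidar, pseudo)
    gate = channel_gate(diff)
    return [[l + g * d for l, g, d in zip(lrow, gate, drow)]
            for lrow, drow in zip(lidar, diff)]

if __name__ == "__main__":
    lidar = [[1.0, 2.0], [3.0, 4.0]]    # two locations, two channels
    pseudo = [[1.5, 2.0], [3.5, 3.0]]
    print(supplement(lidar, pseudo))
```

When the two feature maps agree, the difference is zero and the LiDAR features pass through unchanged, which is the behavior one would want from a supplement that only adds what the LiDAR branch is missing.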

SIANet: 3D object detection with structural information augment network

SIENet: Spatial Information Enhancement Network for 3D Object Detection from Point Cloud

Improving 3D Object Detection with Context-Aware and Dimensional Interaction Attention

SEFormer: Structure Embedding Transformer for 3D Object Detection

SSF: Sparse Point Cloud Object Detection Based on Self-Adaptive Voxel Encoding and Focal-Sparse Convolution

SIEV-Net: A Structure-Information Enhanced Voxel Network for 3D Object Detection From LiDAR Point Clouds

SASAN: Shape-Adaptive Set Abstraction Network for Point-Voxel 3D Object Detection

AGO-Net: Association-Guided 3D Point Cloud Object Detection Network

Improving 3D Object Detection with Channel-wise Transformer

Semantic-Context Graph Network for Point-based 3D Object Detection

SCANet: Spatial-Channel Attention Network for 3D Object Detection

MSIT-Det: Multi-Scale Feature Aggregation with Iterative Transformer Networks for 3D Object Detection

Structure Aware Single-Stage 3D Object Detection From Point Cloud

Spatial and Semantic Information Enhancement for Indoor 3D Object Detection

3D IoU-Net: IoU Guided 3D Object Detector for Point Clouds

SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism

S-AT GCN: Spatial-Attention Graph Convolution Network based Feature Enhancement for 3D Object Detection

CenterPoint-SE: A Single-Stage Anchor-Free 3-D Object Detection Algorithm With Spatial Awareness Enhancement

DASANet: A 3D Object Detector with Density-and-Sparsity Feature Aggregation

AWANet: Attentive-Aware Wide-Kernels Asymmetrical Network with Blended Contour Information for Salient Object Detection

VSL-Net: Voxel structure learning for 3D object detection