Abstract: Most current 3-D object detection algorithms rely solely on point clouds. Although LiDAR provides target location and contour information for detection, its point clouds are sparse, especially for distant objects. Camera sensors, by contrast, provide richer color, texture, and other appearance information about targets. However, using point cloud and image data together for object detection enlarges model capacity and makes overfitting likely. The two modalities also produce different gradients for their respective subnetworks, which makes the whole network difficult to optimize. To address these problems and further improve detection performance, this article proposes a 3-D object detection method, attention mechanism and voxel feature pyramid multimodal VoxelNet (AVFP-MVX), which uses both point cloud and image data. Building on MVX-Net, it adds an attention mechanism and a voxel feature pyramid to improve 3-D detection accuracy. Visualization results show that AVFP-MVX performs well overall: it accurately locates target objects and regresses well-fitting bounding boxes. Comparative tests show that the proposed method achieves detection accuracies of 91.24%, 80.45%, and 76.91% on easy, moderate, and hard car targets, respectively, and average accuracies of 62.44% and 67.64% on pedestrians and cyclists, respectively, outperforming the compared methods. Ablation experiments show that adding the attention mechanism and the voxel feature pyramid network (Voxel-FPN) increases the detection accuracy of car, pedestrian, and cyclist by 1.87%, 1.85%, and 1.88%, respectively.
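The abstract names its two added components without describing them. The sketch below is a minimal, stdlib-only illustration under stated assumptions, not the AVFP-MVX implementation: it treats a voxel feature pyramid as the same point cloud voxelized at progressively doubled voxel sizes with mean-pooled voxel features, and it models the attention mechanism as scaled dot-product weighting between one voxel's LiDAR feature and its image feature. The function names (`voxelize`, `voxel_pyramid`, `attention_fuse`), the fixed `query` vector, and the doubling schedule are hypothetical; in a real network the query and features would be learned.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def voxelize(points: Sequence[Point], voxel_size: float) -> Dict[VoxelIndex, Tuple[float, ...]]:
    """Bucket points into cubic voxels; each occupied voxel's feature is its mean point."""
    buckets: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for x, y, z in points:
        idx = (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        buckets[idx].append((x, y, z))
    # zip(*pts) yields the x, y, and z columns of the points inside one voxel.
    return {idx: tuple(sum(col) / len(pts) for col in zip(*pts)) for idx, pts in buckets.items()}


def voxel_pyramid(points: Sequence[Point], base_size: float,
                  levels: int) -> List[Dict[VoxelIndex, Tuple[float, ...]]]:
    """Hypothetical pyramid: voxelize at base_size, 2*base_size, 4*base_size, ..."""
    return [voxelize(points, base_size * 2 ** k) for k in range(levels)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(lidar_feat: Sequence[float], image_feat: Sequence[float],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention over two modality features for a single voxel.

    Assumes both features have the same length as the (hypothetical) query.
    Returns the fused feature and the per-modality weights [w_lidar, w_image].
    """
    feats = [lidar_feat, image_feat]
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * f for q, f in zip(query, feat)) for feat in feats]
    weights = softmax(scores)
    fused = [sum(w * feat[i] for w, feat in zip(weights, feats)) for i in range(len(query))]
    return fused, weights


# Usage with a toy point cloud and toy 2-D features.
points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.1), (1.2, 0.4, 0.0), (2.5, 2.5, 0.2)]
pyramid = voxel_pyramid(points, base_size=0.5, levels=3)
print([len(level) for level in pyramid])  # [3, 3, 2]

fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], query=[1.0, 0.0])
print(weights)  # LiDAR weight is larger because its feature aligns with the query
```

Coarser pyramid levels merge nearby points into fewer voxels, trading geometric detail for wider context, while the softmax weights let whichever modality better matches the query dominate the fused feature for that voxel.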

AVFP-MVX: Multimodal VoxelNet with Attention Mechanism and Voxel Feature Pyramid