Abstract: In current 3-D object detection tasks, most algorithms rely on point clouds alone. Although LiDAR provides location and contour information about targets, its point clouds are sparse, especially for distant objects. Camera sensors, by contrast, provide richer detail such as color and texture. However, using point cloud and image data together for object detection enlarges the model and makes it prone to overfitting. The two modalities also produce different gradients in their respective subnetworks, which makes the whole network difficult to optimize. To address these problems and further improve detection performance, this article proposes AVFP-MVX, a multimodal VoxelNet with an attention mechanism and a voxel feature pyramid that uses both point cloud and image data for 3-D object detection. Building on MVX-Net, it adds the attention mechanism and a voxel feature pyramid network (Voxel-FPN) to improve 3-D detection accuracy. Visualization results show that AVFP-MVX performs well overall: it accurately localizes target objects and regresses well-fitted bounding boxes. In comparative tests, the proposed method achieved detection accuracies of 91.24%, 80.45%, and 76.91% on the easy, moderate, and hard car targets, respectively, and average accuracies of 62.44% for pedestrians and 67.64% for cyclists. These results are better than those of the other methods compared. Ablation experiments show that adding the attention mechanism and Voxel-FPN increased the detection accuracy for cars, pedestrians, and cyclists by 1.87%, 1.85%, and 1.88%, respectively.
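The abstract does not say how the attention mechanism balances the two modalities. The following is a minimal sketch that assumes a simple per-voxel sigmoid gate, which is not necessarily the authors' design. It shows one way attention can down-weight image features before they are concatenated with a voxel's LiDAR features, in the spirit of MVX-Net's concatenation-based fusion. The function names `sigmoid` and `gated_voxel_fusion`, and the gate parameters `w_lidar`, `w_image`, and `bias`, are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_voxel_fusion(
    lidar_feat: Sequence[float],
    image_feat: Sequence[float],
    w_lidar: Sequence[float],
    w_image: Sequence[float],
    bias: float = 0.0,
) -> List[float]:
    """Hypothetical attention-style fusion for a single voxel.

    A scalar gate in (0, 1) is computed from both modalities. The image
    feature is scaled by that gate and then concatenated after the LiDAR
    feature. The weights stand in for parameters a real network would learn.
    """
    if len(lidar_feat) != len(w_lidar) or len(image_feat) != len(w_image):
        raise ValueError("each feature vector must match its weight vector")
    # Linear score over both modalities (hypothetical gate parameters).
    score = bias
    score += sum(f * w for f, w in zip(lidar_feat, w_lidar))
    score += sum(f * w for f, w in zip(image_feat, w_image))
    gate = sigmoid(score)
    # Keep LiDAR features as-is; attenuate image features by the gate.
    return list(lidar_feat) + [gate * f for f in image_feat]


# Usage: zero weights give a gate of 0.5, so the image feature 4.0 becomes 2.0.
print(gated_voxel_fusion([1.0, 2.0], [4.0], [0.0, 0.0], [0.0]))
# → [1.0, 2.0, 2.0]
```

A strongly negative score drives the gate toward 0, which effectively ignores the image branch for that voxel. This is one plausible way to keep a weak modality from dominating training. In a real network the gate would be learned, and it could be per-channel rather than scalar.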

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

MVX-Net: Multimodal VoxelNet for 3D Object Detection

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

AVFP-MVX: Multimodal VoxelNet with Attention Mechanism and Voxel Feature Pyramid

VoxNet: A 3D Convolutional Neural Network for real-time object recognition

Multi-view 3D Object Detection Network for Autonomous Driving

VP-Net: Voxels as Points for 3D Object Detection

MonoNext: A 3D Monocular Object Detection with ConvNext

PVConvNet: Pixel-Voxel Sparse Convolution for multimodal 3D object detection

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection From Point Clouds

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

DVFENet: Dual-branch voxel feature extraction network for 3D object detection

VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework for Multi-Modal 3D Object Detection

OCM3D: Object-Centric Monocular 3D Object Detection

Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image

End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

Semantic-aware 3D-voxel CenterNet for point cloud object detection

VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection

Murf-Net: Multi-Receptive Field Pillars for 3D Object Detection from Point Cloud