Abstract: It has been well recognized that fusing the complementary information from depth-aware LiDAR point clouds and semantic-rich stereo images benefits 3D object detection. Nevertheless, it is non-trivial to model the inherently unnatural interaction between sparse 3D points and dense 2D pixels. To ease this difficulty, recent approaches generally project the 3D points onto the 2D image plane to sample the image data and then aggregate the data at the points. However, these approaches often suffer from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance. Specifically, taking the sparse points as the multi-modal data aggregation locations causes severe information loss for high-resolution images, which in turn undermines the effectiveness of multi-sensor fusion. In this paper, we present VPFNet, a new architecture that cleverly aligns and aggregates the point cloud and image data at virtual points. In particular, with their density lying between that of the 3D points and the 2D pixels, the virtual points can bridge the resolution gap between the two sensors and thus preserve more information for processing. Moreover, we also investigate data augmentation techniques that can be applied to both point clouds and RGB images, as data augmentation has made a non-negligible contribution to 3D object detectors to date. We have conducted extensive experiments on the KITTI dataset and have observed good performance compared to the state-of-the-art methods. Remarkably, our VPFNet achieves 83.21 moderate $AP_{3D}$ and 91.86 moderate $AP_{BEV}$ on the KITTI test set. The network design also takes computational efficiency into consideration: we achieve 15 FPS on a single NVIDIA RTX 2080Ti GPU.
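The point-based sampling that the abstract attributes to recent approaches (project the 3D points onto the image plane, sample the image there, aggregate at the points) can be illustrated with a minimal NumPy sketch. This is not VPFNet's virtual-point method. The function names, the 3x4 projection matrix `P`, the assumption that the points are already in the camera frame, and the nearest-pixel sampling are all illustrative choices that do not come from the paper.

```python
import numpy as np

def project_points(points_xyz, P):
    """Project (N, 3) camera-frame points with a 3x4 matrix P (hypothetical calibration)."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])   # (N, 4) homogeneous coordinates
    uvw = homo @ P.T                                  # (N, 3)
    depth = uvw[:, 2]
    in_front = depth > 1e-6                           # points behind the camera are invalid
    safe_depth = np.where(in_front, depth, 1.0)       # avoid division by zero
    uv = uvw[:, :2] / safe_depth[:, None]             # pixel coordinates (u = column, v = row)
    return uv, in_front

def sample_image_at_points(points_xyz, image, P):
    """Attach the nearest pixel's features to every point that lands inside the image."""
    h, w = image.shape[:2]
    uv, in_front = project_points(points_xyz, P)
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    valid = in_front & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    feats = image[rows[valid], cols[valid]]           # (M, C) sampled image features
    fused = np.hstack([points_xyz[valid], feats])     # (M, 3 + C) per-point fused features
    return fused, valid
```

In this scheme at most N of the H x W pixels are ever read, one per projected point, so most of a high-resolution image is ignored when the point cloud is sparse. That is the resolution mismatch the abstract describes.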

Multi-View Frustum Pointnet for Object Detection in Autonomous Driving

SSF: Sparse Point Cloud Object Detection Based on Self-Adaptive Voxel Encoding and Focal-Sparse Convolution

F-PVNet: Frustum-Level 3-D Object Detection on Point–Voxel Feature Representation for Autonomous Driving

Multi-View 3D Object Detection Network for Autonomous Driving

AVFP-MVX: Multimodal VoxelNet with Attention Mechanism and Voxel Feature Pyramid

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

Multi-View Adaptive Fusion Network for 3D Object Detection

End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

Accelerating Point-Voxel Representation of 3-D Object Detection for Automatic Driving

Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion

Deep multi-scale and multi-modal fusion for 3D object detection

Multimodal Virtual Point 3D Detection

3D Object Detection for Point Cloud in Virtual Driving Environment

AMFF-Net: An Effective 3D Object Detector Based on Attention and Multi-Scale Feature Fusion

MSPV3D: Multi-Scale Point-Voxels 3D Object Detection Net

3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

PMPF: Point-Cloud Multiple-Pixel Fusion-Based 3D Object Detection for Autonomous Driving