P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection From Point Clouds

Jiale Li, Yu Sun, Shujie Luo, Ziqi Zhu, Hang Dai, Andrey S. Krylov, Yong Ding, Ling Shao

DOI: https://doi.org/10.1109/access.2021.3094562

IF: 3.9

2021-01-01

IEEE Access

Abstract: The most recent 3D object detectors for point clouds rely on the coarse voxel-based representation rather than the accurate point-based representation due to a higher box recall in the voxel-based Region Proposal Network (RPN). However, the detection accuracy is severely restricted by the information loss of pose details in the voxels. Different from considering the point cloud as voxel or point representation only, we propose a point-to-voxel feature learning approach to voxelize the point cloud with both the point-wise semantic and local spatial features, which maintains the voxel-wise features to build the high-recall voxel-based RPN and also provides the accurate point-wise features for refining the detection results. Another difficulty in object detection for point cloud is that the visible part varies a lot against the full view of object because of the perspective issues in data acquisition. To address this, we propose an attentive corner aggregation module to attentively aggregate the features of local point cloud surrounding a 3D proposal from the perspectives of eight corners in the proposal 3D bounding box. The experimental results on the competitive KITTI 3D object detection benchmark show that the proposed method achieves state-of-the-art performance.

Computer science, information systems; telecommunications; engineering, electrical & electronic

What problem does this paper attempt to address?

This paper addresses how to achieve both high recall and high precision in 3D object detection from point clouds. With a voxel representation, existing methods can build a Region Proposal Network (RPN) with high recall, but they lose detailed information about object pose. With a point representation, more accurate pose information is retained, but the recall of the RPN is lower. The paper therefore proposes Point-to-Voxel Feature Learning (P2V-RCNN), which aims to combine the advantages of the two representations: the high recall of the voxel representation and the high-precision pose information of the point representation.

The paper also addresses a second difficulty in point cloud object detection: because of the perspective from which the data are acquired, the visible part of an object can differ greatly from its full view. To handle this, the paper proposes an Attentive Corner Aggregation (ACA) module, which learns perspective-invariant features by aggregating local point cloud features from the perspectives of the eight corners of the 3D proposal bounding box, thereby improving detection accuracy.

In summary, the main contributions of the paper are:

1. A new 3D object detection method, P2V-RCNN, which achieves state-of-the-art performance on the highly competitive KITTI 3D object detection benchmark.
2. A point-to-voxel feature learning method that retains accurate object pose information while constructing a high-recall map-view RPN.
3. An attention module that learns perspective-invariant features to improve accuracy in the detection refinement stage.
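The two ideas summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the mean pooling used to voxelize point-wise features, the box parameterization (center, size, yaw about the z axis), and the linear attention score `w_att` are all assumptions chosen for illustration. In the paper these components are learned networks.

```python
import numpy as np


def scatter_mean_to_voxels(points, feats, voxel_size, pc_min):
    """Average point-wise features inside each occupied voxel (assumed pooling)."""
    # Integer voxel coordinates of every point.
    coords = np.floor((points - pc_min) / voxel_size).astype(np.int64)
    # Unique occupied voxels and, per point, the index of its voxel.
    voxels, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    sums = np.zeros((len(voxels), feats.shape[1]))
    np.add.at(sums, inverse, feats)  # unbuffered scatter-add of features
    counts = np.bincount(inverse, minlength=len(voxels))[:, None]
    return voxels, sums / counts


def box_corners(center, size, yaw):
    """Eight corners of a 3D box with size (l, w, h), rotated by yaw about z."""
    signs = np.array([[sx, sy, sz]
                      for sx in (-1.0, 1.0)
                      for sy in (-1.0, 1.0)
                      for sz in (-1.0, 1.0)])
    local = 0.5 * signs * np.asarray(size, dtype=float)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rot.T + np.asarray(center, dtype=float)


def corner_attentive_pool(points, feats, corners, w_att):
    """Softmax-weighted feature pooling as seen from each corner.

    w_att is a hypothetical weight vector of length 3 + C that scores each
    point from its corner-relative offset and its feature vector.
    """
    pooled = []
    for corner in corners:
        rel = points - corner                      # offsets seen from this corner
        scores = np.concatenate([rel, feats], axis=1) @ w_att
        scores = scores - scores.max()             # numerical stability
        weights = np.exp(scores)
        weights = weights / weights.sum()          # attention weights over points
        pooled.append(weights @ feats)
    return np.stack(pooled)                        # shape (8, C)


# Toy usage with three points carrying two-dimensional features.
pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.3, 0.4], [1.5, 0.1, 0.1]])
fts = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]])
vox, vox_feats = scatter_mean_to_voxels(pts, fts, voxel_size=1.0, pc_min=np.zeros(3))
corners = box_corners(center=(0.0, 0.0, 0.0), size=(2.0, 4.0, 6.0), yaw=0.0)
pooled = corner_attentive_pool(pts, fts, corners, w_att=np.zeros(5))
```

With an all-zero `w_att`, the attention weights are uniform, so each of the eight pooled vectors reduces to the plain mean of the point features. A trained scoring function would instead weight points differently depending on the corner they are viewed from.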

SSF: Sparse Point Cloud Object Detection Based on Self-Adaptive Voxel Encoding and Focal-Sparse Convolution

PointSiamRCNN: Target-aware Voxel-based Siamese Tracker for Point Clouds

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

PV-RCNN++: Semantical Point-Voxel Feature Interaction for 3D Object Detection

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder

PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud

DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

3D Object Detection Combining Semantic and Geometric Features from Point Clouds

HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

VP-Net: Voxels as Points for 3D Object Detection

PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module

AVFP-MVX: Multimodal VoxelNet with Attention Mechanism and Voxel Feature Pyramid

OCM3D: Object-Centric Monocular 3D Object Detection

Semantic-aware 3D-voxel CenterNet for point cloud object detection

Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection

PVConvNet: Pixel-Voxel Sparse Convolution for multimodal 3D object detection

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

Anti-Noise 3D Object Detection of Multimodal Feature Attention Fusion Based on PV-RCNN