3D Cascade RCNN: High Quality Object Detection in Point Clouds

Qi Cai, Yingwei Pan, Ting Yao, Tao Mei
DOI: https://doi.org/10.48550/arXiv.2211.08248
2022-11-15
Abstract: Recent progress on 2D object detection has featured Cascade RCNN, which capitalizes on a sequence of cascade detectors to progressively improve proposal quality, towards high-quality object detection. However, there has not been evidence in support of building such cascade structures for 3D object detection, a challenging detection scenario with highly sparse LiDAR point clouds. In this work, we present a simple yet effective cascade architecture, named 3D Cascade RCNN, that allocates multiple detectors based on the voxelized point clouds in a cascade paradigm, progressively pursuing a higher-quality 3D object detector. Furthermore, we quantitatively define the sparsity level of the points within the 3D bounding box of each object as the point completeness score, which is exploited as the task weight for each proposal to guide the learning of each stage detector. The spirit behind this is to assign higher weights to high-quality proposals with relatively complete point distributions, while down-weighting proposals with extremely sparse points that often incur noise during training. This design of completeness-aware re-weighting elegantly upgrades the cascade paradigm to be better applicable to sparse input data, without increasing the FLOP budget. Through extensive experiments on both the KITTI dataset and the Waymo Open Dataset, we validate the superiority of our proposed 3D Cascade RCNN when compared to state-of-the-art 3D object detection techniques. The source code is publicly available at https://github.com/caiqi/Cascasde-3D.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses high-quality 3D object detection in point clouds. Specifically, it focuses on how to improve detection quality when the LiDAR point clouds are highly sparse. Because of this sparsity, existing methods often produce unsatisfactory results on distant or occluded objects. For example, state-of-the-art 3D object detectors such as PV-RCNN reliably detect nearby objects with dense point distributions, but they perform poorly on distant objects whose points are sparse. To meet this challenge, the paper proposes an architecture named 3D Cascade RCNN, built from the following components:

1. **Cascade Detection Head**: 3D Cascade RCNN arranges multiple detection heads in a cascade. Each head refines the proposals produced by the previous head, so proposal quality, and therefore detection quality, improves progressively from stage to stage.
2. **Point Completeness Score (PCS) Module**: To quantify how sparse the points in each proposal are, the paper introduces the point completeness score. It reflects how much of the object's 3D bounding box is covered by the points inside it, and therefore serves as a measure of point cloud quality. It is computed as

   \[ Q = \frac{|A \cap B|}{|A \cup B|} \]

   where \(A\) is the minimum bounding box enclosing the points inside the target object, \(B\) is the ground-truth 3D bounding box, and \(|\cdot|\) denotes volume. Since \(A\) encloses only points that lie inside \(B\), it is contained in \(B\); hence \(A \cap B = A\) and \(A \cup B = B\), and the score simplifies to

   \[ Q = \frac{|A \cap B|}{|B|} = \frac{|A|}{|B|} \]

3. **Completeness-Aware Re-weighting Strategy**: To reduce the negative impact of sparse point clouds on training, the paper re-weights proposals by their point completeness score. For each proposal, the task weight is adjusted according to its score: proposals with relatively complete point distributions receive higher weights, while proposals with extremely sparse points are down-weighted. This focuses training on high-quality proposals, which improves overall detection performance without increasing the FLOP budget.

Through extensive experiments on the KITTI dataset and the Waymo Open Dataset, the paper verifies the effectiveness of 3D Cascade RCNN and reports improvements over state-of-the-art 3D object detection techniques.
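To make the cascade idea concrete, here is a minimal Python sketch of stage-by-stage refinement. It rests on several assumptions that do not come from the paper. Boxes are simplified to axis-aligned 3D boxes with no heading angle. Each detection head is replaced by a hypothetical `make_toy_stage` function, which moves a box halfway toward the ground truth in place of a learned regressor. The helper `stage_labels` borrows the per-stage IoU-threshold labeling of 2D Cascade RCNN. All function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# Simplified, hypothetical box format: axis-aligned (x_min, y_min, z_min, x_max, y_max, z_max).
# Real LiDAR boxes also carry a heading angle; it is dropped here to keep IoU simple.
Box = Tuple[float, float, float, float, float, float]
Stage = Callable[[Box], Box]


def volume(b: Box) -> float:
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def iou_3d(a: Box, b: Box) -> float:
    """3D IoU of two axis-aligned boxes."""
    inter = 1.0
    for i in range(3):
        lo, hi = max(a[i], b[i]), min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    return inter / (volume(a) + volume(b) - inter)


def make_toy_stage(gt: Box, step: float = 0.5) -> Stage:
    """Hypothetical stand-in for a learned head: moves a box a fraction `step` toward gt."""
    def stage(b: Box) -> Box:
        return tuple(bi + step * (gi - bi) for bi, gi in zip(b, gt))
    return stage


def cascade(proposals: Sequence[Box], stages: Sequence[Stage]) -> List[List[Box]]:
    """Stage t refines the boxes produced by stage t-1; returns the output of every stage."""
    outputs: List[List[Box]] = []
    boxes = list(proposals)
    for stage in stages:
        boxes = [stage(b) for b in boxes]
        outputs.append(boxes)
    return outputs


def stage_labels(boxes: Sequence[Box], gt: Box, iou_threshold: float) -> List[bool]:
    """Foreground labels for one stage's inputs at that stage's IoU threshold (assumed scheme)."""
    return [iou_3d(b, gt) >= iou_threshold for b in boxes]


if __name__ == "__main__":
    gt = (0.0, 0.0, 0.0, 4.0, 2.0, 2.0)
    proposal = (1.0, 0.0, 0.0, 5.0, 2.0, 2.0)  # shifted 1 m along x: IoU = 12 / 20 = 0.6
    outputs = cascade([proposal], [make_toy_stage(gt) for _ in range(3)])
    for t, boxes in enumerate(outputs, start=1):
        # stage 1: 0.778, stage 2: 0.882, stage 3: 0.939
        print(f"stage {t}: IoU = {iou_3d(boxes[0], gt):.3f}")
    print(stage_labels([proposal], gt, 0.7), stage_labels(outputs[0], gt, 0.7))  # [False] [True]
```

In 2D Cascade RCNN, this pattern motivates raising the IoU threshold stage by stage. Each stage receives boxes that the previous stage has already improved, so a stricter threshold still leaves positive samples to train on. In the toy run, the raw proposal (IoU 0.6) counts as background at a 0.7 threshold, while the stage-1 output (IoU about 0.778) counts as foreground.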
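The point completeness score \(Q = |A| / |B|\) can be computed directly from a ground-truth box and the LiDAR points. The following Python sketch makes two assumptions. First, boxes use a hypothetical layout `(cx, cy, cz, length, width, height, yaw)` with yaw measured about the z axis. Second, \(A\) is built as the tightest box around the inside points, aligned with \(B\)'s own axes. These choices and the function names are illustrative and are not taken from the authors' code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical box layout: (cx, cy, cz, length, width, height, yaw), yaw about the z axis.
Box = Tuple[float, float, float, float, float, float, float]


def to_box_frame(p: Point, box: Box) -> Point:
    """Express p in the box's local frame: center at the origin, heading along +x."""
    cx, cy, cz, _, _, _, yaw = box
    dx, dy, dz = p[0] - cx, p[1] - cy, p[2] - cz
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate by -yaw to undo the box heading.
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def point_completeness_score(points: Sequence[Point], box: Box) -> float:
    """Q = |A ∩ B| / |A ∪ B|, which reduces to |A| / |B| because A lies inside B."""
    _, _, _, length, width, height, _ = box
    half = (length / 2, width / 2, height / 2)
    local = (to_box_frame(p, box) for p in points)
    inside = [q for q in local if all(abs(q[i]) <= half[i] for i in range(3))]
    if not inside:
        return 0.0  # no points fall inside B, so A is empty
    # A: tightest box around the inside points, aligned with B's local axes.
    extents = [max(q[i] for q in inside) - min(q[i] for q in inside) for i in range(3)]
    vol_a = extents[0] * extents[1] * extents[2]
    return vol_a / (length * width * height)


if __name__ == "__main__":
    box = (0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0)  # a 4 m x 2 m x 2 m box at the origin
    pts = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.5), (10.0, 0.0, 0.0)]  # the last point is outside B
    print(point_completeness_score(pts, box))  # (1 x 0.5 x 0.5) / 16 = 0.015625
```

If fewer than two distinct points fall inside \(B\), or all of the inside points are coplanar, \(A\) has zero volume and the sketch returns \(Q = 0\).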
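The summary does not give the exact re-weighting formula, so the following is only one plausible reading of it. The hypothetical `completeness_weighted_loss` takes the per-proposal losses of one stage and computes their weighted mean. Each proposal's weight is the point completeness score of the ground-truth box it is matched to. This sketch does not cover how background proposals, which have no matched object, would be weighted.

```python
from typing import Sequence


def completeness_weighted_loss(losses: Sequence[float], scores: Sequence[float]) -> float:
    """Hypothetical completeness-aware re-weighting: a weighted mean of per-proposal
    losses, where each weight is the point completeness score Q (in [0, 1]) of the
    ground-truth box that the proposal is matched to."""
    if len(losses) != len(scores):
        raise ValueError("need exactly one completeness score per proposal")
    if any(not 0.0 <= q <= 1.0 for q in scores):
        raise ValueError("completeness scores must lie in [0, 1]")
    total = sum(scores)
    if total == 0.0:
        return 0.0  # every matched object is empty, so no proposal carries weight
    # Sparse (low-Q) proposals contribute proportionally less to the stage loss.
    return sum(loss * q for loss, q in zip(losses, scores)) / total


if __name__ == "__main__":
    losses = [1.0, 3.0]  # e.g. a dense nearby car and a sparse distant one
    print(sum(losses) / len(losses))                        # plain mean: 2.0
    print(completeness_weighted_loss(losses, [1.0, 0.0]))  # 1.0
```

Under a plain mean, a sparse, low-\(Q\) proposal with a large loss can dominate the stage loss. The weighted version lets that proposal contribute only a little. Because only the training loss changes, this kind of re-weighting adds no computation at inference time.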