Abstract: Recent progress on 2D object detection has featured Cascade RCNN, which capitalizes on a sequence of cascade detectors to progressively improve proposal quality, towards high-quality object detection. However, there has not been evidence in support of building such cascade structures for 3D object detection, a challenging detection scenario with highly sparse LiDAR point clouds. In this work, we present a simple yet effective cascade architecture, named 3D Cascade RCNN, that allocates multiple detectors based on the voxelized point clouds in a cascade paradigm, pursuing higher quality 3D object detector progressively. Furthermore, we quantitatively define the sparsity level of the points within 3D bounding box of each object as the point completeness score, which is exploited as the task weight for each proposal to guide the learning of each stage detector. The spirit behind is to assign higher weights for high-quality proposals with relatively complete point distribution, while down-weight the proposals with extremely sparse points that often incur noise during training. This design of completeness-aware re-weighting elegantly upgrades the cascade paradigm to be better applicable for the sparse input data, without increasing any FLOP budgets. Through extensive experiments on both the KITTI dataset and Waymo Open Dataset, we validate the superiority of our proposed 3D Cascade RCNN, when comparing to state-of-the-art 3D object detection techniques. The source code is publicly available at https://github.com/caiqi/Cascasde-3D.

What problem does this paper attempt to address?

The paper addresses high-quality 3D object detection in point clouds, and specifically how to improve detection quality when LiDAR point clouds are highly sparse. Because of this sparsity, existing methods often give unsatisfactory results on distant or occluded objects. For example, state-of-the-art 3D object detectors such as PV-RCNN detect close-range objects with dense point distributions successfully, but perform poorly on distant objects with sparse point distributions. To meet this challenge, the paper proposes an architecture named 3D Cascade RCNN, built on three components:

1. **Cascade Detection Head**: Multiple detection heads work in sequence. Each head refines the proposals produced by the previous head, so proposal quality improves stage by stage.

2. **Point Completeness Score (PCS) Module**: The paper introduces a module that quantifies the point cloud sparsity of each proposal, called the Point Completeness Score. The score reflects how much of the target object is covered by points and serves as a measure of point cloud quality. It is calculated as

\[ Q=\frac{A\cap B}{A\cup B} \]

where \(A\) is the minimum bounding box of the points inside the target object, and \(B\) is the 3D bounding box of the ground-truth annotation. Since \(A\) is the minimum bounding box of the points inside \(B\), \(A\cup B\) is equal to \(B\), and the formula simplifies to

\[ Q = \frac{A\cap B}{B} \]

3. **Completeness-Aware Re-weighting Strategy**: To reduce the negative impact of sparse point clouds on training, the paper proposes a re-weighting strategy based on the point completeness score. The task weight of each proposal is adjusted according to its score: proposals with high completeness receive higher weights, while proposals with low completeness are down-weighted. Training therefore focuses more on high-quality proposals, which improves overall detection performance.

Through extensive experiments on the KITTI dataset and the Waymo Open Dataset, the paper verifies the effectiveness of 3D Cascade RCNN and reports performance improvements over existing 3D object detection techniques.
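The point completeness score and the re-weighting idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the boxes are treated as axis-aligned (oriented boxes would first need the points rotated into the box frame), the regions in the formula are read as volumes, and the function names and the mean-normalised weighting rule in `reweight_losses` are hypothetical, since the summary does not give the exact weighting function.

```python
import numpy as np


def point_completeness_score(points, box_min, box_max):
    """Q = vol(A) / vol(B) for an axis-aligned ground-truth box B.

    A is the tight axis-aligned box around the points that fall inside B.
    Because A lies inside B, A ∩ B = A and A ∪ B = B.
    Assumption: boxes are axis-aligned and regions are measured as volumes.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)

    # Keep only the points that lie inside the ground-truth box B.
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    if not inside.any():
        return 0.0  # no points at all: treat the object as fully incomplete

    pts = points[inside]
    extent_a = pts.max(axis=0) - pts.min(axis=0)  # side lengths of A
    extent_b = box_max - box_min                  # side lengths of B
    return float(np.prod(extent_a) / np.prod(extent_b))


def reweight_losses(losses, scores, eps=1e-8):
    """Scale per-proposal losses by completeness (hypothetical weighting rule).

    Weights are the scores divided by their mean, so they average to 1 and
    the overall loss scale stays comparable to the unweighted case.
    """
    losses = np.asarray(losses, dtype=float)
    scores = np.asarray(scores, dtype=float)
    weights = scores / max(scores.mean(), eps)
    return weights * losses


# Usage: one sparsely covered object and one fully covered object.
box_min, box_max = (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)
sparse = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]  # last point is outside B
dense = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]

q_sparse = point_completeness_score(sparse, box_min, box_max)
q_dense = point_completeness_score(dense, box_min, box_max)
weighted = reweight_losses([1.0, 1.0], [q_sparse, q_dense])
```

With equal raw losses, the sparsely covered proposal ends up contributing less to the total than the fully covered one, which is the behaviour the re-weighting strategy aims for. Any monotonically increasing function of the score could be substituted for the normalisation used here.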

3D Cascade RCNN: High Quality Object Detection in Point Clouds

SSF: Sparse Point Cloud Object Detection Based on Self-Adaptive Voxel Encoding and Focal-Sparse Convolution

CasA: A Cascade Attention Network for 3-D Object Detection from LiDAR Point Clouds

CascadeV-Det: Cascade Point Voting for 3D Object Detection

PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud

Cascade R-CNN: Delving into High Quality Object Detection

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection From Point Clouds

Cascaded Cross-Modality Fusion Network for 3D Object Detection

High Quality Object Detection for Multiresolution Remote Sensing Imagery Using Cascaded Multi-Stage Detectors

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

Voxel Graph Attention for 3-D Object Detection From Point Clouds

LiDAR-only 3D Object Detection Based on Spatial Context

Multi-scale Feature Fusion with Point Pyramid for 3D Object Detection

PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

Cascading Classifier with Discriminative Multi-Features for a Specific 3D Object Real-Time Detection

FasterV-RCNN: Efficient Point Cloud 3D Object Detection Framework

Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph