Abstract: 3D object detection is fundamentally important for various emerging applications, including autonomous driving and robotics. A key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data. Unfortunately, labeling point cloud data is extremely challenging, as accurate 3D bounding boxes and semantic labels are required for each potential object. This paper proposes a unified active 3D object detection framework, for greatly reducing the labeling cost of training 3D object detectors. Our framework is based on a novel formulation of submodular optimization, specifically tailored to the problem of active 3D object detection. In particular, we address two fundamental challenges associated with active 3D object detection: data imbalance and the need to cover the distribution of the data, including LiDAR-based point cloud data of varying difficulty levels. Extensive experiments demonstrate that our method achieves state-of-the-art performance with high computational efficiency compared to existing active learning methods. The code is available at https://github.com/RuiyuM/STONE.
What problem does this paper attempt to address?
The paper addresses the high annotation cost of 3D object detection. Training an accurate 3D object detector requires a large amount of LiDAR point-cloud data, and annotating that data is difficult because every potential object needs an accurate 3D bounding box and a semantic label. To reduce this cost, the paper proposes a unified active learning framework that selects which point clouds to annotate, so that a detector can be trained with far fewer labeled samples.
### Main Problems and Challenges
1. **Data Imbalance**: Each 3D scene may contain several object categories (cars, bicycles, and so on), which leads to a highly imbalanced label distribution in the point-cloud data. For example, most point clouds contain cars, but few contain bicycles or pedestrians. When selecting point clouds for training, it is therefore crucial to keep the label distribution balanced.
2. **Coverage of Different Difficulty Levels**: Scenes and objects vary in difficulty (for example easy, medium, and hard), as determined by object size, degree of occlusion, and truncation. Ideally, the point clouds selected for annotation should cover samples of all difficulty levels so that the model generalizes across scenarios.
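To make the imbalance problem concrete, the following minimal sketch scores a set of object labels by the Shannon entropy of its class histogram. Entropy is one common way to quantify balance and is used here only as an illustration; it is not necessarily the measure used in the paper. The class names and label counts are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical object labels pooled from a handful of annotated scenes.
imbalanced = ["Car"] * 8 + ["Pedestrian"] + ["Cyclist"]
balanced = ["Car"] * 4 + ["Pedestrian"] * 3 + ["Cyclist"] * 3

entropy_imbalanced = label_entropy(imbalanced)
entropy_balanced = label_entropy(balanced)
max_entropy = math.log2(3)  # upper bound for three classes (uniform case)

print(f"imbalanced: {entropy_imbalanced:.3f} bits")
print(f"balanced:   {entropy_balanced:.3f} bits")
print(f"maximum:    {max_entropy:.3f} bits")
```

A selection strategy that ignores class balance tends to push the labeled set toward the low-entropy case, because car-dominated scenes are the most common in the pool.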
### Proposed Method
To address the above challenges, the paper proposes a submodular optimization framework named STONE. Building on submodular optimization theory, the framework selects unannotated point clouds according to two main objectives:
- **Representativeness**: The selected point clouds should represent the different difficulty levels present in the entire unannotated dataset.
- **Maintenance of Label Distribution**: The selected point clouds should keep the label distribution balanced and avoid the data imbalance problem.
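For readers unfamiliar with the term, the standard textbook definition of submodularity (stated here for reference, not quoted from the paper) is the diminishing-returns property: a set function \( f \) over a ground set \( V \) is submodular if, for all \( A \subseteq B \subseteq V \) and every \( x \in V \setminus B \),

\[
f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B).
\]

In words, adding a sample helps a small selection at least as much as it helps a larger one. This property is what makes simple greedy selection effective: for a monotone submodular function under a cardinality constraint, the greedy solution is guaranteed to reach at least a \( 1 - 1/e \) fraction of the optimal value.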
Specifically, the STONE framework is implemented through the following two stages:
1. **Gradient-Based Submodular Subset Selection (GBSSS)**:
   - Use a gradient-based submodular function \( f_1 \) to measure how representative the selected unannotated point clouds are.
   - Select diverse, high-coverage samples by maximizing \( f_1(D_S) - f_1(D_U) \), where \( D_S \) is the selected subset and \( D_U \) is the full unannotated pool. Assuming \( f_1 \) is monotone, this difference is never positive, so maximizing it amounts to minimizing the absolute gap between the subset and the entire unannotated dataset.
2. **Submodular Optimization for Class Balance (SDMCB)**:
   - Use a second submodular function \( f_2 \) to ensure that the unannotated point clouds added to the annotated dataset \( D_L \) do not degrade its overall quality, that is, the quality of its label distribution.
   - Select samples that balance the label distribution by maximizing \( f_2(D_L) - f_2(D_L \cup D_S) \).
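The two-stage idea above can be sketched with a plain greedy selector. This is an illustration under stated assumptions, not the authors' implementation: the gradient-based \( f_1 \) is replaced by a facility-location coverage function over toy 2-D features, and \( f_2 \) is replaced by a square-root-of-class-counts score (a concave function of modular counts, hence submodular) that rewards adding under-represented classes. The pool, the features, the predicted class counts, and all function names are hypothetical.

```python
import math

# Toy pool of unlabeled scenes: (2-D feature, predicted object counts per class).
# All values are hypothetical; real features would come from the detector.
POOL = [
    ((0.0, 0.0), {"Car": 5}),
    ((0.1, 0.0), {"Car": 4}),
    ((0.0, 0.1), {"Car": 6}),
    ((5.0, 5.0), {"Car": 2, "Pedestrian": 3}),
    ((5.1, 5.0), {"Car": 3, "Pedestrian": 1}),
    ((4.9, 5.0), {"Car": 4}),
    ((10.0, 0.0), {"Car": 1, "Cyclist": 2}),
]
# Hypothetical class counts of the already-labeled set.
LABELED_COUNTS = {"Car": 20, "Pedestrian": 1, "Cyclist": 1}


def similarity(a, b):
    """Gaussian similarity between two 2-D feature vectors."""
    return math.exp(-((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))


SIM = [[similarity(p, q) for q, _ in POOL] for p, _ in POOL]


def coverage(selected):
    """Facility location: how well `selected` covers every scene in the pool."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in SIM)


def class_balance(selected):
    """Sum of square roots of class counts after adding `selected` scenes."""
    counts = dict(LABELED_COUNTS)
    for j in selected:
        for cls, n in POOL[j][1].items():
            counts[cls] = counts.get(cls, 0) + n
    return sum(math.sqrt(n) for n in counts.values())


def greedy_select(items, budget, value):
    """Repeatedly add the item with the largest marginal gain in `value`."""
    selected = []
    for _ in range(budget):
        base = value(selected)
        remaining = [x for x in items if x not in selected]
        best = max(remaining, key=lambda x: value(selected + [x]) - base)
        selected.append(best)
    return selected


# Stage 1: keep a representative candidate set from the whole pool.
candidates = greedy_select(range(len(POOL)), 3, coverage)
# Stage 2: among the candidates, pick scenes that best balance the classes.
final = greedy_select(candidates, 2, class_balance)

print("candidates:", sorted(candidates))
print("final:", final)
```

On this toy pool, stage one keeps one scene from each of the three feature clusters (indices 0, 3, and 6), and stage two then prefers the scenes containing pedestrians and cyclists (3, then 6) over the car-only scene, because rare classes yield the largest marginal gain.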
### Summary
The STONE framework uses submodular optimization to address two key challenges of active learning for 3D object detection: data imbalance and coverage of different difficulty levels. The reported experiments show that the method achieves state-of-the-art performance on real-world autonomous driving datasets such as KITTI and Waymo Open, while remaining computationally efficient.