AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset

Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao
2023-10-26
Abstract: It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset to obtain unified representations that achieve promising results on different tasks or benchmarks. Previous works mainly focus on the self-supervised pre-training pipeline, meaning that they perform pre-training and fine-tuning on the same benchmark, which makes it difficult to attain performance scalability and cross-dataset application for the pre-training checkpoint. In this paper, for the first time, we are committed to building a large-scale pre-training point-cloud dataset with diverse data distribution, while learning generalizable representations from such a diverse pre-training dataset. We formulate the point-cloud pre-training task as a semi-supervised problem, which leverages few-shot labeled and massive unlabeled point-cloud data to generate unified backbone representations that can be directly applied to many baseline models and benchmarks, decoupling the AD-related pre-training process from the downstream fine-tuning task. During backbone pre-training, by enhancing the scene- and instance-level distribution diversity and exploiting the backbone's ability to learn from unknown instances, we achieve significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, and CenterPoint.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the issue of pre-training autonomous driving (AD) perception models on large-scale point cloud datasets to obtain a unified representation that performs well across different tasks or benchmarks. Specifically, existing research mainly focuses on self-supervised pre-training pipelines, i.e., pre-training and fine-tuning on the same benchmark dataset, which makes it difficult to achieve performance scalability and cross-dataset application. This paper is the first to focus on constructing a large-scale pre-training point cloud dataset with diverse data distributions and learning general representations from such diverse pre-training datasets.

### Main Contributions

1. **Proposing the AD-PT Paradigm**: This is the first time the AD-PT paradigm is proposed, aiming to learn a unified representation by pre-training a general backbone network and transferring the knowledge to various benchmarks.
2. **Diverse Pre-training Data Preparation**: A diverse pre-training data preparation process and unknown instance learning methods are proposed, which can enhance the representational capability of feature extraction during the backbone network pre-training process.
3. **Unified Approach**: The study shows that once the pre-training checkpoints are generated, they can be directly loaded into multiple perception baselines and benchmarks. Experimental results further validate that this AD-PT paradigm significantly improves accuracy on different benchmarks (e.g., Waymo, nuScenes, and KITTI).

### Method Overview

1. **Large-scale Point Cloud Dataset Preparation**:
   - **Category-aware Pseudo Label Generator**: Different baseline models are used to annotate different semantic classes, and semi-supervised methods (e.g., MeanTeacher) are employed to further improve accuracy on the ONCE validation set.
   - **Diversity-based Pre-training Processor**: Scene-level and region-level data diversity is increased through point-to-beam resampling and object rescaling strategies.
2. **Learning Unified Representation**:
   - **Unknown Instance Learning Head**: A two-branch unknown instance learning head is designed to avoid mistaking potential foreground instances for background parts, and consistency loss is used to ensure the consistency of the computed corresponding foreground regions.

### Experimental Results

- The AD-PT paradigm significantly improves the performance of different baseline models on benchmarks such as Waymo, nuScenes, and KITTI.
- Compared to existing self-supervised pre-training and semi-supervised learning methods, AD-PT demonstrates better generalization ability and higher accuracy across various datasets.

### Conclusion

By constructing a large-scale and diverse point cloud dataset and designing effective pre-training methods, this paper successfully addresses the generalization problem of autonomous driving perception models across different tasks and datasets. This approach provides new insights and technical support for future autonomous driving research.
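
### Illustrative Sketch: Diversity-based Augmentations

The Method Overview names point-to-beam resampling and object rescaling but gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' code. It assumes NumPy point arrays with x, y, z in the first three columns, estimates beam ids by uniformly binning elevation angles (real sensors often have non-uniform beam spacing), shows only the downsampling direction of beam resampling, and treats object boxes as axis-aligned. The function names `downsample_beams` and `rescale_object` and all of their parameters are hypothetical.

```python
import numpy as np


def downsample_beams(points, num_beams=64, keep_every=2):
    """Keep every `keep_every`-th LiDAR beam to mimic a sparser sensor.

    points: (N, 3+) array with x, y, z in the first three columns.
    Assumption: beams are evenly spaced in elevation angle.
    """
    points = np.asarray(points, dtype=float)
    xyz = points[:, :3]
    horizontal_range = np.linalg.norm(xyz[:, :2], axis=1)
    elevation = np.arctan2(xyz[:, 2], horizontal_range)
    lo, hi = elevation.min(), elevation.max()
    # Map each elevation angle to an integer beam id in [0, num_beams - 1].
    scaled = (elevation - lo) / max(hi - lo, 1e-9) * (num_beams - 1)
    beam_id = np.rint(scaled).astype(int)
    return points[beam_id % keep_every == 0]


def rescale_object(points, box_center, box_size, scale):
    """Scale the points inside an axis-aligned box about the box center.

    Returns the modified points and the rescaled box size.
    Assumption: the box heading angle is ignored for brevity.
    """
    points = np.array(points, dtype=float)  # work on a float copy
    center = np.asarray(box_center, dtype=float)
    size = np.asarray(box_size, dtype=float)
    inside = np.all(np.abs(points[:, :3] - center) <= size / 2.0, axis=1)
    points[inside, :3] = center + (points[inside, :3] - center) * scale
    return points, size * scale
```

Dropping beams changes the scene-level point density, while rescaling changes the size of individual instances, which corresponds to the scene-level and region-level diversity described in the summary. A real pipeline would also need to handle rotated boxes and the upsampling direction.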