PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds

Hao Yang, Haiyang Wang, Di Dai, Liwei Wang
2023-11-08
Abstract: Pre-training is crucial in 3D-related fields such as autonomous driving where point cloud annotation is costly and challenging. Many recent studies on point cloud pre-training, however, have overlooked the issue of incompleteness, where only a fraction of the points are captured by LiDAR, leading to ambiguity during the training phase. On the other hand, images offer more comprehensive information and richer semantics that can bolster point cloud encoders in addressing the incompleteness issue inherent in point clouds. Yet, incorporating images into point cloud pre-training presents its own challenges due to occlusions, potentially causing misalignments between points and pixels. In this work, we propose PRED, a novel image-assisted pre-training framework for outdoor point clouds in an occlusion-aware manner. The main ingredient of our framework is a Birds-Eye-View (BEV) feature map conditioned semantic rendering, leveraging the semantics of images for supervision through neural rendering. We further enhance our model's performance by incorporating point-wise masking with a high mask ratio (95%). Extensive experiments demonstrate PRED's superiority over prior point cloud pre-training methods, providing significant improvements on various large-scale datasets for 3D perception tasks. Codes will be available at https://github.com/PRED4pc/PRED.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is: **how to improve the pre-training of LiDAR point cloud encoders by using image information to compensate for the inherent incompleteness of point cloud data**. Specifically, the paper proposes a framework named PRED (Pre-training via semantic rendering), which aims to improve point cloud encoders in outdoor scenarios such as autonomous driving through occlusion-aware semantic rendering and point-wise masking with a high mask ratio.

### Main problem decomposition:

1. **Incompleteness of point cloud data**:
   - In outdoor LiDAR datasets, more than 30% of the labeled objects contain fewer than five points, which makes point cloud reconstruction ambiguous and degrades the training signal.
   - For example, in the nuScenes dataset, many objects are only partially captured, which makes it difficult for the model to learn accurate features for them.
2. **Challenges of aligning images and point clouds**:
   - Images provide more comprehensive information and richer semantics than point clouds, but directly aligning points with pixels is unreliable under occlusion and can pair a point with the wrong pixel.
   - Such misalignments degrade pre-training, because the LiDAR and camera views may not correspond exactly.

### Solutions:

1. **Semantic Rendering**:
   - The paper introduces semantic rendering conditioned on the Birds-Eye-View (BEV) feature map, using the semantic information of images for supervision.
   - Through neural rendering, semantic predictions are generated from the BEV feature map and optimized together with a depth loss, which handles the occlusion problem.
2. **Point-wise Masking with High Mask Ratio**:
   - A point-wise masking strategy with a 95% mask ratio is introduced. Compared with the previous patch-level masking at a 75% ratio, it better preserves the semantic information of the scene.
   - For smaller objects (such as pedestrians), point-wise masking can avoid deleting the object entirely, thus preserving its semantic information.

### Experimental results:

- Experiments on multiple large-scale outdoor LiDAR datasets (such as nuScenes and ONCE) verify the effectiveness of the PRED framework.
- The results show that PRED significantly outperforms existing point cloud pre-training methods on 3D object detection and BEV map segmentation.

### Summary:

With the PRED framework, the paper addresses both the incompleteness of point cloud data and the misalignment between images and point clouds, providing an effective pre-training approach for outdoor point clouds.
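The contrast between point-wise and patch-level masking described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 4 m BEV patch size, the toy scene, and the use of uniform random sampling are all assumptions made for illustration. Only the two mask ratios (95% point-wise, 75% patch-level) come from the text.

```python
import random


def pointwise_mask(points, mask_ratio=0.95, seed=0):
    """Keep a random (1 - mask_ratio) fraction of individual points."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(points) * (1.0 - mask_ratio)))
    keep_idx = sorted(rng.sample(range(len(points)), n_keep))
    return [points[i] for i in keep_idx]


def patchwise_mask(points, patch_size=4.0, mask_ratio=0.75, seed=0):
    """Drop whole BEV grid cells; all points in a dropped cell vanish together."""
    rng = random.Random(seed)
    patches = {}
    for p in points:
        key = (int(p[0] // patch_size), int(p[1] // patch_size))
        patches.setdefault(key, []).append(p)
    keys = sorted(patches)
    n_keep = max(1, round(len(keys) * (1.0 - mask_ratio)))
    kept = set(rng.sample(keys, n_keep))
    return [p for k in keys if k in kept for p in patches[k]]


def occupied_cells(points, patch_size=4.0):
    """Count BEV cells that still contain at least one point."""
    return len({(int(p[0] // patch_size), int(p[1] // patch_size)) for p in points})


# Hypothetical toy scene: a 40 m x 40 m ground grid with one point per square metre.
scene = [(x + 0.5, y + 0.5, 0.0) for x in range(40) for y in range(40)]

point_kept = pointwise_mask(scene)   # 95% of points removed individually
patch_kept = patchwise_mask(scene)   # 75% of 4 m x 4 m patches removed

print(len(point_kept), occupied_cells(point_kept))
print(len(patch_kept), occupied_cells(patch_kept))
```

In this toy setup, patch-level masking leaves exactly a quarter of the cells populated, so any object confined to a masked cell disappears completely. Point-wise masking keeps far fewer points overall but spreads them across the scene, which is the intuition behind the claim that small objects are less likely to be deleted outright.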