Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun GU, Lingyun Sun
2024-12-05
Abstract: Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. However, the slow sampling speed limits the practical application of diffusion-based scene completion models since autonomous vehicles require an efficient perception of surrounding environments. This paper proposes a novel distillation method tailored for 3D LiDAR scene completion models, dubbed **ScoreLiDAR**, which achieves efficient yet high-quality scene completion. ScoreLiDAR enables the distilled model to sample in significantly fewer steps after distillation. To improve completion quality, we also introduce a novel **Structural Loss**, which encourages the distilled model to capture the geometric structure of the 3D LiDAR scene. The loss contains a scene-wise term constraining the holistic structure and a point-wise term constraining the key landmark points and their relative configuration. Extensive experiments demonstrate that ScoreLiDAR significantly accelerates the completion time from 30.55 to 5.37 seconds per frame (>5×) on SemanticKITTI and achieves superior performance compared to state-of-the-art 3D LiDAR scene completion models. Our code is publicly available at https://github.com/happyw1nd/ScoreLiDAR.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to accelerate diffusion-based 3D LiDAR scene completion while preserving completion quality. The authors propose ScoreLiDAR, a distillation method that reduces the number of sampling steps and thereby speeds up scene completion, and they introduce a Structural Loss to maintain the quality of the completed scenes.

### Background and Problem Description

In applications such as autonomous driving, 3D LiDAR sensors provide high-precision environmental perception, but the point clouds they collect are usually sparse, especially in occluded areas. Completing these sparse scans yields a denser and more comprehensive scene representation. Existing diffusion models offer stable training and high completion quality, but their slow sampling limits practical use, because autonomous vehicles need to perceive their surroundings efficiently. The problem is therefore how to accelerate diffusion sampling without sacrificing completion quality.

### Main Contributions of the Paper

1. **ScoreLiDAR**: A new distillation method designed specifically for 3D LiDAR scene completion, which achieves efficient, high-quality completion with significantly fewer sampling steps.
2. **Structural Loss**: A loss with a scene-level term and a point-level term that captures the geometric structure of 3D point clouds, so that the student model learns complex geometric features.
3. **Experimental verification**: Extensive experiments show that ScoreLiDAR reduces completion time from 30.55 to 5.37 seconds per frame on SemanticKITTI (more than 5× faster) and outperforms existing state-of-the-art models on multiple metrics.
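To make the two-term Structural Loss more concrete, here is a minimal NumPy sketch. The summary above only states that one term constrains the holistic scene and the other constrains key landmark points and their relative configuration, so the specific choices below are assumptions for illustration: a symmetric Chamfer distance for the scene-level term, evenly spaced ground-truth points as landmarks, and a comparison of landmark distance matrices for the point-level term. The names `scene_wise_loss`, `point_wise_loss`, `structural_loss`, `num_landmarks`, `w_scene`, and `w_point` are hypothetical and may not match the authors' code.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between every point in a (N, 3) and b (M, 3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def scene_wise_loss(pred, target):
    """Assumed scene-level term: symmetric Chamfer distance."""
    d = pairwise_dist(pred, target)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def point_wise_loss(pred, target, num_landmarks=8):
    """Assumed point-level term: mismatch between landmark distance matrices."""
    # Hypothetical landmark choice: evenly spaced indices into the target.
    idx = np.linspace(0, len(target) - 1, num_landmarks).astype(int)
    gt_landmarks = target[idx]
    # Match each landmark to its nearest predicted point.
    nearest = pairwise_dist(gt_landmarks, pred).argmin(axis=1)
    pred_landmarks = pred[nearest]
    # "Relative configuration" is modelled as pairwise landmark distances.
    d_gt = pairwise_dist(gt_landmarks, gt_landmarks)
    d_pred = pairwise_dist(pred_landmarks, pred_landmarks)
    return np.abs(d_pred - d_gt).mean()

def structural_loss(pred, target, w_scene=1.0, w_point=1.0):
    """Weighted sum of the two terms; the weights are illustrative."""
    return (w_scene * scene_wise_loss(pred, target)
            + w_point * point_wise_loss(pred, target))
```

A perfect completion gives zero loss under this sketch, and any perturbation of the points raises it. A training implementation would use a differentiable framework and a cheaper nearest-neighbour search, since the dense distance matrix here scales quadratically with the number of points.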
### Key Technologies of the Solution

- **Variational Score Distillation (VSD)**: The pre-trained diffusion model (the teacher) is used to compute a distribution-matching loss that trains the student model.
- **Structural Loss**: A scene-level term constrains the overall structure, and a point-level term constrains key landmark points and their relative configuration.
- **Optimization Process**: The student model and an auxiliary diffusion model are optimized alternately so that the student learns effectively from the teacher.

### Experimental Results

ScoreLiDAR performs well on both the SemanticKITTI and KITTI-360 datasets. Compared with the state-of-the-art LiDiff model, it shortens completion time from about 30 seconds to about 5 seconds and improves evaluation metrics such as Chamfer Distance (CD) and Jensen-Shannon Divergence (JSD). Ablation experiments further confirm that the Structural Loss contributes to completion quality.

In conclusion, the paper addresses the slow sampling of diffusion-based 3D LiDAR scene completion with ScoreLiDAR, providing a faster and more efficient solution for applications such as autonomous driving.
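The alternating optimization listed under Key Technologies can be illustrated with a one-dimensional toy problem. This is a sketch under strong simplifying assumptions, not the paper's implementation: the teacher distribution is a Gaussian with a known mean, the student is a one-step generator `x = theta + z`, and the auxiliary model is reduced to a single mean estimate `phi`, so its score has a closed form. All names (`teacher_score`, `aux_score`, `distill`, `mu_teacher`, the learning rates) are hypothetical.

```python
import numpy as np

def teacher_score(x_t, sigma, mu_teacher=3.0):
    """Score of N(mu_teacher, 1) after adding N(0, sigma^2) noise."""
    return -(x_t - mu_teacher) / (1.0 + sigma ** 2)

def aux_score(x_t, sigma, phi):
    """Auxiliary score, assuming student samples follow N(phi, 1)."""
    return -(x_t - phi) / (1.0 + sigma ** 2)

def distill(steps=2000, batch=256, lr_student=0.05, lr_aux=0.2, seed=0):
    rng = np.random.default_rng(seed)
    theta = -2.0  # student parameter, deliberately far from the teacher
    phi = 0.0     # auxiliary model parameter
    for _ in range(steps):
        # One-step student samples.
        x = theta + rng.standard_normal(batch)

        # (1) Auxiliary update: fit the auxiliary model to student samples.
        # In this Gaussian toy that reduces to tracking the sample mean.
        phi += lr_aux * (x.mean() - phi)

        # (2) Student update: diffuse the samples, then follow the
        # difference between the auxiliary and teacher scores.
        sigma = rng.uniform(0.1, 1.0, size=batch)
        x_t = x + sigma * rng.standard_normal(batch)
        grad = np.mean(aux_score(x_t, sigma, phi) - teacher_score(x_t, sigma))
        theta -= lr_student * grad  # dx/dtheta = 1 for this generator
    return theta, phi

theta, phi = distill()
print(f"student mean: {theta:.3f}, auxiliary mean: {phi:.3f}")
```

In this toy, the student parameter drifts toward the teacher mean because the score difference vanishes only when the auxiliary estimate, and hence the student distribution, matches the teacher. In a real setting both score functions would be neural networks, the auxiliary model would be trained with a denoising objective, and a structural term would be added to the student loss.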