Position-aware Guided Point Cloud Completion with CLIP Model

Feng Zhou, Qi Zhang, Ju Dai, Lei Li, Qing Fan, Junliang Xing
2024-12-11
Abstract: Point cloud completion aims to recover partial geometric and topological shapes caused by equipment defects or limited viewpoints. Current methods either solely rely on the 3D coordinates of the point cloud to complete it or incorporate additional images with well-calibrated intrinsic parameters to guide the geometric estimation of the missing parts. Although these methods have achieved excellent performance by directly predicting the location of complete points, the extracted features lack fine-grained information regarding the location of the missing area. To address this issue, we propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts through a weighted map learning mechanism. In addition, we establish the Point-Text-Image triplet corpora PCN-TI and MVP-TI based on the existing unimodal point cloud completion datasets and use the pre-trained vision-language model CLIP to provide richer detail information for 3D shapes, thereby enhancing performance. Extensive quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art point cloud completion methods.
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problem does this paper attempt to solve?

Point clouds captured in practice are often missing parts of their geometric and topological structure because of equipment defects or limited viewpoints. Existing methods either rely solely on the 3D coordinates of the partial point cloud or combine it with additional images to guide the geometric estimation of the missing parts. However, the features these methods extract lack fine-grained information about the location of the missing area, so the completed point clouds are not accurate enough.

To address this, the authors propose a fast and efficient method for expanding a unimodal framework into a multimodal one, and introduce a position-aware module that enhances the spatial information of the missing parts. They also build the Point-Text-Image triplet corpora PCN-TI and MVP-TI, and use the pre-trained vision-language model CLIP to supply richer detail information, which improves completion performance.

### Main contributions

1. **Multimodal framework expansion**: A fast and efficient method for expanding a unimodal point cloud completion framework into a multimodal framework.
2. **Position-aware module**: A module that learns the location of the missing part of the point cloud, so that the network completes the shape in a more targeted way.
3. **Multimodal datasets**: Paired text descriptions and projection maps are introduced for each point cloud, yielding two extended datasets, PCN-TI and MVP-TI.

Extensive experiments show that the method outperforms existing state-of-the-art methods.

### Key technologies of the solution

- **Position-aware module**: The spatial information of the missing part is enhanced through a weighted map learning mechanism.
- **Multimodal fusion**: Point cloud, text, and image information are combined to provide more descriptive details.
- **CLIP model**: The pre-trained vision-language model CLIP is used to provide richer detail information about 3D shapes.

Together, these techniques improve the accuracy and robustness of point cloud completion, with strong results across different shape categories.
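The position-aware weighting and multimodal fusion described above can be illustrated with a toy NumPy sketch. This is not the paper's architecture: the summary does not specify the layers, so the sigmoid scoring, the cross-attention fusion, the function names `position_aware_weights` and `fuse_with_clip`, and all dimensions are assumptions made for illustration. Random vectors stand in for the CLIP text and image embeddings, which in practice would come from a pre-trained CLIP encoder.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_aware_weights(point_feats, w, b):
    """Hypothetical weighted map: one score in (0, 1) per point.

    point_feats: (N, C) per-point features; w: (C,) and b: scalar are
    learnable parameters in a real model (fixed arrays here).
    """
    logits = point_feats @ w + b
    return 1.0 / (1.0 + np.exp(-logits))


def fuse_with_clip(point_feats, weights, text_emb, image_emb, proj_t, proj_i):
    """Hypothetical fusion: reweighted points attend to two CLIP tokens.

    text_emb, image_emb: (D,) stand-ins for CLIP embeddings.
    proj_t, proj_i: (D, C) projections into the point-feature space.
    """
    # Emphasise points the weighted map scores highly.
    weighted = point_feats * weights[:, None]                    # (N, C)
    # Project both modalities into the point-feature space.
    tokens = np.stack([text_emb @ proj_t, image_emb @ proj_i])   # (2, C)
    # Each point queries the text and image tokens (scaled dot product).
    scale = np.sqrt(point_feats.shape[1])
    attn = softmax(weighted @ tokens.T / scale, axis=-1)         # (N, 2)
    # Residual connection keeps the geometric features.
    return weighted + attn @ tokens                              # (N, C)


# Toy usage with random data; N, C, and D are arbitrary choices.
rng = np.random.default_rng(0)
N, C, D = 1024, 64, 512
point_feats = rng.normal(size=(N, C))
weights = position_aware_weights(point_feats, rng.normal(size=C) * 0.1, 0.0)
fused = fuse_with_clip(
    point_feats, weights,
    rng.normal(size=D), rng.normal(size=D),
    rng.normal(size=(D, C)) * 0.05, rng.normal(size=(D, C)) * 0.05,
)
print(fused.shape)
```

In a trained model the weighted map would be supervised or learned end to end so that high weights concentrate near the missing region; here the parameters are random, so the sketch only demonstrates the data flow and tensor shapes.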