PointCG: Self-supervised Point Cloud Learning via Joint Completion and Generation

Yun Liu, Peng Li, Xuefeng Yan, Liangliang Nan, Bing Wang, Honghua Chen, Lina Gong, Wei Zhao, Mingqiang Wei
2024-11-09
Abstract: The core of self-supervised point cloud learning lies in setting up appropriate pretext tasks, to construct a pre-training framework that enables the encoder to perceive 3D objects effectively. In this paper, we integrate two prevalent methods, masked point modeling (MPM) and 3D-to-2D generation, as pretext tasks within a pre-training framework. We leverage the spatial awareness and precise supervision offered by these two methods to address their respective limitations: ambiguous supervision signals and insensitivity to geometric information. Specifically, the proposed framework, abbreviated as PointCG, consists of a Hidden Point Completion (HPC) module and an Arbitrary-view Image Generation (AIG) module. We first capture visible points from arbitrary views as inputs by removing hidden points. Then, HPC extracts representations of the inputs with an encoder and completes the entire shape with a decoder, while AIG is used to generate rendered images based on the visible points' representations. Extensive experiments demonstrate the superiority of the proposed method over the baselines in various downstream tasks. Our code will be made available upon acceptance.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two main issues in self-supervised point cloud learning:

1. **Ambiguous Supervision Signal**: Existing self-supervised methods such as Masked Point Modeling (MPM) struggle to provide clear point-to-point supervision due to the irregularity of point clouds. Common point set similarity measures (e.g., Chamfer Distance and Earth Mover's Distance) fail to offer explicit supervision, resulting in limited feature representation capabilities of the pre-trained backbone network.
2. **Insensitivity to Geometric Information**: Although 3D-to-2D generation tasks provide pixel-level precise supervision by generating 2D images, they may overlook the structural information of occluded point sets due to relying solely on images from limited viewpoints. This weakens the backbone network's ability to perceive the spatial properties of point clouds.

To address these issues, the paper proposes a new framework called PointCG, which combines the Hidden Point Completion (HPC) module and the Arbitrary-view Image Generation (AIG) module. In this way, PointCG effectively leverages the advantages of both methods while compensating for their respective shortcomings, thereby excelling in various downstream tasks.
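
To make the "ambiguous supervision" point concrete, here is a minimal pure-Python sketch of a symmetric Chamfer Distance. It is not the paper's implementation: the function name `chamfer_distance`, the choice of squared Euclidean distances, and the per-set averaging are assumptions made for illustration (conventions for this measure vary across libraries). The toy data shows that the score ignores point ordering, and that two different predictions can receive exactly the same loss.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two point sets.

    Assumed convention (illustrative only): squared Euclidean distances,
    averaged over each set, then summed over both directions.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # For every source point, take the distance to its nearest neighbour in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Toy target shape: two points on the x-axis (hypothetical data).
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# The measure is permutation invariant: reordering the points changes nothing,
# so it never says which predicted point should match which target point.
shuffled = [target[1], target[0]]

# Two different predictions: one displaces the first point along y,
# the other displaces the second point along z.
pred_a = [(0.0, 0.25, 0.0), (1.0, 0.0, 0.0)]
pred_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.25)]

cd_shuffled = chamfer_distance(target, shuffled)
cd_a = chamfer_distance(target, pred_a)
cd_b = chamfer_distance(target, pred_b)

print(cd_shuffled)  # 0.0
print(cd_a, cd_b)   # 0.0625 0.0625
```

Because only nearest-neighbour distances enter the score, the loss constrains the overall set rather than individual point correspondences, which is the sense in which the summary above calls the supervision ambiguous. Pixel-level image supervision, by contrast, compares values at fixed grid positions.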