GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning

Keyi Liu, Yeqi Luo, Weidong Yang, Jingyi Xu, Zhijun Li, Wen-Ming Chen, Ben Fei
DOI: https://doi.org/10.48550/arXiv.2409.04963
2024-09-08
Abstract: Self-supervised learning of point clouds aims to leverage unlabeled 3D data to learn meaningful representations without relying on manual annotations. However, current approaches face challenges such as limited data diversity and inadequate augmentation for effective feature learning. To address these challenges, we propose GS-PT, which integrates 3D Gaussian Splatting (3DGS) into point cloud self-supervised learning for the first time. Our pipeline uses transformers as the backbone for self-supervised pre-training and introduces novel contrastive learning tasks through 3DGS. Specifically, the transformers aim to reconstruct the masked point cloud. 3DGS takes multi-view rendered images as input to generate enhanced point cloud distributions and novel-view images, facilitating data augmentation and cross-modal contrastive learning. Additionally, we incorporate features from depth maps. By optimizing these tasks jointly, our method enriches the tri-modal self-supervised learning process, enabling the model to leverage the correlations across 3D point clouds and 2D images from various modalities. We freeze the encoder after pre-training and test the model's performance on multiple downstream tasks. Experimental results indicate that GS-PT outperforms off-the-shelf self-supervised learning methods on various downstream tasks, including 3D object classification, real-world classification, few-shot learning, and segmentation.
Computer Vision and Pattern Recognition
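The masked point cloud reconstruction task named in the abstract can be illustrated with a minimal NumPy sketch. It assumes a Point-MAE-style setup that this summary does not spell out: the cloud is split into local patches around sampled centers, a fraction of the patches is hidden, and the reconstruction of the hidden patches is scored with a symmetric Chamfer distance. The helper names (`group_points`, `random_patch_mask`, `chamfer_l2`), the random center selection (used here in place of farthest point sampling), the 0.6 mask ratio, and the noisy stand-in for a decoder's output are all hypothetical; no transformer is included.

```python
import numpy as np


def group_points(points, num_groups, group_size, rng):
    """Split an (N, 3) point cloud into `num_groups` local patches of `group_size` points.

    Assumption: centers are drawn uniformly at random; Point-MAE-style
    pipelines usually pick them with farthest point sampling instead.
    """
    centers = points[rng.choice(len(points), num_groups, replace=False)]  # (G, 3)
    # Distance from every center to every point: (G, N)
    dists = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :group_size]  # k nearest neighbors per center
    patches = points[idx] - centers[:, None, :]  # (G, K, 3), relative to each center
    return centers, patches


def random_patch_mask(num_groups, mask_ratio, rng):
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_groups * mask_ratio))
    mask = np.zeros(num_groups, dtype=bool)
    mask[rng.choice(num_groups, num_masked, replace=False)] = True
    return mask


def chamfer_l2(a, b):
    """Symmetric Chamfer distance (squared L2) between point sets a (M, 3) and b (K, 3)."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (M, K) squared distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()


rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3))  # toy point cloud
centers, patches = group_points(cloud, num_groups=64, group_size=32, rng=rng)
mask = random_patch_mask(64, mask_ratio=0.6, rng=rng)  # hypothetical mask ratio
target = patches[mask]  # ground truth for the masked patches
# Stand-in for a decoder's prediction: the target plus small noise (no real model here)
prediction = target + 0.01 * rng.standard_normal(target.shape)
loss = np.mean([chamfer_l2(p, t) for p, t in zip(prediction, target)])
print(int(mask.sum()), float(loss))
```

Chamfer distance suits this task because each patch is an unordered set of points, so the loss must not depend on the order in which points are listed.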
What problem does this paper attempt to address?
The problem this paper addresses is how to learn meaningful representations from large-scale unlabeled 3D point cloud data through self-supervised learning, without relying on manual annotations. Current self-supervised methods for 3D point clouds face two main challenges:

1. **Limited data diversity and scarce high-quality multimodal data pairs**: Effective self-supervised learning benefits from combining information from several sources (such as point clouds, rendered RGB images, and depth maps), but high-quality paired data of this kind is scarce in practice.
2. **Simple geometric transformations yield monotonous feature representations**: Existing methods typically augment data with simple geometric transformations, which produces overly simplistic feature representations and limits the model's ability to generalize.

To address these challenges, the paper proposes GS-PT (Gaussian Splatting for Point Cloud Self-Supervised Learning), the first method to apply 3D Gaussian Splatting (3DGS) to point cloud self-supervised learning. Taking multi-view rendered images as input, 3DGS generates enhanced point cloud distributions and novel-view images, which provide richer data augmentation and enable cross-modal contrastive learning. Together with depth map features, these tasks are optimized jointly, enriching the tri-modal self-supervised learning process and allowing the model to better exploit the correlations between 3D point clouds and 2D images. Experimental results show that GS-PT outperforms existing self-supervised learning methods on multiple downstream tasks, including 3D object classification, real-world classification, few-shot learning, and segmentation.
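The cross-modal contrastive learning described above can be sketched in NumPy. This summary does not give GS-PT's exact loss, so the sketch assumes a CLIP-style symmetric InfoNCE loss applied pairwise across three embedding sets (point cloud, rendered or novel-view RGB image, depth map). The pairing scheme, the equal weights, the temperature of 0.07, and the function names `info_nce`, `diagonal_cross_entropy`, and `tri_modal_loss` are hypothetical, and the random vectors stand in for real encoder outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def info_nce(x, y, temperature=0.07):
    """Symmetric InfoNCE: (x[i], y[i]) is a positive pair, all other rows are negatives."""
    logits = l2_normalize(x) @ l2_normalize(y).T / temperature  # (B, B)
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def tri_modal_loss(pc_feat, rgb_feat, depth_feat, weights=(1.0, 1.0, 1.0)):
    """Hypothetical tri-modal objective: weighted sum of the three pairwise contrastive terms."""
    w_pc_rgb, w_pc_depth, w_rgb_depth = weights
    return (w_pc_rgb * info_nce(pc_feat, rgb_feat)
            + w_pc_depth * info_nce(pc_feat, depth_feat)
            + w_rgb_depth * info_nce(rgb_feat, depth_feat))


rng = np.random.default_rng(0)
batch, dim = 8, 128
shared = rng.standard_normal((batch, dim))  # toy per-object code shared across modalities
pc = shared + 0.1 * rng.standard_normal((batch, dim))  # stand-in point cloud embeddings
rgb = shared + 0.1 * rng.standard_normal((batch, dim))  # stand-in rendered/novel-view image embeddings
depth = shared + 0.1 * rng.standard_normal((batch, dim))  # stand-in depth map embeddings

aligned = tri_modal_loss(pc, rgb, depth)
# Shifting the RGB rows by one breaks the pc/rgb and rgb/depth pairings
mismatched = tri_modal_loss(pc, np.roll(rgb, 1, axis=0), depth)
print(float(aligned), float(mismatched))
```

Embeddings of the same object (matching rows) should yield a lower loss than mismatched ones, which is what `aligned < mismatched` reflects in this toy setting; a real pipeline would backpropagate this loss into the modality encoders.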