Abstract: The proliferation of depth cameras and LiDAR sensors in industrial environments has fueled the pursuit of effective and efficient 3D point cloud models that enable us to perceive and interact with the physical world. However, the intrinsic complexity of 3D semantic information poses significant challenges to model design, including the need for spatial rotation invariance and the irregular structure of point clouds, both of which fundamentally affect the representation and behavior of 3D point cloud systems. Existing methods have either relied heavily on label information in a supervised learning setting or failed to capture the inherent patterns of 3D point clouds within a self-supervised learning framework, leading to poor performance on specific downstream tasks. To address these limitations, this paper introduces the Dual-Cross Contrastive Neural Network (DCCN), a framework for self-supervised 3D point cloud representation learning. DCCN leverages cross-view, cross-network, and domain-specific knowledge distillation to enhance the extraction of hidden features from point clouds and fully exploit the capabilities of the encoder. DCCN employs a pseudo-Siamese network consisting of an online network and a target network, facilitating knowledge interaction and distillation. The method extracts internal states from augmented 3D point clouds by learning cross-view relationships and optimizes model parameters through intra-modal cross-network learning. Rather than sharing weights between the two branches, we incorporate a momentum-updating mechanism to distill knowledge and sharpen the role differentiation between the online and target networks. Experimental results demonstrate that our approach outperforms a range of supervised and self-supervised learning methods on four downstream tasks across three representative datasets. Ablation studies validate the component-wise effectiveness of the cross-view, cross-network, and momentum-updating learning objectives in achieving superior point cloud representation. The overall findings establish DCCN as an effective solution for 3D point cloud representation learning in real-world applications.
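
The abstract does not give the exact update rule, so the following is only a minimal sketch of how a momentum update between an online and a target network is commonly written: an exponential moving average of parameters. The function name `momentum_update`, the dictionary-of-lists parameter layout, and the coefficient `tau` are illustrative assumptions, not details taken from the paper.

```python
def momentum_update(online_params, target_params, tau=0.99):
    """Return new target parameters as an exponential moving average.

    Assumed rule (not confirmed by the abstract):
        target <- tau * target + (1 - tau) * online
    Both arguments are hypothetical dicts mapping a parameter name to a
    flat list of floats; the inputs are left unmodified.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    updated = {}
    for name, target_values in target_params.items():
        online_values = online_params[name]
        # Blend each target weight toward the matching online weight.
        updated[name] = [
            tau * t + (1.0 - tau) * o
            for t, o in zip(target_values, online_values)
        ]
    return updated


# Toy usage: the target starts at zero and drifts slowly toward the online weights.
online = {"encoder.w": [1.0, 2.0]}
target = {"encoder.w": [0.0, 0.0]}
target_after_one_step = momentum_update(online, target, tau=0.9)
```

With `tau` close to 1, the target network changes slowly and is not trained by gradients directly, which is one common way to keep the two branches in distinct roles without sharing weights.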

Generative Variational-Contrastive Learning for Self-Supervised Point Cloud Representation

Continuous Volumetric Convolution Network with Self-Learning Kernels for Point Clouds

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

Domain Adaptation on Point Clouds Via Geometry-Aware Implicits

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding

DCCN: A dual-cross contrastive neural network for 3D point cloud representation learning

Latent-Space Laplacian Pyramids for Adversarial Representation Learning with 3D Point Clouds

Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds

PointCG: Self-supervised Point Cloud Learning via Joint Completion and Generation

Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds from Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction

Unsupervised contrastive learning with simple transformation for 3D point cloud data

Progressive Generation of 3D Point Clouds with Hierarchical Consistency

SegContrast: 3D Point Cloud Feature Representation Learning Through Self-Supervised Segment Discrimination

Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning

PointMoment: Mixed-Moment-based Self-Supervised Representation Learning for 3D Point Clouds

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

Self-Contrastive Learning with Hard Negative Sampling for Self-supervised Point Cloud Learning

GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning

Joint data and feature augmentation for self-supervised representation learning on point clouds

Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder