Abstract: The proliferation of depth cameras and LiDAR sensors in industrial environments has fueled the pursuit of effective and efficient 3D point cloud models that enable us to perceive and interact with the physical world. However, the intrinsic complexity of 3D semantic information poses significant challenges to model design, including spatial rotation invariance and irregular point cloud structure, which fundamentally affect the representation and behavior of 3D point cloud systems. Existing methods have either relied heavily on labels in a supervised learning setting or failed to capture the inherent patterns of 3D point clouds within a self-supervised learning framework, leading to poor performance on specific downstream tasks. To address these limitations, this paper introduces the Dual-Cross Contrastive Neural Network (DCCN), a framework for self-supervised 3D point cloud representation learning. DCCN leverages cross-view, cross-network, and domain-specific knowledge distillation to enhance the extraction of hidden features from point clouds and fully exploit the capabilities of the encoder. DCCN employs a pseudo-Siamese network consisting of an online network and a target network, facilitating knowledge interaction and distillation. The method extracts internal states from augmented 3D point clouds by learning cross-view relationships and optimizes model parameters through intra-modal cross-network learning. We incorporate a momentum-updating mechanism without shared weights in the Siamese network architecture to distill knowledge and enhance the role differentiation between the online and target networks. Experimental results demonstrate that our approach outperforms a range of supervised and self-supervised learning methods on four downstream tasks across three representative datasets. Ablation studies validate the component-wise effectiveness of the cross-view, cross-network, and momentum-updating learning objectives in achieving superior point cloud representation. Overall, the findings establish DCCN as an effective solution for 3D point cloud representation learning in real-world applications.
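
The abstract does not spell out the momentum-updating rule, so the following is only a minimal sketch of the exponential-moving-average update commonly used for online/target network pairs without shared weights. It assumes the target parameters are moved toward the online parameters by a coefficient `m`; the function name `momentum_update`, the dictionary-of-lists parameter layout, and the value `m = 0.9` are illustrative assumptions, not details taken from DCCN.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def momentum_update(target: Params, online: Params, m: float = 0.99) -> Params:
    """Return new target parameters computed as m * target + (1 - m) * online.

    This is the generic exponential-moving-average rule; whether DCCN uses
    exactly this form is an assumption.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum coefficient m must lie in [0, 1]")
    if target.keys() != online.keys():
        raise ValueError("target and online must share parameter names")

    updated: Params = {}
    for name, t_vals in target.items():
        o_vals = online[name]
        if len(t_vals) != len(o_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The target drifts slowly toward the online weights; it receives
        # no gradient of its own in this scheme.
        updated[name] = [m * t + (1.0 - m) * o for t, o in zip(t_vals, o_vals)]
    return updated


# Toy usage: one update step with an assumed coefficient of 0.9.
online_params = {"w": [1.0, 2.0], "b": [0.5]}
target_params = {"w": [0.0, 0.0], "b": [0.0]}
target_params = momentum_update(target_params, online_params, m=0.9)
```

With `m` close to 1 the target network changes slowly and acts as a stable teacher, while the online network is trained by gradient descent; this asymmetry is one common way to obtain the role differentiation between the two networks that the abstract describes.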

PointCMC: cross-modal multi-scale correspondences learning for point cloud understanding

Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding

CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration

PointMS: Semantic Segmentation for Point Cloud Based on Multi-scale Directional Convolution

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

DCCN: A dual-cross contrastive neural network for 3D point cloud representation learning

Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration

Multi-scale Matching Networks for Semantic Correspondence

PointMTL: Multi-Transform Learning for Effective 3D Point Cloud Representations

PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition

PointCLM: A Contrastive Learning-based Framework for Multi-instance Point Cloud Registration

Learning and Matching Multi-View Descriptors for Registration of Point Clouds

PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling

Boosting 3D Point Cloud Registration by Transferring Multi-modality Knowledge

Multi-scale Network with Attentional Multi-resolution Fusion for Point Cloud Semantic Segmentation

Deep Multi-scale Learning on Point Sets for 3D Object Recognition

End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds

Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching

C2BG-Net: Cross-modality and cross-scale balance network with global semantics for multi-modal 3D object detection