Abstract: Recently, Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms. However, these methods often overlook latent information in less prominent regions, which increases their sensitivity to perturbations and limits their global comprehension. To address this issue, we introduce PointACL, an attention-driven contrastive learning framework. Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions, enhancing its understanding of the global structure of the point cloud. We then combine the original pre-training loss with a contrastive learning loss, improving feature discrimination and generalization. Extensive experiments validate the effectiveness of PointACL: it achieves state-of-the-art performance across a variety of 3D understanding tasks, including object classification, part segmentation, and few-shot learning. Specifically, when integrated with different Transformer backbones such as Point-MAE and PointGPT, PointACL improves performance on datasets such as ScanObjectNN, ModelNet40, and ShapeNetPart. This highlights its superior capability to capture both global and local features, as well as its enhanced robustness against perturbations and incomplete data.


### What problems does this paper attempt to solve?

This paper addresses two main problems in current Transformer-based point cloud understanding models:

1. **Increased sensitivity to perturbations**: Existing Transformer models usually rely on a small number of high-attention regions to analyze a point cloud, which makes them more sensitive to noise and incomplete data. When these high-attention regions are corrupted, model performance drops significantly.
2. **Limited global understanding**: Because these models ignore the latent information in low-attention regions, they are limited in understanding and capturing the global structure of point clouds. Point cloud data is sparse and lacks redundancy, so ignoring certain regions can mean losing important information.

To solve these problems, the authors introduce a new framework named **PointACL** (Attention-driven Contrastive Learning for Point Clouds). PointACL improves existing models in two ways:

- **Attention-driven dynamic masking strategy**: By dynamically adjusting the masking probability, the model is guided to focus on neglected low-attention regions, improving its understanding of global structure.
- **Combination of contrastive learning loss and pre-training loss**: Combining the original pre-training loss (such as a reconstruction or generation loss) with a contrastive learning loss improves feature discrimination and generalization.

With these improvements, PointACL performs well on various 3D understanding tasks, including object classification, part segmentation, and few-shot learning, and shows stronger robustness under different noise conditions.

### Formula summary

1. **Attention matrix calculation**:
   \[ A=\text{Softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right) \]
2. **Importance score calculation**:
   \[ S_{j}=\frac{A_{1,j}\times\left\|V_{j}\right\|}{\sum_{i = 2}^{N+1}A_{1,i}\times\left\|V_{i}\right\|} \]
3. **Dynamic masking probability calculation**:
   \[ p_{dy}=\log\left(\text{Softmax}\left(\frac{S}{\tau_{pro}}\right)\right)-\log(-\log\epsilon) \]
4. **Contrastive learning loss**:
   \[ L_{\text{contra}}=-\frac{1}{2b}\sum_{i}\left(\log\frac{\exp(H_{m}^{i}\cdot H_{s}^{i}/\tau_{sim})}{\sum_{j}\exp(H_{m}^{i}\cdot H_{s}^{j}/\tau_{sim})}+\log\frac{\exp(H_{s}^{i}\cdot H_{m}^{i}/\tau_{sim})}{\sum_{j}\exp(H_{s}^{i}\cdot H_{m}^{j}/\tau_{sim})}\right) \]
5. **Total loss function**:
   \[ L_{\text{total}}=L_{\text{origin}}+\lambda L_{\text{contra}} \]

Through these methods, PointACL addresses the limitations of existing Transformer models in point cloud understanding and improves their robustness and generalization.
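The formula summary can be traced end to end with a minimal NumPy sketch. This is not the authors' implementation. It assumes that token index 0 holds the class token (index 1 in the 1-based formula), that ε is drawn uniformly from (0, 1) so that −log(−log ε) acts as Gumbel noise, and that the patches with the largest p_dy are the ones masked. The helper `select_masked_patches`, the toy shapes, and the hyperparameter values are hypothetical. The encoders, the decoder, and L_origin itself are omitted (L_origin is a placeholder number), and H_m and H_s are taken as given paired batch features, compared with raw dot products as in formula 4.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    """Stable log(sum(exp(x))) along `axis`, keeping the reduced dimension."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def attention_matrix(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Formula 1: A = Softmax(Q K^T / sqrt(d)), normalized over keys (rows sum to 1)."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1)


def importance_scores(A: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Formula 2: class-token attention to each patch, weighted by ||V_j||.

    Assumption: token 0 is the class token; tokens 1..N are the N patch tokens.
    """
    weighted = A[0, 1:] * np.linalg.norm(V[1:], axis=-1)
    return weighted / weighted.sum()


def dynamic_mask_logits(S: np.ndarray, tau_pro: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Formula 3: log Softmax(S / tau_pro) - log(-log eps).

    Assumption: eps ~ Uniform(0, 1), so -log(-log eps) is Gumbel noise.
    """
    eps = np.clip(rng.random(S.shape), 1e-12, 1.0 - 1e-12)  # keep both logs finite
    return np.log(softmax(S / tau_pro)) - np.log(-np.log(eps))


def select_masked_patches(p_dy: np.ndarray, num_mask: int) -> np.ndarray:
    """Hypothetical rule: mask the num_mask patches with the largest p_dy."""
    return np.argsort(-p_dy)[:num_mask]


def contrastive_loss(H_m: np.ndarray, H_s: np.ndarray, tau_sim: float) -> float:
    """Formula 4: symmetric InfoNCE; pair i is (H_m[i], H_s[i])."""
    logits = H_m @ H_s.T / tau_sim                  # logits[i, j] = H_m^i . H_s^j / tau_sim
    b = logits.shape[0]
    log_p_ms = logits - logsumexp(logits, axis=1)   # first term: normalize over H_s^j
    log_p_sm = logits - logsumexp(logits, axis=0)   # second term: normalize over H_m^j
    idx = np.arange(b)
    return float(-(log_p_ms[idx, idx].sum() + log_p_sm[idx, idx].sum()) / (2 * b))


def total_loss(L_origin: float, L_contra: float, lam: float) -> float:
    """Formula 5: L_total = L_origin + lambda * L_contra."""
    return L_origin + lam * L_contra


# Toy run with random tensors; shapes and hyperparameters are illustrative only.
rng = np.random.default_rng(0)
N, d, b = 8, 16, 4                     # 8 patch tokens + 1 class token, batch of 4
X = rng.normal(size=(N + 1, d))
A = attention_matrix(X, X)             # Q = K = V = X purely for illustration
S = importance_scores(A, X)
masked = select_masked_patches(dynamic_mask_logits(S, tau_pro=0.5, rng=rng), num_mask=3)
H_m, H_s = rng.normal(size=(b, d)), rng.normal(size=(b, d))
L_contra = contrastive_loss(H_m, H_s, tau_sim=0.1)
L_total = total_loss(L_origin=1.0, L_contra=L_contra, lam=0.5)  # L_origin is a placeholder
```

Because the Gumbel-perturbed log-probabilities are ranked rather than sampled one at a time, taking the top k is equivalent to sampling k patches without replacement in proportion to Softmax(S/τ_pro). A smaller τ_pro concentrates the mask on the highest-scoring patches, while a larger one spreads it more evenly.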

Point Cloud Understanding via Attention-Driven Contrastive Learning