Point Cloud Understanding via Attention-Driven Contrastive Learning

Yi Wang, Jiaze Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng
2024-11-22
Abstract: Recently, Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms. However, these methods often overlook latent information in less prominent regions, leading to increased sensitivity to perturbations and limited global comprehension. To address these limitations, we introduce PointACL, an attention-driven contrastive learning framework. Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions, enhancing its understanding of global structures within the point cloud. We then combine the original pre-training loss with a contrastive learning loss, improving feature discrimination and generalization. Extensive experiments validate the effectiveness of PointACL, which achieves state-of-the-art performance across a variety of 3D understanding tasks, including object classification, part segmentation, and few-shot learning. Specifically, when integrated with different Transformer backbones such as Point-MAE and PointGPT, PointACL demonstrates improved performance on datasets such as ScanObjectNN, ModelNet40, and ShapeNetPart. This highlights its superior capability in capturing both global and local features, as well as its enhanced robustness against perturbations and incomplete data.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper targets two main problems in current Transformer-based point cloud understanding models:

1. **Increased sensitivity to perturbations**: Existing Transformer models usually rely on a small number of high-attention regions to analyze point clouds, which makes them more sensitive to noise and incomplete data. When these high-attention regions are disturbed, model performance declines significantly.
2. **Limited global understanding**: Because they ignore the latent information in low-attention regions, these models are limited in how well they capture the global structure of point clouds. Point cloud data is inherently sparse and has little redundancy, so ignoring certain regions can mean missing important information.

To solve these problems, the authors introduce a new framework named **PointACL** (Attention-driven Contrastive Learning for Point Clouds). PointACL improves on existing models in two ways:

- **Attention-driven dynamic masking strategy**: By dynamically adjusting the masking probability, the model is guided to focus on neglected low-attention regions, which strengthens its understanding of the global structure.
- **Combination of contrastive learning loss and pre-training loss**: Combining the original pre-training loss (such as a reconstruction loss or a generation loss) with a contrastive learning loss improves feature discrimination and generalization.

With these improvements, PointACL performs well in various 3D understanding tasks, including object classification, part segmentation, and few-shot learning, and shows stronger robustness under different noise conditions.

### Formula summary

1. **Attention matrix calculation**:
\[ A=\text{Softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right) \]
2. **Importance score calculation**:
\[ S_{j}=\frac{A_{1,j}\times\left\|V_{j}\right\|}{\sum_{i=2}^{N+1}A_{1,i}\times\left\|V_{i}\right\|} \]
3. **Dynamic masking probability calculation**:
\[ p_{dy}=\log\left(\text{Softmax}\left(\frac{S}{\tau_{pro}}\right)\right)-\log(-\log\epsilon) \]
4. **Contrastive learning loss**:
\[ L_{\text{contra}}=-\frac{1}{2b}\sum_{i}\left(\log\frac{\exp(H_{m}^{i}\cdot H_{s}^{i}/\tau_{sim})}{\sum_{j}\exp(H_{m}^{i}\cdot H_{s}^{j}/\tau_{sim})}+\log\frac{\exp(H_{s}^{i}\cdot H_{m}^{i}/\tau_{sim})}{\sum_{j}\exp(H_{s}^{i}\cdot H_{m}^{j}/\tau_{sim})}\right) \]
5. **Total loss function**:
\[ L_{\text{total}}=L_{\text{origin}}+\lambda L_{\text{contra}} \]

Through these methods, PointACL addresses the limitations of existing Transformer models in point cloud understanding and improves the robustness and generalization ability of the models.
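The importance score, dynamic masking, and loss formulas above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation. All function and parameter names are hypothetical. The sketch assumes that row 1 of the attention matrix belongs to a class token, so that `cls_attention` holds A_{1,j} for the N patch tokens. It also assumes that ε is drawn uniformly from (0, 1), that the tokens with the highest perturbed scores are the ones masked, and that `mask_ratio` sets how many are masked.

```python
import math
import random


def log_softmax(xs):
    """Numerically stable log(Softmax(xs)) for a list of floats."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def importance_scores(cls_attention, value_norms):
    """S_j = A_{1,j} * ||V_j|| / sum_i (A_{1,i} * ||V_i||) over the patch tokens."""
    weighted = [a * n for a, n in zip(cls_attention, value_norms)]
    total = sum(weighted)
    return [w / total for w in weighted]


def dynamic_mask_scores(scores, tau_pro, rng):
    """p_dy = log(Softmax(S / tau_pro)) - log(-log(eps)).

    Assumption: eps is uniform noise in (0, 1), so the second term acts
    as random noise added to the log-probabilities.
    """
    log_probs = log_softmax([s / tau_pro for s in scores])
    perturbed = []
    for lp in log_probs:
        eps = rng.uniform(1e-9, 1.0 - 1e-9)  # keep both logarithms finite
        perturbed.append(lp - math.log(-math.log(eps)))
    return perturbed


def select_masked(perturbed, mask_ratio):
    """Assumption: mask the tokens with the highest perturbed scores."""
    k = int(len(perturbed) * mask_ratio)
    order = sorted(range(len(perturbed)), key=lambda j: perturbed[j], reverse=True)
    return sorted(order[:k])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(h_m, h_s, tau_sim):
    """Symmetric loss between two sets of b feature vectors, H_m and H_s."""
    b = len(h_m)
    total = 0.0
    for i in range(b):
        # log of exp(H_m^i . H_s^i / tau) / sum_j exp(H_m^i . H_s^j / tau)
        m_to_s = log_softmax([dot(h_m[i], h_s[j]) / tau_sim for j in range(b)])[i]
        # log of exp(H_s^i . H_m^i / tau) / sum_j exp(H_s^i . H_m^j / tau)
        s_to_m = log_softmax([dot(h_s[i], h_m[j]) / tau_sim for j in range(b)])[i]
        total += m_to_s + s_to_m
    return -total / (2 * b)


def total_loss(l_origin, l_contra, lam):
    """L_total = L_origin + lambda * L_contra."""
    return l_origin + lam * l_contra


# Toy usage with made-up numbers (4 patch tokens, batch of 2 feature vectors).
rng = random.Random(0)
scores = importance_scores([0.50, 0.30, 0.15, 0.05], [1.0, 1.0, 2.0, 1.0])
masked = select_masked(dynamic_mask_scores(scores, tau_pro=0.1, rng=rng), mask_ratio=0.5)
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], tau_sim=1.0)
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], tau_sim=1.0)
```

In the toy usage, `aligned` comes out smaller than `swapped`, because the loss rewards each feature in one set for matching the feature with the same index in the other set. A real training setup would compute these quantities on batched tensors in a deep learning framework; plain lists are used here only to keep the sketch self-contained.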