PP-SSL: Priority-Perception Self-Supervised Learning for Fine-Grained Recognition

ShuaiHeng Li, Qing Cai, Fan Zhang, Menghuan Zhang, Yangyang Shu, Zhi Liu, Huafeng Li, Lingqiao Liu
2024-11-28
Abstract: Self-supervised learning is emerging in fine-grained visual recognition (FGVR) with promising results. However, existing self-supervised learning methods are often susceptible to irrelevant patterns in self-supervised tasks and lack the capability to represent the subtle differences inherent in FGVR, resulting in generally poorer performance. To address this, we propose a novel Priority-Perception Self-Supervised Learning framework, denoted as PP-SSL, which can effectively filter out irrelevant feature interference and extract more subtle discriminative features throughout the training process. Specifically, it consists of two main parts: the Anti-Interference Strategy (AIS) and the Image-Aided Distinction Module (IADM). In AIS, a fine-grained textual description corpus is established, and a knowledge distillation strategy is devised to guide the model in eliminating irrelevant features while enhancing the learning of more discriminative and high-quality features. IADM is built on the observation that extracting GradCAM from the original image effectively reveals subtle differences between fine-grained categories. Compared to features extracted from intermediate or output layers, the original image retains more detail, allowing for a deeper exploration of the subtle distinctions among fine-grained classes. Extensive experimental results indicate that PP-SSL significantly outperforms existing methods across various datasets, highlighting its effectiveness in fine-grained recognition tasks. Our code will be made publicly available upon publication.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses two main weaknesses of current self-supervised learning (SSL) methods on fine-grained visual recognition (FGVR) tasks:

1. **Interference from irrelevant features**:
   - In self-supervised tasks, the model is easily distracted by task-irrelevant patterns such as background noise. These irrelevant features can lead to feature entanglement and weaken the discrimination between fine-grained categories.
   - On FGVR tasks, existing SSL methods often fail to filter out these irrelevant features, which degrades performance.
2. **Insufficient representation of fine-grained features**:
   - FGVR requires the model to capture subtle visual differences, such as those between bird species, aircraft models, or vehicle types.
   - Existing methods struggle to represent these subtle features accurately, especially when inter-class differences are small but intra-class differences are large.

To solve these problems, the authors propose a new Priority-Perception Self-Supervised Learning framework (PP-SSL), which improves fine-grained visual recognition through two key components:

- **Anti-Interference Strategy (AIS)**:
  - Uses a fine-grained textual description corpus and a knowledge distillation strategy to guide the model to suppress irrelevant features and strengthen the learning of high-quality features.
- **Image-Aided Distinction Module (IADM)**:
  - Extracts GradCAM from the original image to focus on subtle category differences, reducing the impact of inter-class differences and improving intra-class consistency.
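IADM relies on Grad-CAM, which weights each channel of a chosen map by the spatial average of the class-score gradients on that channel, sums the weighted channels, and applies a ReLU. The pure-Python sketch below shows only that standard combination step. It assumes the activations and their gradients have already been obtained from a backward pass in some deep-learning framework; exactly which tensor PP-SSL treats as the "original image" map is not specified in this summary, so the function `grad_cam` and its inputs are illustrative assumptions, not the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Standard Grad-CAM combination step (illustrative sketch).

    activations[c][y][x]: channel c of the chosen map (assumed given)
    gradients[c][y][x]:   d(class score) / d(activations[c][y][x]) (assumed given)
    Returns an H x W heat map after ReLU, scaled so its peak is 1.
    """
    num_channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight alpha_c: global average of that channel's gradients.
    alphas = [
        sum(sum(row) for row in gradients[c]) / (height * width)
        for c in range(num_channels)
    ]

    # Weighted sum over channels, then ReLU keeps only positive evidence.
    cam = [
        [
            max(0.0, sum(alphas[c] * activations[c][y][x] for c in range(num_channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]

    # Max-normalisation is a common visualisation step, not part of the
    # Grad-CAM definition; skip it when the map is all zeros.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

For example, a single 2 x 2 channel `[[1, 2], [3, 4]]` with all gradients equal to 1 yields `[[0.25, 0.5], [0.75, 1.0]]`, while all-negative gradients yield an all-zero map because of the ReLU.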
Through these improvements, PP-SSL significantly improves fine-grained visual recognition performance on multiple benchmark datasets, especially in retrieval and classification tasks.

### Key formulas

The main formulas in the paper are as follows:

1. **Contrastive learning loss**:

   \[ L_{CL}(q, k) = -\log\frac{\exp(q\cdot k/\tau)}{\sum_{i=1}^{K}\exp(q\cdot k_i/\tau)} \]

   where \( q \) and \( k \) form a positive pair (a query and its matching key), \( k_i \) are negative samples, and \( \tau \) is a temperature parameter. In the standard InfoNCE formulation, the denominator also includes the positive key.

2. **AIS knowledge distillation loss**:

   \[ L_{AIS}(l_t, l_s, \tau) = \tau^2 \cdot \mathrm{KL}\left(\sigma(l_t/\tau) \,\|\, \sigma(l_s/\tau)\right) \]

   where \( l_t \) and \( l_s \) are the logits predicted by the teacher and student models respectively, \( \sigma \) is the softmax function, and \( \mathrm{KL} \) is the Kullback-Leibler divergence.

3. **IADM optimization objective**:

   \[ L_{IADM}(\text{Grad-Img} \,\|\, w) = \text{Grad-Img}\cdot\log\left(\frac{\text{Grad-Img}}{w}\right) \]

   where Grad-Img denotes the GradCAM map extracted from the original image. The expression has the form of a Kullback-Leibler divergence term between Grad-Img and \( w \).

4. **Total loss**:

   \[ L_{total} = L_{CL} + \alpha L_{AIS} + \beta L_{IADM} \]

   where \( \alpha = 1.2 \) and \( \beta = 0.01 \) are hyperparameters that weight the loss terms.

Together, these objectives let the model filter out irrelevant features and learn subtle discriminative features during training.
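To make the four objectives concrete, here is a minimal pure-Python sketch that evaluates each formula on plain lists of floats. It illustrates the equations as written in this summary and is not the authors' implementation: all function names are hypothetical, the IADM term is summed over elements (the summary gives only the pointwise expression), and real training would use batched tensors with automatic differentiation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Vector) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(values: Vector) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_loss(q: Vector, k: Vector, keys: Sequence[Vector], tau: float) -> float:
    """L_CL = -log(exp(q.k/tau) / sum_i exp(q.k_i/tau)),
    computed stably as logsumexp_i(q.k_i/tau) - q.k/tau.
    `keys` holds the k_i that appear in the denominator."""
    return log_sum_exp([dot(q, k_i) / tau for k_i in keys]) - dot(q, k) / tau


def ais_distillation_loss(teacher_logits: Vector, student_logits: Vector, tau: float) -> float:
    """L_AIS = tau^2 * KL(softmax(l_t/tau) || softmax(l_s/tau))."""
    p_t = softmax([v / tau for v in teacher_logits])
    p_s = softmax([v / tau for v in student_logits])
    # Terms with p_t == 0 contribute 0 by the convention 0 * log 0 = 0.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return tau ** 2 * kl


def iadm_term(grad_img: Vector, w: Vector) -> float:
    """Grad-Img * log(Grad-Img / w), summed over elements.
    ASSUMPTION (hypothetical helper): the summation and the requirement
    w > 0 are ours; the summary gives only the pointwise expression."""
    return sum(g * math.log(g / w_i) for g, w_i in zip(grad_img, w) if g > 0)


def total_loss(l_cl: float, l_ais: float, l_iadm: float,
               alpha: float = 1.2, beta: float = 0.01) -> float:
    """L_total = L_CL + alpha * L_AIS + beta * L_IADM (alpha, beta as reported)."""
    return l_cl + alpha * l_ais + beta * l_iadm
```

Computing the contrastive term as a log-sum-exp minus the positive logit, rather than taking the log of the raw ratio, avoids overflow when \( q\cdot k_i/\tau \) is large, which matters because \( \tau \) is typically small. Identical teacher and student logits give an AIS loss of exactly zero, and any mismatch gives a positive value.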