Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image Classification

Xuerong Zhang, Li Huang, Jing Lv, Ming Yang
2024-07-04
Abstract: Semi-supervised learning is attracting growing attention owing to its success in incorporating unlabeled data. However, pseudo-labeling-based semi-supervised approaches suffer from two problems in image classification: (1) existing methods may fail to adopt suitable thresholds because they use either a pre-defined/fixed threshold or an ad-hoc threshold-adjustment scheme, resulting in inferior performance and slow convergence; (2) discarding unlabeled data whose confidence falls below the threshold loses discriminative information. To address these issues, we develop an effective method that makes full use of unlabeled data. Specifically, we design a self-adaptive threshold pseudo-labeling strategy in which the threshold for each class is dynamically adjusted to increase the number of reliable samples. Meanwhile, to effectively utilise unlabeled data with confidence below the thresholds, we propose an unreliable sample contrastive loss that mines the discriminative information in low-confidence samples by learning the similarities and differences between sample features. We evaluate our method on several classification benchmarks under partially labeled settings and demonstrate its superiority over other approaches.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper attempts to solve two main problems in semi-supervised image classification:

1. **Threshold selection in pseudo-labeling**: Existing pseudo-labeling methods may fail to adopt appropriate thresholds because they use either predefined/fixed thresholds or an ad-hoc threshold-adjustment scheme, which leads to poor performance and slow convergence.
2. **Information loss from low-confidence samples**: Discarding unlabeled data whose confidence is below the threshold throws away discriminative information.

To solve these problems, the authors propose a method that makes full use of unlabeled data. Specifically, they design **Self-Adaptive Threshold Pseudo-labeling (SATPL)**, which dynamically adjusts the threshold for each class to increase the number of reliable samples. To exploit unlabeled data whose confidence falls below the threshold, they also propose the **Unreliable Sample Contrastive Loss (USCL)**, which mines discriminative information in low-confidence samples by learning the similarities and differences between sample features.

### Main contributions

1. **Self-Adaptive Threshold Pseudo-labeling (SATPL)**: Because model performance improves gradually during training, the threshold for each class is adjusted adaptively, gradually increasing the number of reliable samples.
2. **Unreliable Sample Contrastive Loss (USCL)**: By learning the similarities and differences between sample features, USCL mines discriminative information in low-confidence samples, allowing the model to use all unlabeled data and to converge faster.
3. **Experimental results**: Experiments on three datasets, CIFAR-10, CIFAR-100 and STL-10, show that the proposed method outperforms other methods on semi-supervised image classification.
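The summary does not give SATPL's exact update rules. As a rough illustration of the idea behind contribution 1 (a global threshold that rises as the model improves, plus per-class local thresholds tied to how well each class is learned), the sketch below borrows the EMA-based self-adaptive thresholding of FreeMatch as a stand-in: the global threshold is an exponential moving average of the batch's mean maximum confidence, and each class's local threshold is that global value scaled by the class's max-normalised average predicted probability. The class name `SelfAdaptiveThreshold`, the `momentum` value and the 1/K initialisation are hypothetical choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


class SelfAdaptiveThreshold:
    """Hypothetical global + per-class threshold tracker (FreeMatch-style EMA stand-in)."""

    def __init__(self, num_classes: int, momentum: float = 0.999) -> None:
        self.num_classes = num_classes
        self.momentum = momentum
        # Start at 1/K so that many unlabeled samples pass early in training.
        self.global_tau = 1.0 / num_classes
        self.class_conf = [1.0 / num_classes] * num_classes

    def update(self, probs: Sequence[Sequence[float]]) -> None:
        """Update the EMA statistics from softmax outputs on weakly augmented views."""
        n = len(probs)
        m = self.momentum
        # Global threshold tracks the mean max confidence, so it rises as the model improves.
        mean_max = sum(max(p) for p in probs) / n
        self.global_tau = m * self.global_tau + (1 - m) * mean_max
        # Per-class average probability measures how well each class has been learned.
        for c in range(self.num_classes):
            mean_c = sum(p[c] for p in probs) / n
            self.class_conf[c] = m * self.class_conf[c] + (1 - m) * mean_c

    def thresholds(self) -> List[float]:
        """Local threshold per class: global threshold scaled by max-normalised class confidence."""
        top = max(self.class_conf)
        return [self.global_tau * c / top for c in self.class_conf]

    def split(self, probs: Sequence[Sequence[float]]) -> Tuple[List[Tuple[int, int]], List[int]]:
        """Return (index, pseudo_label) pairs for reliable samples and indices of unreliable ones."""
        taus = self.thresholds()
        reliable: List[Tuple[int, int]] = []
        unreliable: List[int] = []
        for i, p in enumerate(probs):
            label = max(range(len(p)), key=lambda c: p[c])
            if p[label] >= taus[label]:
                reliable.append((i, label))
            else:
                unreliable.append(i)
        return reliable, unreliable


# Toy batch of softmax outputs for a 3-class problem.
tracker = SelfAdaptiveThreshold(num_classes=3, momentum=0.9)
batch = [[0.9, 0.05, 0.05], [0.34, 0.33, 0.33], [0.1, 0.8, 0.1]]
tracker.update(batch)
reliable, unreliable = tracker.split(batch)
print(reliable, unreliable)  # [(0, 0), (2, 1)] [1]
```

Classes with lower average confidence get local thresholds below the global one, so more of their samples are admitted as pseudo-labels; the indices in `unreliable` are the low-confidence samples that a loss such as USCL would use instead of discarding them.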
### Method overview

#### Self-Adaptive Threshold Pseudo-labeling (SATPL)

- **Global threshold**: The global threshold is adjusted dynamically rather than fixed in advance, so it adapts to different datasets.
- **Local threshold**: Starting from the global threshold, a local threshold for each class is adjusted dynamically according to how well that class has been learned.
- **Threshold adjustment**: Early in training, the model's learning ability is weak, so a lower global threshold is used to admit more unlabeled data. As training progresses and the model becomes stronger, a higher global threshold is needed to filter out wrong pseudo-labels, reduce confirmation bias, and improve pseudo-label quality.

#### Unreliable Sample Contrastive Loss (USCL)

- **Construction of positive and negative pairs**: Positive and negative sample pairs are constructed so that contrastive learning can extract discriminative information from low-confidence samples.
- **Contrastive loss function**: A contrastive loss improves the model's representation ability by learning the similarities and differences between sample features.

### Experimental results

- **CIFAR-10**: With 40 labels, the proposed method (STUC-SSIC) reaches an average accuracy of 91.89%, about 6% higher than FixMatch.
- **CIFAR-100**: With 40 labels, STUC-SSIC reaches an average accuracy of 46.88%, about 1% higher than FullMatch and about 6% higher than FixMatch.
- **STL-10**: With 40 labels, STUC-SSIC also achieves good classification results on this more complex dataset.

### Ablation study

- **Influence of different components**: The ablation study shows that both SATPL and USCL contribute significantly to performance, and the model achieves its best classification performance when the two modules are used jointly.
- **Influence of self-adaptive threshold pseudo-labeling**: The self-adaptive threshold strategy generates more high-quality pseudo-labels, which improves the model's accuracy.

In general, the proposed method performs well on semi-supervised image classification, with a clear advantage in exploiting unlabeled data.
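The summary does not specify how USCL builds its positive and negative pairs. A common choice, assumed here purely for illustration, is an InfoNCE-style loss over the low-confidence (unreliable) samples: the embeddings of two augmented views of the same sample form a positive pair, and the views of the other unreliable samples in the batch serve as negatives. This is a one-directional InfoNCE, not necessarily the paper's formulation; the function name `unreliable_contrastive_loss` and the `temperature` default are hypothetical.

```python
import math
from typing import List, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def unreliable_contrastive_loss(
    weak_feats: Sequence[Sequence[float]],
    strong_feats: Sequence[Sequence[float]],
    temperature: float = 0.5,
) -> float:
    """One-directional InfoNCE over unreliable samples (hypothetical stand-in for USCL).

    weak_feats[i] and strong_feats[i] embed two augmented views of the same
    low-confidence sample i and form the positive pair; the strong views of all
    other samples j != i act as negatives.
    """
    z_weak = [_l2_normalize(f) for f in weak_feats]
    z_strong = [_l2_normalize(f) for f in strong_feats]
    n = len(z_weak)
    total = 0.0
    for i in range(n):
        # Temperature-scaled cosine similarity between anchor i and every strong view.
        logits = [
            sum(a * b for a, b in zip(z_weak[i], z_strong[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the positive and all negatives.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # -log of the softmax probability assigned to the positive pair.
        total += log_denom - logits[i]
    return total / n


weak = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each strong view matches its own weak view
swapped = [[0.0, 1.0], [1.0, 0.0]]  # strong views paired with the wrong samples
print(unreliable_contrastive_loss(weak, aligned))  # ≈ 0.1269
print(unreliable_contrastive_loss(weak, swapped))  # ≈ 2.1269
```

Because this loss needs no class label, it can consume exactly the samples that fall below the thresholds. In a full training step one would plausibly add it, with its own weight, to the supervised cross-entropy and the pseudo-label loss on reliable samples; the weights are not given in the summary.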