Incremental Self-training for Semi-supervised Learning

Jifeng Guo, Zhulin Liu, Tong Zhang, C. L. Philip Chen

2024-04-14

Abstract: Semi-supervised learning provides a way to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention, and several advancements have emerged to address the challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have not examined how to use it efficiently, nor have they addressed the high time consumption caused by iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and prioritizes assigning pseudo-labels to unlabeled samples with high certainty. It then processes the data around the decision boundary after the model has stabilized, enhancing classifier performance. IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbones, where it improves both recognition accuracy and learning speed. Notably, it outperforms state-of-the-art competitors on three challenging image classification tasks.

Machine Learning

What problem does this paper attempt to address?

The paper aims to address several key issues in self-training methods for semi-supervised learning:

1. **Pseudo-label noise problem**: Traditional self-training methods may generate incorrect pseudo-labels during the iterative process, leading to a decline in model performance.
2. **Insufficient utilization of unlabeled data**: Existing works, although emphasizing the importance of unlabeled data, fail to effectively utilize these data.
3. **High time consumption**: Multiple queries and clustering operations during the iterative learning process result in prolonged training time.

To tackle these issues, the paper proposes the Incremental Self-training (IST) method. IST improves traditional self-training methods in the following ways:

- **Batch processing of unlabeled data**: IST first clusters all unlabeled samples and prioritizes assigning pseudo-labels to easily classifiable samples based on the clustering results, thereby enhancing the early performance of the base classifier.
- **Introduction of a sequential query list**: By forming a query list based on sample certainty, IST reduces multiple clustering and query operations, thus accelerating the iterative learning process.
- **Utilization of samples near the decision boundary**: After the model stabilizes, IST focuses on handling samples near the decision boundary, further improving classifier performance.

Experimental results show that IST significantly improves recognition accuracy and reduces training time on multiple benchmark datasets, outperforming existing state-of-the-art methods.
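
The batch-wise, certainty-ordered procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a nearest-centroid base classifier and uses a distance margin as the certainty score in place of the paper's clustering step. The function names (`centroids`, `predict`, `incremental_self_training`), the `batch_size` parameter, and the toy data are all hypothetical.

```python
import math


def centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}


def predict(model, p):
    """Return (label, margin); margin is the distance gap between the
    two nearest centroids and serves as an assumed certainty score."""
    dists = sorted((math.dist(p, m), c) for c, m in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def incremental_self_training(labeled, labels, unlabeled, batch_size=10):
    """Pseudo-label unlabeled points in batches, most certain first."""
    points, targets = list(labeled), list(labels)
    model = centroids(points, targets)
    # Build the query list once, sorted by descending certainty, instead of
    # re-scoring every unlabeled sample in each iteration.
    query = sorted(range(len(unlabeled)),
                   key=lambda i: predict(model, unlabeled[i])[1],
                   reverse=True)
    pseudo = {}
    for start in range(0, len(query), batch_size):
        for i in query[start:start + batch_size]:
            pseudo[i] = predict(model, unlabeled[i])[0]
            points.append(unlabeled[i])
            targets.append(pseudo[i])
        # Retrain after each batch; low-certainty samples near the decision
        # boundary are reached last, once the model has seen the easy ones.
        model = centroids(points, targets)
    return model, pseudo


# Toy data (hypothetical): two well-separated groups, one labeled point each.
labeled = [(-2.0, 0.0), (2.0, 0.0)]
labels = [0, 1]
unlabeled = ([(-3 + 0.1 * k, 0.5) for k in range(21)]
             + [(1 + 0.1 * k, -0.5) for k in range(21)])
truth = [0] * 21 + [1] * 21

model, pseudo = incremental_self_training(labeled, labels, unlabeled)
accuracy = sum(pseudo[i] == truth[i] for i in range(len(truth))) / len(truth)
```

Because the query list is built only once, the per-iteration cost in this sketch is a single retraining step over the growing training set. A fuller version might refresh the certainty scores for the remaining samples once the model stabilizes, which is closer to the boundary-focused stage the paper describes.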
