LayerMatch: Do Pseudo-labels Benefit All Layers?

Chaoqi Liang,Guanglei Yang,Lifeng Qiao,Zitong Huang,Hongliang Yan,Yunchao Wei,Wangmeng Zuo
2024-06-27
Abstract: Deep neural networks have achieved remarkable performance across various tasks when supplied with large-scale labeled data. However, collecting labeled data can be time-consuming and labor-intensive. Semi-supervised learning (SSL), particularly through pseudo-labeling algorithms that iteratively assign pseudo-labels for self-training, offers a promising way to reduce the dependency on labeled data. Previous research generally applies a uniform pseudo-labeling strategy across all model layers, assuming that pseudo-labels exert uniform influence throughout. In contrast, our theoretical analysis and empirical experiments demonstrate that the feature extraction layer and the linear classification layer have distinct learning behaviors in response to pseudo-labels. Based on these insights, we develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering. Grad-ReLU mitigates the impact of noisy pseudo-labels by removing the detrimental gradient effects of pseudo-labels in the linear classification layer. Avg-Clustering accelerates the convergence of the feature extraction layer towards stable clustering centers by integrating consistent outputs. Our approach, LayerMatch, which integrates these two strategies, can avoid the severe interference of noisy pseudo-labels in the linear classification layer while accelerating the clustering capability of the feature extraction layer. Through extensive experimentation, our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks, achieving a significant improvement of 10.38% over the baseline method and a 2.44% increase compared to state-of-the-art methods.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the uneven influence of pseudo-labels on different layers in semi-supervised learning (SSL). Existing research generally assumes that pseudo-labels affect all layers uniformly, but this assumption does not always hold. Through theoretical analysis and experiments, the authors show that the feature extraction layer and the linear classification layer respond to pseudo-labels in markedly different ways:

1. **Feature extraction layer**: Optimizing the consistency regularization loss strengthens this layer's ability to cluster data. Well-optimized samples form high-density regions, while poorly optimized samples are scattered across low-density regions, which contain many incorrect pseudo-labels.
2. **Linear classification layer**: Because the consistency regularization loss is difficult to optimize in the low-density regions, the linear classification layer must compensate for that part of the loss. Incorrect pseudo-labels therefore have a negative impact on the linear classification layer and reduce model performance.

Based on these findings, the authors propose the **LayerMatch** method, which adopts two layer-specific pseudo-label strategies:

1. **Grad-ReLU**: Eliminates the gradient influence of pseudo-labels in the classification layer, so that pseudo-label noise does not interfere with it, while retaining the pseudo-label gradients in the feature extraction layer so that it can still learn clustering features from pseudo-labels.
2. **Avg-Clustering**: Uses an exponential moving average (EMA) to stabilize the clustering centers of the feature extraction layer, accelerate feature convergence, and reduce the influence of pseudo-label errors in the low-density regions.
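
The two strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation: this summary does not give the exact formulation of Grad-ReLU or Avg-Clustering, so the code assumes that the first amounts to dropping the pseudo-label gradient for the classifier weights while keeping the gradient that flows back to the features, and that the second is a plain EMA over a center vector. The function names `pseudo_label_grads` and `ema_update`, and the `momentum` value, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_grads(feature, weights, pseudo_label, block_classifier_grad=True):
    """Cross-entropy gradients for one sample through a linear classifier.

    feature: list of D floats (output of the feature extractor).
    weights: D x C nested list (linear classification layer, no bias).
    pseudo_label: int class index assigned to this unlabeled sample.
    Returns (grad_feature, grad_weights).
    """
    dim, num_classes = len(feature), len(weights[0])
    logits = [sum(feature[d] * weights[d][c] for d in range(dim))
              for c in range(num_classes)]
    probs = softmax(logits)
    # d(loss)/d(logits) for cross-entropy with a hard target.
    dlogits = [probs[c] - (1.0 if c == pseudo_label else 0.0)
               for c in range(num_classes)]
    # Gradient passed back to the feature extractor: always kept.
    grad_feature = [sum(dlogits[c] * weights[d][c] for c in range(num_classes))
                    for d in range(dim)]
    if block_classifier_grad:
        # Assumption: the classifier receives no update from pseudo-labels.
        grad_weights = [[0.0] * num_classes for _ in range(dim)]
    else:
        grad_weights = [[feature[d] * dlogits[c] for c in range(num_classes)]
                        for d in range(dim)]
    return grad_feature, grad_weights


def ema_update(running, new_value, momentum=0.9):
    """Exponential moving average of a vector, e.g. a hypothetical cluster center."""
    return [momentum * r + (1.0 - momentum) * v
            for r, v in zip(running, new_value)]
```

In an autodiff framework the same effect is usually obtained with a stop-gradient on the classifier parameters for the unlabeled loss, rather than by computing gradients by hand as this sketch does.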

In extensive experiments on standard semi-supervised learning benchmarks, LayerMatch improves performance by 10.38% over the baseline method and by 2.44% over existing state-of-the-art methods.

### Main contributions

1. **Identify differences**: The paper reveals that the feature extraction layer and the linear classification layer learn differently from pseudo-labels, and argues that pseudo-labels benefit the feature extraction layer but harm the linear classification layer.
2. **Propose a method**: The paper proposes LayerMatch, a layer-specific strategy for applying pseudo-labels that reduces their negative impact on the linear classification layer while preserving their benefit to the feature extraction layer.
3. **Experimental verification**: Extensive experiments support the effectiveness of LayerMatch and show that it significantly improves model performance on multiple datasets.