Semi-supervised regression via embedding space mapping and pseudo-label smearing

Liyan Liu, Jin Zhang, Kun Qian, Fan Min
DOI: https://doi.org/10.1007/s10489-024-05686-6
IF: 5.3
2024-07-20
Applied Intelligence
Abstract: Co-training is a semi-supervised algorithm that aims to improve prediction performance by exchanging confident instances and pseudo-labels among multiple learners. One central issue is how to mitigate the negative impact of low-quality pseudo-labels during training. In this paper, we propose semi-supervised regression via embedding space mapping and pseudo-label smearing (S2RMS) to ensure that unlabeled data contribute positively to the prediction process. First, a triplet neural network is trained using pairwise data generated from labeled data. This network maps the training data to an embedding space to better separate dissimilar instances. Second, the embedded data are randomly partitioned into different subsets to train corresponding regression models (a.k.a. regressors). These regressors are integrated into the prediction process. Third, these subsets are augmented using unlabeled data that have high similarity to the labeled data and high-confidence pseudo-labels. Here, the similarity and confidence are calculated using the network and the smearing technique, respectively. Experiments are conducted on fourteen datasets, and the results are compared to those of three strong existing algorithms. The results show that S2RMS outperforms other co-training and metric semi-supervised regression algorithms. The source code is available at: https://github.com/BetaCatPro/S2RMS.
computer science, artificial intelligence
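The partition, train, and augment steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above), and it rests on several assumptions: the embedding is taken to be the identity, so the triplet network is omitted; k-nearest-neighbour regressors stand in for the regressors, which the abstract does not specify; distance to the nearest labeled instance stands in for network-based similarity; and disagreement among the regressors stands in for smearing-based confidence. All names (`knn_predict`, `partition`, `augment`, `ensemble_predict`, `sim_radius`, `max_spread`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def knn_predict(train, x, k=3):
    """Average the targets of the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return mean(y for _, y in nearest)


def partition(labeled, n_subsets, rng):
    """Randomly split labeled (x, y) pairs into n_subsets disjoint subsets."""
    shuffled = labeled[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def augment(subsets, labeled, unlabeled, sim_radius, max_spread, k=3):
    """Run one augmentation round.

    An unlabeled x is pseudo-labeled and added to every subset only if it is
    (a) close to some labeled instance (similarity proxy, an assumption) and
    (b) the per-subset regressors agree on its target (confidence proxy,
        an assumption replacing the paper's smearing technique).
    Returns (new_subsets, remaining_unlabeled).
    """
    new_subsets = [s[:] for s in subsets]
    remaining = []
    for x in unlabeled:
        nearest_dist = min(math.dist(x, xl) for xl, _ in labeled)
        # Predictions use the subsets as they were at the start of the round.
        preds = [knn_predict(s, x, k) for s in subsets]
        if nearest_dist <= sim_radius and pstdev(preds) <= max_spread:
            pseudo_label = mean(preds)
            for s in new_subsets:
                s.append((x, pseudo_label))
        else:
            remaining.append(x)
    return new_subsets, remaining


def ensemble_predict(subsets, x, k=3):
    """Combine the per-subset regressors by averaging their predictions."""
    return mean(knn_predict(s, x, k) for s in subsets)
```

In use, one would call `partition` once on the labeled pairs, then alternate `augment` rounds until no unlabeled instance passes both thresholds, and finally query `ensemble_predict`. The two thresholds are the design choice that keeps low-quality pseudo-labels out of the training subsets, which is the concern the abstract raises.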