Semi-supervised Regression with Data Partitioning and Feature Mapping

Fan Min, Jia-hui Zhang, Liyan Liu
DOI: https://doi.org/10.1109/DSAA54385.2022.10032446
2022-10-13
Abstract: Semi-supervised regression aims to improve model performance by exploiting as much unlabeled data and as little labeled data as possible. Methods based on data partitioning can improve regression performance from the perspective of data distribution. However, most partitioning methods consider only the correlations among the data, not the relationship between the regressor and the data. In this study, we propose a divide-then-regress strategy in which the data are partitioned according to the relationship between the regressor and the data. On this basis, we propose an intuitive and effective algorithm named SRPF (Semi-supervised Regression based on data Partitioning and Feature mapping). First, we divide the labeled dataset into two disjoint subsets based on the relationship between the predicted and actual values of a separation regressor. Second, we treat the two subsets as distinct classes and use feature mapping to project the data features into a higher-dimensional space where they are easier to distinguish. Third, we construct a partitioner that determines which subset an unlabeled instance belongs to, and then use the regressor trained on the corresponding subset to make the prediction. Finally, during iterative training, a self-training method is used to enrich the labeled samples. Experiments on 15 well-known datasets compare SRPF with state-of-the-art algorithms, and the results show that our method outperforms them on most datasets.
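The four steps in the abstract can be sketched in dependency-free Python as follows. This is a minimal illustration under stated assumptions, not the authors' SRPF implementation: the k-nearest-neighbour base regressor and partitioner, the rule that splits the labeled data by whether a leave-one-out prediction overestimates the true value, the degree-2 polynomial feature map, and the vote-confidence criterion used for self-training are all hypothetical choices. The function names (`partition_labeled`, `fit_predict`, `srpf_sketch`, `poly_map`) are likewise invented for illustration.

```python
import math
from collections import Counter


def knn_indices(train_X, x, k, skip=None):
    """Indices of the k training points closest to x (Euclidean), optionally skipping one index."""
    dists = sorted((math.dist(p, x), i) for i, p in enumerate(train_X) if i != skip)
    return [i for _, i in dists[:k]]


def knn_regress(train_X, train_y, x, k, skip=None):
    """Hypothetical base regressor: mean target of the k nearest neighbours."""
    idx = knn_indices(train_X, x, k, skip)
    return sum(train_y[i] for i in idx) / len(idx)


def poly_map(x):
    """Assumed feature mapping: original features plus all degree-2 products."""
    n = len(x)
    return list(x) + [x[i] * x[j] for i in range(n) for j in range(i, n)]


def partition_labeled(X, y, k):
    """Step 1 (assumed rule): class 1 if the separation regressor's
    leave-one-out prediction is >= the actual value, else class 0."""
    return [1 if knn_regress(X, y, X[i], k, skip=i) >= y[i] else 0 for i in range(len(X))]


def fit_predict(X, y, X_unlab, k=3):
    """Steps 2-3: map features, let a k-NN partitioner pick a subset for each
    unlabeled point, then regress with that subset only.
    Returns (predictions, partitioner vote fractions)."""
    classes = partition_labeled(X, y, k)
    mapped = [poly_map(x) for x in X]
    subsets = {c: [i for i in range(len(X)) if classes[i] == c] for c in (0, 1)}
    preds, confs = [], []
    for x in X_unlab:
        votes = Counter(classes[i] for i in knn_indices(mapped, poly_map(x), k))
        c, n_votes = votes.most_common(1)[0]
        members = subsets[c] or list(range(len(X)))  # fall back to all data if a subset is empty
        sub_X = [X[i] for i in members]
        sub_y = [y[i] for i in members]
        preds.append(knn_regress(sub_X, sub_y, x, min(k, len(members))))
        confs.append(n_votes / sum(votes.values()))
    return preds, confs


def srpf_sketch(X, y, X_unlab, k=3, rounds=3, per_round=2):
    """Step 4 (assumed self-training): each round, move the unlabeled points with the
    most unanimous partitioner votes, plus their pseudo-labels, into the labeled set."""
    X, y, pool = list(X), list(y), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        preds, confs = fit_predict(X, y, pool, k)
        chosen = set(sorted(range(len(pool)), key=lambda i: -confs[i])[:per_round])
        for i in sorted(chosen):
            X.append(pool[i])
            y.append(preds[i])
        pool = [p for i, p in enumerate(pool) if i not in chosen]
    return X, y
```

After self-training, predictions for new points would come from `fit_predict` on the enlarged labeled set, e.g. `preds, _ = fit_predict(X2, y2, X_test)`, where `X_test` is a hypothetical test set. Plain k-NN keeps the sketch self-contained; stronger regressors and classifiers could be swapped in at each step without changing the divide-then-regress structure.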
Computer Science