Heterogeneous-training: A Semi-supervised Text Classification Method

Yuhao Shen, Bo Li, Xinlan Xu, Bing Luo, Chao Zhang, Fei Hao
DOI: https://doi.org/10.1145/3617695.3617707
2023-01-01
Abstract: With the advent of the information age, the volume of text data on the Internet keeps growing. Because text is the most widely distributed information carrier and accounts for the largest amount of data, text classification technology is particularly important for organizing and managing massive data scientifically. This paper proposes Heterogeneous-training, a semi-supervised ensemble learning algorithm, and applies it to text classification. Heterogeneous-training improves on the traditional Tri-training algorithm by using different classifiers, dynamically updating the probability threshold, and adaptively editing data. Extensive experiments show that our method consistently outperforms the Tri-training algorithm in text classification on benchmark text data sets.
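The abstract does not spell out the selection rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the standard Tri-training idea that an unlabeled example is pseudo-labeled for one classifier when its two peers agree, and adds a confidence cutoff to stand in for the probability threshold. The function name `select_pseudo_labels`, the variables `probs_nb` and `probs_svm`, and all probability values are hypothetical; how the threshold is updated between rounds and how the data are edited are not described in the abstract and are left out.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs_j: Sequence[Sequence[float]],
    probs_k: Sequence[Sequence[float]],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return (example_index, label) pairs that two peer classifiers agree on.

    probs_j and probs_k hold per-class probabilities for the same unlabeled
    examples, one row per example. An example is kept only if both peers
    predict the same class and both are at least `threshold` confident.
    (Assumed rule for illustration; the abstract gives no exact criterion.)
    """
    selected = []
    for idx, (pj, pk) in enumerate(zip(probs_j, probs_k)):
        # Predicted class is the index of the largest probability.
        label_j = max(range(len(pj)), key=pj.__getitem__)
        label_k = max(range(len(pk)), key=pk.__getitem__)
        if label_j == label_k and min(pj[label_j], pk[label_k]) >= threshold:
            selected.append((idx, label_j))
    return selected


# Hypothetical outputs of two different classifier types on four documents.
probs_nb = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
probs_svm = [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7], [0.95, 0.05]]

print(select_pseudo_labels(probs_nb, probs_svm, threshold=0.7))
print(select_pseudo_labels(probs_nb, probs_svm, threshold=0.5))
```

Calling the function with a lower threshold admits more pseudo-labeled examples, which is the lever a dynamically updated threshold would adjust from round to round.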