Text Classification Based on Transfer Learning and Self-Training.

Yabin Zheng, Shaohua Teng, Zhiyuan Liu, Maosong Sun
DOI: https://doi.org/10.1109/icnc.2008.498
2008-01-01
Abstract: Traditional text classification methods make a basic assumption: the training and test sets are homologous. This naive assumption may not hold in the real world, especially in the web environment. Documents on the web change over time, so a pre-trained model may be out of date when applied to newly emerging documents. However, some information in the training set is nonetheless useful. In this paper, we propose a novel method that uses transfer learning to discover the constant common knowledge shared by the training and test sets; a model is then built on this knowledge to fit the distribution of the test set. The model is reinforced iteratively by adding the most confident instances from the unlabeled test set to the training set until convergence, which is a self-training process. Preliminary experiments show that our method achieves an improvement of approximately 8.92% over the standard supervised-learning method.
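The self-training step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the margin-based confidence score, and the names `fit`, `predict`, `self_train`, `per_round`, `min_confidence`, and `max_rounds` are all assumptions made for illustration, and the transfer-learning step that extracts common knowledge is not modeled here. Documents are assumed to be sparse term-count dictionaries.

```python
import math
from collections import Counter


def centroid(vectors):
    """Average a list of sparse term-count dicts into one centroid."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit(labeled):
    """Build one centroid per class (a stand-in classifier, not the paper's)."""
    by_label = {}
    for vec, label in labeled:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, vec):
    """Return (label, confidence); confidence is the top-two similarity margin."""
    scores = sorted(((cosine(vec, c), label) for label, c in model.items()),
                    reverse=True)
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_label, best_score - runner_up


def self_train(labeled, unlabeled, per_round=1, min_confidence=0.0, max_rounds=50):
    """Iteratively move the most confident unlabeled documents into the training set.

    Stopping rule (an assumption; the abstract only says "until convergence"):
    stop when the pool is empty, the best confidence falls below
    min_confidence, or max_rounds is reached.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    model = fit(labeled)
    for _ in range(max_rounds):
        if not pool:
            break
        preds = [predict(model, vec) for vec in pool]
        # Rank pool indices by confidence, most confident first.
        order = sorted(range(len(pool)), key=lambda i: preds[i][1], reverse=True)
        if preds[order[0]][1] < min_confidence:
            break
        chosen = set(order[:per_round])
        for i in chosen:
            labeled.append((pool[i], preds[i][0]))  # adopt the predicted label
        pool = [vec for i, vec in enumerate(pool) if i not in chosen]
        model = fit(labeled)  # retrain on the enlarged training set
    return model
```

In this sketch, a term that appears only in the unlabeled test documents can enter a class centroid once a confidently labeled document containing it is added, which is how the model drifts toward the test distribution. Setting `per_round` to a small value keeps early mistakes from spreading quickly, at the cost of more retraining rounds.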