Improving Sparsely Labeled Text Classification with Data Editing

Xue Zhang, Dong-yan Zhao, Wang-xin Xiao
DOI: https://doi.org/10.1109/icise.2010.5690328
2010-01-01
Abstract: This paper proposes an active semi-supervised framework combined with data editing to improve sparsely labeled text classification. The framework integrates semi-supervised learning with active learning, and makes fuller use of active learning by fusing it with a data editing technique. The algorithm is iterative: the steps of self-labeling, active labeling, and editing alternate. Active learning and data editing are designed to cope with bias and sparsity in the training data. To our knowledge, fusing active learning with a data editing technique to eliminate self-labeled noise is novel. Extensive experiments on several real-world data sets show encouraging results for the proposed framework on sparsely labeled text classification compared with several state-of-the-art methods.
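The alternation of self-labeling, active labeling, and editing described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base classifier, the query strategy, or the editing rule, so everything below is an assumption made for illustration: a nearest-centroid classifier over numeric feature tuples, margin-based uncertainty for choosing queries, and a Wilson-style k-nearest-neighbour edit that discards self-labeled examples whose label disagrees with their neighbours. The names `train`, `edit`, `oracle`, `n_self`, and `n_active` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def centroids(labeled):
    """Mean feature vector per class, from (vector, label) pairs."""
    sums, counts = {}, Counter()
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, margin): nearest class and the distance gap to the runner-up."""
    ranked = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def edit(trusted, self_labeled, k=3):
    """Assumed editing rule (Wilson-style): keep a self-labeled example only if
    the majority label of its k nearest neighbours agrees with its own label."""
    pool = trusted + self_labeled
    offset = len(trusted)
    kept = []
    for j, (x, y) in enumerate(self_labeled):
        others = [p for i, p in enumerate(pool) if i != offset + j]
        nearest = sorted(others, key=lambda p: math.dist(x, p[0]))[:k]
        majority = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept

def train(seed, unlabeled, oracle, rounds=3, n_self=2, n_active=1, k=3):
    """Alternate active labeling, self-labeling, and editing for a few rounds."""
    trusted = list(seed)    # labels supplied by a human (seed set or oracle)
    self_labeled = []       # labels supplied by the classifier, possibly noisy
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(trusted + self_labeled)
        scored = sorted(pool, key=lambda x: predict(cents, x)[1])
        # Active labeling: ask the oracle about the least certain examples.
        queries, rest = scored[:n_active], scored[n_active:]
        trusted += [(x, oracle(x)) for x in queries]
        # Self-labeling: accept the classifier's label on the most certain examples.
        confident = rest[-n_self:] if n_self > 0 else []
        self_labeled += [(x, predict(cents, x)[0]) for x in confident]
        # Editing: filter suspected self-labeling noise (removed examples are discarded).
        self_labeled = edit(trusted, self_labeled, k)
        pool = rest[:len(rest) - len(confident)]
    return centroids(trusted + self_labeled)
```

In this sketch `oracle` stands in for the human annotator: it is any callable that maps a feature tuple to its true label. The returned centroids can be passed to `predict` to classify new examples.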