A New Method of Training Sample Selection in Text Classification

Yixing Liao, Xuezeng Pan
DOI: https://doi.org/10.1109/etcs.2010.621
2010-01-01
Abstract: To deal with noisy samples in the training dataset, this paper proposes a new method for reducing the size of the training set in text classification. The method characterizes the distribution of the training data by assigning each sample a representativeness score within the class it belongs to, which exposes both the representative samples and the noisy samples in each class. The method is applied to the Chinese text dataset provided by the Fudan Database Center. Experiments show that it removes noisy samples effectively, improves classification performance, and lowers the computational cost.
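The abstract does not say how the representativeness score is computed, so the following is only a minimal sketch of score-based sample selection under stated assumptions. It assumes documents are already turned into numeric feature vectors, takes the score to be the cosine similarity between a sample and the centroid of its own class, and drops samples below a cutoff. The function names, the centroid-based score, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def representativeness_scores(vectors, labels):
    """Score each sample by its similarity to the centroid of its own class.

    Assumption: this centroid-based score stands in for the paper's
    representativeness score, which the abstract does not define.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    scores = [0.0] * len(vectors)
    for indices in by_class.values():
        dim = len(vectors[indices[0]])
        # Class centroid: per-dimension mean over the samples of this class.
        centroid = [
            sum(vectors[i][d] for i in indices) / len(indices)
            for d in range(dim)
        ]
        for i in indices:
            scores[i] = cosine(vectors[i], centroid)
    return scores


def select_samples(vectors, labels, threshold):
    """Return indices of samples whose score reaches the hypothetical cutoff."""
    scores = representativeness_scores(vectors, labels)
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    # Toy data: the third "sports" sample looks like a "finance" document.
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0], [0.1, 0.9]]
    labels = ["sports", "sports", "sports", "finance", "finance"]
    print(select_samples(vectors, labels, threshold=0.8))
```

In this toy run the mislabeled sample sits far from its class centroid, so it scores lowest in its class and is filtered out, while the reduced training set keeps the samples closest to each class centroid. A real pipeline would need a principled way to choose the cutoff, which the abstract leaves unspecified.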