Active Learning using Localized Generalization Error for Text Categorization

Yeung, D.S., Zhang, Y., Ng, W.W.Y., Qing-Cai Chen
DOI: https://doi.org/10.1109/ICMLC.2006.258926
2006-01-01
Abstract: Text categorization is an important step in many applications, e.g. Web page classification, search engine indexing, and information retrieval. When the number of available documents is huge, active learning can help reduce training time and cost. Moreover, active learning can filter out noisy training samples and may therefore achieve better generalization capability. In this work, we apply the localized generalization error model to active learning for text categorization. In our approach, the sample yielding the highest generalization error for the unseen samples local to it is selected as the next training sample. Feature extraction from raw documents is also discussed. Experimental results show that the proposed method is effective in reducing the number of training samples and achieves good generalization capability.
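The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the localized generalization error formula, so the score below is only a stand-in assumption: the fraction of random points in a box of half-width `q` around a candidate whose predicted label differs from the candidate's own. The nearest-centroid classifier, the plain numeric feature vectors, and all function names (`fit_centroids`, `local_error_proxy`, `select_next`) are hypothetical choices made for illustration, not the authors' method.

```python
import random


def fit_centroids(samples, labels):
    """Stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(x, centroids):
    """Label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


def local_error_proxy(x, centroids, q, rng, n_probe=50):
    """Assumed proxy for the local error around x: share of random
    probes within +/- q of x whose predicted label differs from x's."""
    base = predict(x, centroids)
    flips = 0
    for _ in range(n_probe):
        probe = [v + rng.uniform(-q, q) for v in x]
        if predict(probe, centroids) != base:
            flips += 1
    return flips / n_probe


def select_next(pool, centroids, q, rng):
    """Index of the pool sample with the highest proxy score."""
    scores = [local_error_proxy(x, centroids, q, rng) for x in pool]
    return max(range(len(pool)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    centroids = fit_centroids([[0.0, 0.0], [2.0, 0.0]], [0, 1])
    pool = [[0.0, 0.0], [1.05, 0.0], [2.0, 0.0]]
    # The candidate near the decision boundary is chosen for labelling.
    print(select_next(pool, centroids, q=0.2, rng=rng))
```

In a full active learning loop, the selected sample would be labelled, moved from the pool to the training set, and the classifier refitted before the next selection round.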