Online Text Classification

JH Guan, SG Zhou
IF: 1.019
2006-01-01
Chinese Journal of Electronics
Abstract: Text classification is becoming increasingly important with the rapid growth of available online information. kNN is a widely used, high-performance text classification method. However, it is inefficient because it requires a large amount of computation to evaluate the similarity between a test document and every training document and then to sort the similarities. In this paper, an online kNN text classification approach based on pruning the training corpus is proposed. With this approach, the training corpus can be condensed sharply, so that the time spent on kNN search is cut significantly and classification efficiency improves substantially, while classification performance remains comparable to that obtained without pruning. An effective algorithm for text corpus pruning is designed. Experiments on the Reuters corpus are carried out, and the results validate the effectiveness and efficiency of the proposed approach.
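The abstract does not specify the pruning algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words vectors with cosine similarity and substitutes Hart-style condensing (keep a training document only if the documents kept so far would misclassify it under 1-NN) for the paper's pruning step. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(doc, corpus, k=3):
    """Majority vote among the k training documents most similar to doc.

    corpus is a list of (vector, label) pairs. Every training document is
    scored and the scores are sorted, which is the cost that pruning reduces.
    """
    scored = sorted(
        ((cosine(doc, vec), label) for vec, label in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def prune_corpus(corpus):
    """Hart-style condensing, used here as a stand-in for the paper's pruning.

    Starts from the first document and adds any document that the current
    kept set misclassifies with 1-NN, repeating until a full pass adds nothing.
    """
    kept_idx = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(corpus):
            if i in kept_idx:
                continue
            kept = [corpus[j] for j in kept_idx]
            if knn_classify(vec, kept, k=1) != label:
                kept_idx.append(i)
                changed = True
    return [corpus[j] for j in kept_idx]


# Toy training data (made up for illustration; not from the Reuters corpus).
train = [
    ("stocks market shares trading", "finance"),
    ("market shares rise trading profit", "finance"),
    ("bank profit stocks market", "finance"),
    ("football match goal team", "sports"),
    ("team wins match goal score", "sports"),
    ("goal score football league", "sports"),
]
corpus = [(vectorize(text), label) for text, label in train]
pruned = prune_corpus(corpus)

query = vectorize("football team score")
print(len(corpus), len(pruned))
print(knn_classify(query, corpus, k=3), knn_classify(query, pruned, k=1))
```

On this toy data the pruned corpus is smaller than the original, so each query is compared against fewer documents, while the predicted label for the sample query stays the same. Whether accuracy is preserved on real data depends on the pruning rule, which is the question the paper's experiments address.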