An Efficient Algorithm for Large-Scale Text Categorization

CHANG-RUI YU, YAN LUO, P. R. China
2008-01-01
Abstract: Text categorization is a core technique in the field of knowledge mining. Most current categorization methods are based on the vector space model (VSM), among which kNN is the most widely used. However, most of these methods are computationally expensive and can hardly be used to classify large-scale sample sets. Moreover, their classifiers must be rebuilt whenever training samples are added or deleted, which makes them scale poorly. This paper proposes a new categorization method, called MDER, based on Mutual Dependence and Equivalent Radius. MDER can classify large-scale sample sets and scales well. A series of experiments on classifying Chinese texts shows that MDER outperforms the kNN and CCC methods and can be used online to classify large-scale sample sets while maintaining higher precision and recall.