Algorithm 1: Attribute Selection Algorithm Based on Mutual Dependency

YU CHANG-RUI,YAN LUO
2008-01-01
Abstract: Text categorization is a core technique in the field of knowledge mining. Most current categorization methods are based on the vector space model (VSM), and kNN is the most widely used among them. However, most of these methods are computationally expensive and can hardly be used to classify large-scale sample sets. Moreover, their classifiers must be rebuilt whenever training samples are added or deleted, which makes them scale poorly. In this paper, a new categorization method called MDER, based on Mutual Dependence and Equivalent Radius, is proposed. MDER can be used to classify large-scale sample sets and has good scalability. A series of experiments on classifying Chinese texts leads to the conclusion that MDER outperforms the kNN and CCC methods and can be used online to classify large-scale sample sets while maintaining higher precision and recall.
Key-Words: Text categorization, Mutual dependence, Equivalent radius, VSM
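To make the baseline concrete, the following is a minimal sketch of kNN classification over VSM term-frequency vectors. It is not the MDER method, which the abstract does not specify. The function names, whitespace tokenization, cosine similarity measure, and toy training set are all assumptions chosen for illustration. The sketch shows why kNN is costly on large sample sets: every query is compared against every training document.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a text to a bag-of-words term-frequency vector.

    Whitespace tokenization is an assumption made for this sketch.
    """
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, training, k=3):
    """Return the majority label among the k most similar training documents.

    The query is scored against every training document, so the cost of a
    single classification grows linearly with the training set size.
    """
    q = to_vector(query)
    scored = sorted(
        ((cosine(q, to_vector(doc)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, for illustration only.
training = [
    ("stock market shares rise", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores goal in match", "sports"),
]

print(knn_classify("football match goal", training, k=3))
```

In this sketch, adding or deleting a training sample only changes the `training` list, but each query still requires a full pass over it, which is the per-query cost the abstract describes for large-scale sample sets.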