A METHOD OF HIERARCHICAL DOCUMENT AUTOMATIC CLASSIFICATION IN E-RESEARCH

Yun Jian, Jiang Di, Pan Wuyun
DOI: https://doi.org/10.3969/j.issn.1000-386X.2009.11.015
2009-01-01
Abstract: Cross-disciplinary subjects are very common in E-research, so documents that span multiple subjects need to be classified automatically. To cope with the high dimensionality of such documents, a hierarchical automatic classification method based on the divide-and-conquer (DC) idea is proposed and applied to E-research in comparative linguistics at the E-Institutes of Shanghai Universities. First, the document vectors are clustered by geometric classification without matrix transposition. Then, a one-dimensional feature space is formed using the Fisher linear discriminant criterion. Finally, the documents are classified automatically using the NBayes decision, a minimum classification error (MCE) decision. Experimental results indicate that the method is effective: in both closed-set and open-set tests it performs well in precision, recall and F1. The classification process takes 0.29 s on average. This work offers intelligent support for E-research.
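The abstract does not give the formulas behind the Fisher projection and the Bayes decision, so the following is only a minimal two-class sketch of the general technique, not the authors' implementation. It assumes Gaussian class-conditional densities in the projected space, uses synthetic data in place of document vectors, and adds a small ridge term to the scatter matrix. All function names and parameters (`fisher_direction`, `fit_1d_gaussian_bayes`, `predict`, `reg`) are hypothetical.

```python
import numpy as np


def fisher_direction(X0, X1, reg=1e-6):
    """Unit vector along Sw^{-1} (m1 - m0), the two-class Fisher direction."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Small ridge term (an assumption of this sketch) keeps Sw invertible
    # when the feature dimension is high relative to the sample count.
    Sw += reg * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


def fit_1d_gaussian_bayes(z0, z1):
    """Estimate per-class mean, variance and log prior of the 1-D projections."""
    n0, n1 = len(z0), len(z1)
    return {
        "mean": np.array([z0.mean(), z1.mean()]),
        "var": np.array([z0.var(), z1.var()]) + 1e-9,
        "log_prior": np.log(np.array([n0, n1]) / (n0 + n1)),
    }


def predict(model, z):
    """Pick the class with the largest log posterior (minimum-error rule)."""
    z = np.asarray(z)[:, None]  # shape (n, 1) broadcasts against the 2 classes
    log_lik = -0.5 * (
        np.log(2 * np.pi * model["var"]) + (z - model["mean"]) ** 2 / model["var"]
    )
    return np.argmax(log_lik + model["log_prior"], axis=1)


# Synthetic stand-in for two classes of document feature vectors.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X1 = rng.normal(loc=3.0, scale=1.0, size=(200, 5))

w = fisher_direction(X0, X1)
model = fit_1d_gaussian_bayes(X0 @ w, X1 @ w)

X = np.vstack([X0, X1])
y = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]
accuracy = (predict(model, X @ w) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

Projecting onto a single direction first means the Bayes step only has to estimate one mean and one variance per class, which is one plausible reason a pipeline of this shape can run quickly on high-dimensional document vectors.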