Chinese Document Classification Using Field Association Knowledge Base

Li Wang, Kui Jiang, Xingyun Geng, Yuanpeng Zhang, Dong Zhou, Jiancheng Dong
DOI: https://doi.org/10.1109/ccis.2012.6664616
2012-01-01
Abstract: Field Association (FA) terms are a limited set of discriminating terms that give humans the knowledge needed to identify the field of a document (text). A field association knowledge base (FAKB) is composed of FA terms and the potential hierarchical relationships among the fields they belong to. The primary goal of this research is to build a system that imitates the process by which humans recognize a document's field by looking at a few Chinese FA terms in it. Document classification experiments were run on two data collections under different circumstances, containing 4000 and 1300 documents respectively. FAKB outperforms the statistical methods it was compared against (SVMs, kNN, and NB), with average accuracies of 97.7% and 89%. The experimental results show that the presented method is effective for Chinese document classification.
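To make the idea of field recognition from a few FA terms concrete, the following is a minimal sketch, not the authors' implementation. The toy knowledge base, its terms and field names, the substring matching (used here in place of proper Chinese word segmentation), and the voting rule are all illustrative assumptions; the abstract does not specify how FAKB scores or ranks fields.

```python
from collections import Counter

# Hypothetical toy knowledge base: FA term -> field path (root to leaf).
# Terms and fields are illustrative assumptions, not data from the paper.
FAKB = {
    "足球": ("Sports", "Football"),
    "守门员": ("Sports", "Football"),
    "篮球": ("Sports", "Basketball"),
    "股票": ("Finance", "Stock Market"),
    "利率": ("Finance", "Banking"),
}


def classify(text, kb=FAKB):
    """Return the best-matching field path, or None if no FA term occurs."""
    votes = Counter()
    for term, path in kb.items():
        # Simplification: plain substring counting instead of word segmentation.
        count = text.count(term)
        if count:
            # Credit every level of the hierarchy, so a document whose terms
            # span several sub-fields can still be assigned to the super-field.
            for depth in range(1, len(path) + 1):
                votes[path[:depth]] += count
    if not votes:
        return None
    # Assumed rule: highest vote count wins; ties go to the deeper field.
    return max(votes, key=lambda p: (votes[p], len(p)))


if __name__ == "__main__":
    print(classify("守门员扑出了足球"))
    print(classify("足球和篮球"))
```

In this sketch, a text that mentions only football terms resolves to the specific sub-field, while a text that mixes football and basketball terms falls back to the shared super-field, which is one simple way a hierarchy of fields can be used.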