Leveraging World Knowledge in Chinese Text Classification

Shu Xu, Maosong Sun
DOI: https://doi.org/10.1109/alpit.2007.105
2007-01-01
Abstract: State-of-the-art Text Classification (TC) approaches consider only features that appear explicitly in the training set, and after several decades of effort these approaches appear to have reached a plateau. In this paper, we propose an automatic taxonomy mapping algorithm that maps the original flat taxonomy to a hierarchical, human-edited online taxonomy (ODP). From the mapped taxonomy, we then synthesize new training samples that carry common-sense world knowledge by performing a constrained, focused web crawl. We show that leveraging domain knowledge that cannot be deduced directly from the training set gives the text classifier better generalization ability. Preliminary experimental results on several Chinese data sets confirm the effectiveness of this approach.