WordNet-based Concept Vector Space Model for Text Classification

Zhang Jian, Li Chunping
DOI: https://doi.org/10.3321/j.issn:1002-8331.2006.04.054
2006-01-01
Abstract: In this paper, we design and implement an automatic text classification system aimed at improving classification accuracy. In existing automatic text classification systems, the content of a text is described by an N-dimensional feature vector model, and the method used to build this model strongly influences classification accuracy. The Vector Space Model (VSM), one of the most effective approaches, represents a document as a vector whose dimensions are mutually orthogonal terms. The VSM therefore assumes that terms are independent and ignores any semantic relations between them. In the real world, however, terms are usually related, for example through synonymy and hypernymy-hyponymy. Here we introduce a novel WordNet-based approach that describes a text by building a concept vector space model. By replacing terms with WordNet synonym sets (synsets) and taking the hypernymy-hyponymy relations between synsets into account, our approach can extract higher-level category information during training. We carry out a series of experiments comparing our approach with the term-based VSM approach. The results show that our approach can improve text classification accuracy, especially when the training set is small.
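The abstract does not give the paper's exact weighting or propagation scheme, so the following is only a minimal sketch of the general idea, under loudly stated assumptions. `TERM_TO_SYNSET` and `HYPERNYM` are a hypothetical toy lexicon standing in for WordNet (one synset per term, no word-sense disambiguation), weights are raw counts rather than tf-idf, and each synset passes one count up to its parent concept (`depth=1`). The functions `term_vector`, `concept_vector` and `cosine` are illustrative names, not from the paper. The sketch contrasts a term-based VSM vector with a concept vector: synonyms collapse onto one synset, and hypernyms add shared higher-level dimensions.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: each term maps to a single
# synset label (real WordNet terms may belong to several synsets).
TERM_TO_SYNSET = {
    "car": "syn:car", "automobile": "syn:car",
    "truck": "syn:truck", "lorry": "syn:truck",
    "apple": "syn:apple", "banana": "syn:banana",
}

# Hypothetical hypernym links (child synset -> parent synset), simplified.
HYPERNYM = {
    "syn:car": "syn:motor_vehicle",
    "syn:truck": "syn:motor_vehicle",
    "syn:apple": "syn:fruit",
    "syn:banana": "syn:fruit",
}

def term_vector(tokens):
    """Term-based VSM: one dimension per distinct term, raw counts."""
    return Counter(tokens)

def concept_vector(tokens, depth=1):
    """Concept vector: terms replaced by synsets, counts propagated to hypernyms."""
    vec = Counter()
    for t in tokens:
        synset = TERM_TO_SYNSET.get(t)
        if synset is None:
            vec[t] += 1  # terms missing from the lexicon are kept as plain terms
            continue
        vec[synset] += 1
        node = synset
        for _ in range(depth):  # walk up the hypernym chain at most `depth` levels
            node = HYPERNYM.get(node)
            if node is None:
                break
            vec[node] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

if __name__ == "__main__":
    d1, d2, d3 = ["car", "engine"], ["automobile", "engine"], ["truck", "engine"]
    print(cosine(term_vector(d1), term_vector(d2)))        # synonyms are separate terms
    print(cosine(concept_vector(d1), concept_vector(d2)))  # synonyms share a synset
    print(cosine(concept_vector(d1), concept_vector(d3)))  # shared hypernym adds overlap
```

With the toy lexicon, "car engine" and "automobile engine" score 0.5 as term vectors but about 1.0 as concept vectors, and "truck engine" gains similarity to "car engine" through the shared motor-vehicle concept. This hints at why such a representation may generalize better from small training sets, although the paper's actual model may differ.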