A document feature extraction method based on concept-word list

Zhengyu Zhu, Jie He, ShuJia Dong, ChunLei Yu
DOI: https://doi.org/10.4028/www.scientific.net/AMR.267.386
2011-01-01
Abstract: When a document is described in the Vector Space Model (VSM), words are usually assumed to have no semantic relationship with one another, that is, to be mutually orthogonal. To correct the resulting inaccuracy in document description, this paper proposes a new document description method that introduces concept-words and computes the semantic similarity between words using the HowNet ontology database. Comparative experiments show that the new method not only improves document feature description in the VSM but also significantly reduces the dimension of the document vector. The research is useful for document clustering, query word expansion in Web information retrieval, and personalized services in e-business applications.
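The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. A hand-written table (`SIMILARITY`, hypothetical) stands in for the HowNet-based word similarity. Words are grouped greedily into concept-words using a hypothetical threshold of 0.8, and each concept-word's weight is the summed term frequency of its member words. The sketch illustrates the two claims in the abstract. Documents that use synonyms are only weakly similar in a plain VSM, which treats words as orthogonal, but they become identical over concept-words. The concept vector also has fewer dimensions.

```python
import math
from collections import Counter

# Hypothetical word-pair similarities standing in for HowNet-based similarity.
# Pairs that are not listed are treated as unrelated (similarity 0.0).
SIMILARITY = {
    frozenset({"car", "automobile"}): 0.95,
    frozenset({"buy", "purchase"}): 0.90,
}


def similarity(w1, w2):
    """Return the assumed semantic similarity of two words (1.0 for identical words)."""
    if w1 == w2:
        return 1.0
    return SIMILARITY.get(frozenset({w1, w2}), 0.0)


def build_concept_words(vocabulary, threshold=0.8):
    """Greedily group words: a word joins the first concept whose representative
    (its first member) is at least `threshold` similar, else starts a new concept.
    The threshold value and the greedy strategy are assumptions for illustration."""
    concepts = []
    for word in vocabulary:
        for concept in concepts:
            if similarity(word, concept[0]) >= threshold:
                concept.append(word)
                break
        else:
            concepts.append([word])
    return concepts


def term_vector(tokens, vocabulary):
    """Plain VSM: one term-frequency dimension per distinct word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def concept_vector(tokens, concepts):
    """Concept-word VSM: one dimension per concept, summing member-word frequencies."""
    counts = Counter(tokens)
    return [sum(counts[w] for w in concept) for concept in concepts]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


doc_a = ["car", "buy", "price"]
doc_b = ["automobile", "purchase", "price"]
vocab = sorted(set(doc_a) | set(doc_b))
concepts = build_concept_words(vocab)

print(concepts)  # [['automobile', 'car'], ['buy', 'purchase'], ['price']]
print(round(cosine(term_vector(doc_a, vocab), term_vector(doc_b, vocab)), 3))  # 0.333
print(cosine(concept_vector(doc_a, concepts), concept_vector(doc_b, concepts)))  # 1.0
```

With five distinct words the plain vectors have five dimensions and share only "price". The three concept-words merge the synonym pairs, so the two documents coincide. A real implementation would replace the lookup table with a similarity measure computed from HowNet.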