A comparative study for WordNet guided text representation

Jian Zhang, Chunping Li
DOI: https://doi.org/10.1007/11589990_102
2005-01-01
Abstract: Text information processing depends critically on proper text representation. A common and naïve way of representing a document is as a bag of its component words [1], but this ignores the semantic relations between words, such as synonymy and hypernymy-hyponymy between nouns. This paper presents a model for representing a document in terms of the synonym sets (synsets) in WordNet [2]. The synsets stand for the concepts corresponding to the words of the document. The Vector Space Model describes a document as a vector over orthogonal term dimensions. We replace terms with concepts to build a Concept Vector Space Model (CVSM) for the training set. Our experiments on the Reuters Corpus Volume I (RCV1) dataset show satisfactory results.
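
The idea of replacing terms with concepts can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `TOY_SYNSETS` is a hypothetical stand-in for WordNet, the synset identifiers are illustrative, and the sketch assumes each word maps to exactly one synset (the abstract does not say how ambiguous words are handled). Words with no entry are assumed to fall back to the word itself.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for WordNet: word -> synset identifier.
# A real system would look these up in WordNet and would need to choose
# among several candidate synsets for ambiguous words.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "auto": "car.n.01",
    "price": "price.n.01",
    "cost": "price.n.01",
}


def term_vector(tokens):
    """Bag-of-words representation: one dimension per distinct word."""
    return Counter(tokens)


def concept_vector(tokens):
    """Concept representation: one dimension per synset.

    Assumption: words missing from the lexicon keep their own dimension.
    """
    return Counter(TOY_SYNSETS.get(token, token) for token in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_a = "car price".split()
doc_b = "automobile cost".split()

# The two documents share no words, so their term vectors are orthogonal.
term_similarity = cosine(term_vector(doc_a), term_vector(doc_b))

# After mapping synonyms to shared synsets, the vectors coincide.
concept_similarity = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(term_similarity, concept_similarity)
```

In this toy case the term-based similarity is zero while the concept-based similarity is close to one, which is the effect the synset mapping is meant to capture: documents that use different synonyms for the same concepts end up near each other in the vector space.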