An Ontology-based Document Feature Extraction

LIN Dong-Wen, BAI Qing-Yuan, XIE Li-Cong, XIE Huo-Sheng, ZHANG Ying
DOI: https://doi.org/10.3969/j.issn.1002-137X.2008.03.046
2008-01-01
Computer Science
Abstract: To effectively reduce the dimension of document vectors, we introduce a novel method that uses a domain ontology to extract feature concepts. For every document category, all raw words in the category are mapped to concepts in the corresponding concept tree derived from the domain ontology. At the same time, the frequencies of the raw words are converted into frequencies of the concepts. Experimental results show that, compared with traditional document vectors, this method effectively reduces the dimension of document vectors without loss of categorization accuracy.
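The word-to-concept mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `WORD_TO_CONCEPT` table, the concept names, and the `concept_frequencies` helper are all hypothetical, and the choice to drop words that have no concept in the ontology is an assumption the abstract does not confirm.

```python
from collections import Counter

# Hypothetical toy mapping from raw words to ontology concepts.
# A real system would derive this from a domain ontology's concept tree.
WORD_TO_CONCEPT = {
    "cpu": "hardware",
    "processor": "hardware",
    "memory": "hardware",
    "compiler": "software",
    "kernel": "software",
}


def concept_frequencies(tokens, word_to_concept):
    """Fold raw word counts into counts of the concepts they map to.

    Assumption: words with no matching concept are simply dropped.
    """
    counts = Counter()
    for token in tokens:
        concept = word_to_concept.get(token.lower())
        if concept is not None:
            counts[concept] += 1
    return counts


doc = ["CPU", "processor", "kernel", "memory", "the", "compiler", "cpu"]

# Traditional representation: one dimension per distinct word.
word_counts = Counter(t.lower() for t in doc)

# Concept representation: one dimension per distinct concept.
concept_counts = concept_frequencies(doc, WORD_TO_CONCEPT)

print(len(word_counts), "word dimensions ->", len(concept_counts), "concept dimensions")
```

In this toy document, several distinct words collapse onto a much smaller set of concepts, which is the dimension reduction the abstract refers to. Whether categorization accuracy is preserved depends on the ontology and the data, and is not something this sketch demonstrates.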