A Semantics-based Information Retrieval Model

Xiaohua Zhou
2005-01-01
Abstract: In this paper, I propose a semantics-based IR model that indexes documents with triplets (concept-relation-concept) instead of keywords. Because the model indexes the sense of a term rather than its string, the representation is more accurate and the synonym problem is effectively addressed. More importantly, the model avoids a situation that frequently occurs in keyword-based IR models: two keywords co-occur in a document without having any syntactic or semantic relation to each other. In addition, the model readily supports integration with domain ontologies. It is therefore reasonable to expect higher performance from the proposed semantics-based IR model. I present the models and methods for the indexing, searching, and matching components in detail, which demonstrates the model's technical feasibility. A case study shows the performance improvement of the new model over a keyword-based IR model.
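The abstract does not spell out the indexing or matching algorithms, so the following is only a minimal Python sketch of the general idea under stated assumptions. The sense table (`SENSES`), the relation labels, the example documents, and the scoring rule (one point per matched triplet) are all hypothetical. Triplets are assumed to be already extracted from the text. The sketch contrasts triplet matching with a simple keyword co-occurrence baseline on the two problems the abstract names: synonyms and unrelated co-occurring terms.

```python
from collections import defaultdict

# Hypothetical sense inventory mapping surface strings to sense IDs.
# A real system would draw on a lexical resource or domain ontology.
SENSES = {
    "car": "automobile#n1", "automobile": "automobile#n1",
    "engine": "engine#n1", "motor": "engine#n1",
    "garage": "garage#n1",
}

# Hypothetical documents, already reduced to (concept, relation, concept) triplets.
DOCS = {
    "d1": [("automobile", "has_part", "motor")],
    # "car" and "engine" co-occur here but are not related to each other.
    "d2": [("car", "located_in", "garage"), ("engine", "part_of", "boat")],
}


def to_sense(term: str) -> str:
    """Map a term to its sense ID, falling back to the lowercased string."""
    return SENSES.get(term.lower(), term.lower())


def normalize(triplet: tuple) -> tuple:
    """Replace both concepts in a triplet with their sense IDs."""
    head, rel, tail = triplet
    return (to_sense(head), rel, to_sense(tail))


def build_index(docs: dict) -> dict:
    """Inverted index: sense-level triplet -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, triplets in docs.items():
        for t in triplets:
            index[normalize(t)].add(doc_id)
    return index


def search(index: dict, query: list) -> list:
    """Score documents by the number of query triplets they contain (assumed rule)."""
    scores = defaultdict(int)
    for t in query:
        for doc_id in index.get(normalize(t), ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def keyword_match(docs: dict, keywords: list) -> list:
    """Baseline: documents whose concept strings contain every keyword."""
    hits = []
    for doc_id, triplets in docs.items():
        words = {w.lower() for head, _, tail in triplets for w in (head, tail)}
        if all(k.lower() in words for k in keywords):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, [("car", "has_part", "engine")]))  # [('d1', 1)]
    print(keyword_match(DOCS, ["car", "engine"]))          # ['d2']
```

In this toy setup, the keyword baseline misses `d1` because it uses the synonyms "automobile" and "motor". It also wrongly retrieves `d2`, where "car" and "engine" merely co-occur. The sense-level triplet query retrieves only `d1`.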