Using Term Sense to Improve Language Modeling Approach to Genomic IR

Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
2005-01-01
Abstract: Genomic IR, characterized by highly specific information needs, severe synonymy and polysemy problems, long term names, and a rapidly growing literature, poses a challenge to the IR community. In this paper, we focus on addressing the synonymy and polysemy issue within the language modeling framework. Unlike the translation model and traditional query expansion techniques, we incorporate term sense into the basic language model, a more fundamental approach to the synonymy and polysemy issue in IR. The sense approach not only maintains the simplicity of language models but also makes document ranking efficient and effective. A comparative experiment on the TREC 2004 Genomic Track data shows a significant improvement in retrieval performance after incorporating term sense into a basic language model. The MAP (mean average precision) rises significantly from 29.17% (the baseline system) to 36.94%. The performance of the sense approach is also significantly superior to the mean (21.72%) of the official runs submitted to the TREC 2004 Genomic Track and is comparable to the best run (40.75%) of the track. Most runs in the track make extensive use of query expansion and pseudo-relevance feedback techniques, whereas our approach does nothing beyond incorporating term sense. This supports the view that semantic smoothing, i.e., the incorporation of synonym and sense information into the language models, is a more standard approach to achieving the effects that traditional query expansion and pseudo-relevance feedback techniques aim for.
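To make the idea of incorporating term sense into a basic language model concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical synonym-to-sense dictionary (SENSE_OF), maps every query and document token to a sense ID, and ranks documents with a Dirichlet-smoothed unigram query likelihood over those senses. The dictionary entries, the function names, the smoothing method, and the value of mu are all illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical synonym-to-sense map (an assumption for illustration):
# surface terms that share a meaning are assigned the same sense ID.
SENSE_OF = {
    "tp53": "S_TP53",
    "p53": "S_TP53",
    "tumor": "S_NEOPLASM",
    "tumour": "S_NEOPLASM",
    "neoplasm": "S_NEOPLASM",
}


def to_senses(tokens):
    """Replace each token by its sense ID; unmapped tokens stand for themselves."""
    return [SENSE_OF.get(t.lower(), t.lower()) for t in tokens]


def query_log_likelihood(query, doc, coll_counts, coll_len, mu=2000.0):
    """Dirichlet-smoothed unigram query log-likelihood computed over senses."""
    doc_counts = Counter(to_senses(doc))
    doc_len = sum(doc_counts.values())
    score = 0.0
    for sense in to_senses(query):
        # Background probability of the sense in the whole collection.
        p_coll = coll_counts.get(sense, 0) / coll_len if coll_len else 0.0
        p = (doc_counts.get(sense, 0) + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            return float("-inf")  # sense never seen anywhere in the collection
        score += math.log(p)
    return score


def rank(query, docs, mu=2000.0):
    """Return document names ordered from most to least likely to generate the query."""
    coll_counts = Counter()
    for tokens in docs.values():
        coll_counts.update(to_senses(tokens))
    coll_len = sum(coll_counts.values())
    scored = [
        (query_log_likelihood(query, tokens, coll_counts, coll_len, mu), name)
        for name, tokens in docs.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


docs = {
    "d1": ["p53", "suppresses", "tumour", "growth"],
    "d2": ["kinase", "signalling", "pathway", "analysis"],
}
ranking = rank(["TP53", "tumor"], docs)
```

In this toy collection the query shares no surface term with either document, so a term-based model could not separate them. Once terms are mapped to senses, d1 matches both query senses and is ranked first, without any query expansion step.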