Efficient information retrieval based on a combination of vector space and probabilistic models

Min Zhang, Shaoping Ma
DOI: https://doi.org/10.1109/icsmc.2002.1173456
2002-01-01
Abstract: This paper studies whether combining the vector space model (VSM) and the probabilistic model is feasible and effective for improving IR performance. The necessity and feasibility of combining the two models are analyzed first. Two systems were used in the study: Okapi for the probabilistic model and SMART for the VSM. Evaluation on three standard IR test collections, namely CACM, ADI and MED, shows that: (1) the VSM-based approach consistently has a 13% to 24% smaller average rank than the probabilistic approach; (2) the probabilistic approach consistently has a 6.6% to 44.2% lower unfound proportion; (3) 11-point average precision and top-5 precision show no obvious difference between the two systems; (4) for each test set, 8% to 20% of the relevant documents could be retrieved by only one of the two systems. These results confirm the necessity of combining the two models. The performance of the combination criteria was also evaluated on the three collections. Experimental results show that the combined criterion consistently improves on the better-performing model, by up to 6.9% in 11-point average precision and up to 9.2% in top-5 precision.
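The abstract names two evaluation measures (11-point average precision and top-5 precision) and a combined ranking criterion, but does not give the formula for the combination. The sketch below illustrates the general idea under stated assumptions: the two metric functions follow the usual textbook definitions, which may differ in detail from the paper's evaluation setup, and `combine` is a hypothetical min-max normalized weighted sum, not the authors' criterion. All function names, the `weight` parameter, and the sample scores are illustrative only.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def eleven_point_avg_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    # Record (recall, precision) each time a relevant document is found.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for step in range(11):
        level = step / 10
        # Interpolated precision: best precision at any recall >= level.
        reachable = [p for r, p in points if r >= level - 1e-12]
        total += max(reachable, default=0.0)
    return total / 11


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one system's scores to [0, 1] so the two systems are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(vsm: Dict[str, float], prob: Dict[str, float],
            weight: float = 0.5) -> List[str]:
    """Assumed fusion rule (not from the paper): weighted sum of normalized scores.

    A document returned by only one system gets 0 from the other.
    """
    a, b = min_max(vsm), min_max(prob)
    docs = set(a) | set(b)
    fused = {d: weight * a.get(d, 0.0) + (1 - weight) * b.get(d, 0.0) for d in docs}
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(docs, key=lambda d: (-fused[d], d))


# Made-up scores: d3 is returned only by the VSM run, d4 only by the probabilistic run.
vsm_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
prob_scores = {"d2": 5.0, "d4": 3.0, "d1": 1.0}
fused_ranking = combine(vsm_scores, prob_scores)
```

Because the fused ranking is built from the union of both result lists, documents that only one system retrieves (finding 4 in the abstract) still appear in the combined output, which is one plausible reason a combination can outperform either system alone.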