Optimizing top-k retrieval: submodularity analysis and search strategies

Chaofeng Sha, Keqiang Wang, Dell Zhang, Xiaoling Wang, Aoying Zhou
DOI: https://doi.org/10.1007/s11704-015-5222-7
IF: 2.6688
2016-01-19
Frontiers of Computer Science
Abstract: The key issue in top-k retrieval, that is, finding a set of k documents (from a large document collection) that best answers a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of its objective function, which provides a theoretical approximation guarantee of factor $1 - \frac{1}{e}$ for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy that first obtains a high-quality initial set of top-k documents via greedy search and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy can outperform the existing ones in both effectiveness and efficiency.
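The two-stage strategy described in the abstract can be illustrated with a minimal sketch. It assumes a generic facility-location objective, f(S) = sum over all documents of the maximum similarity to any selected document, which is monotone and submodular for non-negative similarities but is not necessarily the exact objective used in the paper. The function names (`coverage`, `greedy_top_k`, `local_search`, `hybrid_top_k`), the single-swap neighbourhood, and the dense similarity matrix `sim` are all hypothetical choices made for illustration.

```python
from itertools import product


def coverage(selected, sim):
    """Assumed facility-location objective: every document is 'served'
    by its most similar selected document; sum those best similarities."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_top_k(sim, k):
    """Stage 1: add, one at a time, the document with the largest
    marginal gain in the objective (ties go to the lowest index)."""
    selected = []
    candidates = set(range(len(sim)))
    for _ in range(min(k, len(sim))):
        best = max(sorted(candidates),
                   key=lambda j: coverage(selected + [j], sim))
        selected.append(best)
        candidates.remove(best)
    return selected


def local_search(selected, sim, tol=1e-12):
    """Stage 2: swap one selected document for an unselected one
    as long as a swap strictly improves the objective by more than tol."""
    selected = list(selected)
    improved = True
    while improved:
        improved = False
        current = coverage(selected, sim)
        outside = [j for j in range(len(sim)) if j not in selected]
        for i, j in product(range(len(selected)), outside):
            trial = selected[:i] + [j] + selected[i + 1:]
            if coverage(trial, sim) > current + tol:
                selected, improved = trial, True
                break
    return selected


def hybrid_top_k(sim, k):
    """Greedy initialisation followed by local-search refinement."""
    return local_search(greedy_top_k(sim, k), sim)
```

Because every accepted swap strictly increases the objective and there are finitely many size-k subsets, the refinement loop terminates, and its result is never worse than the greedy starting set under the assumed objective. This sketch recomputes the objective from scratch on each evaluation, so it is meant to show the control flow rather than to be efficient on large collections.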
computer science, information systems, theory & methods, software engineering