From Greedy Selection to Exploratory Decision-Making

Yue Feng,Jun Xu,Yanyan Lan,Jiafeng Guo,Wei Zeng,Xueqi Cheng
DOI: https://doi.org/10.1145/3209978.3209979
2018-01-01
Abstract:The goal of search result diversification is to select a subset of documents from the candidate set to satisfy as many different subtopics as possible. In general, it is a problem of subset selection and selecting an optimal subset of documents is NP-hard. Existing methods usually formalize the problem as ranking the documents with greedy sequential document selection. At each of the ranking position the document that can provide the largest amount of additional information is selected. It is obvious that the greedy selections inevitably produce suboptimal rankings. In this paper we propose to partially alleviate the problem with a Monte Carlo tree search (MCTS) enhanced Markov decision process (MDP), referred to as M$^2$Div. In M$^2$Div, the construction of diverse ranking is formalized as an MDP process where each action corresponds to selecting a document for one ranking position. Given an MDP state which consists of the query, selected documents, and candidates, a recurrent neural network is utilized to produce the policy function for guiding the document selection and the value function for predicting the whole ranking quality. The produced raw policy and value are then strengthened with MCTS through exploring the possible rankings at the subsequent positions, achieving a better search policy for decision-making. Experimental results based on the TREC benchmarks showed that M$^2$Div can significantly outperform the state-of-the-art baselines based on greedy sequential document selection, indicating the effectiveness of the exploratory decision-making mechanism in M$^2$Div.
What problem does this paper attempt to address?