Abstract: One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures such as normalized discounted cumulative gain (NDCG). Existing methods usually focus on optimizing a specific evaluation measure calculated at a fixed position, e.g., NDCG calculated at a fixed position K. In information retrieval, evaluation measures, including the widely used NDCG and P@K, are usually designed to evaluate the document ranking at all ranking positions, which provides much richer information than measuring the document ranking at a single position. Thus, it is interesting to ask whether we can devise an algorithm that leverages the measures calculated at all ranking positions to learn a better ranking model. In this paper, we propose a novel learning to rank model based on the Markov decision process (MDP), referred to as MDPRank. In the learning phase of MDPRank, the construction of a document ranking is treated as a sequential decision-making process, in which each step corresponds to the action of selecting a document for the corresponding position. The REINFORCE policy gradient algorithm is adopted to train the model parameters. The evaluation measures calculated at every ranking position are used as the immediate rewards for the corresponding actions, which guide the learning algorithm to adjust the model parameters so that the measure is optimized. Experimental results on LETOR benchmark datasets showed that MDPRank can outperform the state-of-the-art baselines.
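
The training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear scoring function, the softmax policy over the remaining documents, the DCG-style reward with a log2(t + 2) discount, and all function names (`dcg_reward`, `sample_episode`, `reinforce_update`) are assumptions made here for illustration. Each step selects one document for the next position, receives a position-dependent reward, and the REINFORCE update weights each log-probability gradient by the return from that step onward.

```python
import math
import random


def dcg_reward(label, t):
    """Assumed immediate reward: DCG gain of a document with relevance
    `label` placed at 0-based position t."""
    return (2.0 ** label - 1.0) / math.log2(t + 2.0)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample_episode(w, X, labels, rng):
    """Build one ranking by sampling documents without replacement
    from a softmax policy over linear scores w . x (an assumed policy form)."""
    remaining = list(range(len(X)))
    ranking, rewards, grads = [], [], []
    for t in range(len(X)):
        scores = [sum(wi * xi for wi, xi in zip(w, X[d])) for d in remaining]
        probs = softmax(scores)
        k = rng.choices(range(len(remaining)), weights=probs)[0]
        # Gradient of log pi(a|s) for a linear softmax policy: x_a - E_pi[x]
        expected = [sum(p * X[d][j] for p, d in zip(probs, remaining))
                    for j in range(len(w))]
        chosen = remaining.pop(k)
        grads.append([X[chosen][j] - expected[j] for j in range(len(w))])
        ranking.append(chosen)
        rewards.append(dcg_reward(labels[chosen], t))
    return ranking, rewards, grads


def reinforce_update(w, rewards, grads, lr=0.01, gamma=1.0):
    """One REINFORCE step: each log-probability gradient is weighted by
    the discounted return accumulated from that position to the end."""
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    new_w = list(w)
    for t, (G_t, g) in enumerate(zip(returns, grads)):
        for j in range(len(w)):
            new_w[j] += lr * (gamma ** t) * G_t * g[j]
    return new_w
```

With `gamma=1.0`, the return at the first position equals the DCG of the whole sampled ranking, so every position contributes to the learning signal rather than only a fixed cutoff K. Repeatedly calling `sample_episode` and `reinforce_update` over training queries would give a simple stochastic training loop under these assumptions.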

POMDP-Based Ranking and Selection

Ranking and Selection as Stochastic Control

AlphaRank: An Artificial Intelligence Approach for Ranking and Selection Problems

Efficient Dynamic Allocation Policy for Robust Ranking and Selection under Stochastic Control Framework

A Rank-Based Sampling Framework for Offline Reinforcement Learning

Sequential Sampling for Bayesian Robust Ranking and Selection

Ranking and Selection with Two-Stage Decision

Non-Myopic Knowledge Gradient Policy for Ranking and Selection

Sequential sampling for a ranking and selection problem with exponential sampling distributions

Data-driven Ranking and Selection under Input Uncertainty

Choosing to Rank

Inverse Reinforcement Learning with Multiple Ranked Experts

Bi-objective Ranking and Selection Using Stochastic Kriging

Optimal Computing Budget Allocation for Data-driven Ranking and Selection

Stochastic Robustness Controller Design upon Different Ranking Criteria

An Indifference-Zone Selection Procedure with Minimum Switching and Sequential Sampling

Selecting the Best System when Systems Are Revealed Sequentially

Context-Dependent Ranking and Selection under a Bayesian Framework

Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning

A ranking-system-based switching particle swarm optimizer with dynamic learning strategies

Reinforcement Learning to Rank with Markov Decision Process