Abstract: The conventional solution to learning-to-rank problems greedily ranks individual documents by their prediction scores. Recently emerged re-ranking models, which take initial lists as input, aim to capture document interdependencies and directly generate optimally ordered lists. Typically, a re-ranking model is learned from a set of labeled data and can achieve favorable performance on average. However, it can be suboptimal for individual queries because the available training data is usually highly imbalanced. This problem is challenging due to the absence of informative data for some queries and, furthermore, the lack of a good data augmentation policy. In this paper, we propose a novel method named Learning to Augment (LTA), which mitigates the imbalance issue by learning to augment the initial lists for re-ranking models. Specifically, we first design a data generation model based on a Gaussian Mixture Variational Autoencoder (GMVAE) for generating informative data. GMVAE imposes a mixture of Gaussians on the latent space, which allows it to cluster queries in an unsupervised manner and then generate new data of different query types using the learned components. Then, to obtain a good augmentation strategy (instead of relying on heuristics), we design a teacher model that consists of two intelligent agents: one determines how to generate new data for a given list, and the other determines how to rank both the raw and the generated data to produce augmented lists. The teacher model leverages feedback from the re-ranking model to optimize its augmentation policy by means of reinforcement learning. Our method offers a general learning paradigm that is applicable to both supervised and reinforced re-ranking models. Experimental results on benchmark learning-to-rank datasets show that our proposed method can significantly improve the performance of re-ranking models.
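
To make the idea of a mixture-of-Gaussians latent space concrete, the following is a minimal sketch of drawing latent codes from such a prior: a component is chosen first, then a latent vector is sampled from that component. It is an illustration only and not the authors' implementation; the function name `sample_mixture_latent`, the component weights, means, and standard deviations are all hypothetical, and the decoder that would map a latent code to generated ranking data is omitted.

```python
import random


def sample_mixture_latent(weights, means, stds, rng):
    """Draw one latent vector from a mixture of diagonal Gaussians.

    Returns a pair (component index, latent vector).
    """
    # Pick a mixture component according to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each latent dimension independently from that component.
    z = [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]
    return k, z


# Hypothetical 2-component, 2-dimensional prior (illustrative values only).
weights = [0.7, 0.3]
means = [[0.0, 0.0], [5.0, 5.0]]
stds = [[1.0, 1.0], [0.5, 0.5]]

rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_mixture_latent(weights, means, stds, rng) for _ in range(1000)]
```

In a model of the kind the abstract describes, each component would plausibly correspond to a learned query type, so choosing the component controls which type of data is generated; here the components are fixed by hand purely for illustration.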

Reinforcement Learning to Rank with Pairwise Policy Gradient

Policy-Gradient Training of Language Models for Ranking

Reinforcement Learning to Rank with Markov Decision Process

Multi Page Search with Reinforcement Learning to Rank

RLPS: A Reinforcement Learning–Based Framework for Personalized Search

Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application

Online Learning to Rank in a Listwise Approach for Information Retrieval

Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model

Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback

Learning to Collaborate: Multi-Scenario Ranking Via Multi-Agent Reinforcement Learning

Pareto Pairwise Ranking for Fairness Enhancement of Recommender Systems

SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval

Sequential Search with Off-Policy Reinforcement Learning

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence

Adapting Markov Decision Process for Search Result Diversification

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model

Learning to Augment Imbalanced Data for Re-ranking Models

PolicyBoost: Functional Policy Gradient with Ranking-based Reward Objective

A Deep Recurrent Survival Model for Unbiased Ranking

Beyond Probability Ranking Principle: Modeling the Dependencies among Documents