Abstract: Learning to rank from logged user feedback, such as clicks or purchases, is a central component of many real-world information systems. Unlike human-annotated relevance labels, user feedback is always noisy and biased. Many existing learning-to-rank methods infer the underlying relevance of query–item pairs under different assumptions about examination, and still optimize a relevance-based objective. Such methods rely heavily on correct estimation of examination, which is often difficult to achieve in practice. In this work, we propose U-rank+, a general framework for learning to rank with logged user feedback from the perspective of graph matching. We systematically analyze the biases in user feedback, including examination bias and selection bias. We then take both biases into account to obtain an unbiased utility estimate that is based directly on user feedback rather than on relevance. To maximize the estimated utility efficiently, we design two solvers for U-rank+, based on Sinkhorn and LambdaLoss. The former builds on a standard graph matching algorithm, and the latter is inspired by traditional learning-to-rank methods. Both algorithms have good theoretical properties for optimizing the unbiased utility objective, while the latter is shown empirically to be more effective and efficient. U-rank+ can handle a general utility function and can be used in a wide range of applications, including web search, recommendation, and online advertising. Semi-synthetic experiments on three benchmark learning-to-rank datasets demonstrate the effectiveness of U-rank+.
Furthermore, our proposed framework has been deployed in two scenarios of a mainstream App store, where online A/B testing shows that U-rank+ achieves an average improvement of 19.2% in click-through rate and 20.8% in conversion rate in the recommendation scenario, and 5.12% in platform revenue in the online advertising scenario, over the production baselines.
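
The abstract names Sinkhorn as the basis of one of the two solvers but gives no details. As a rough illustration of the general idea only, and not the paper's actual solver, the sketch below applies plain Sinkhorn normalization to a small item-by-position score matrix, producing a soft assignment whose rows and columns each sum to one. The function name `sinkhorn`, the `temperature` parameter, and the example scores are all assumptions made for this illustration.

```python
import math

def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Alternately normalize rows and columns of exp(scores / temperature).

    `scores` is a square list of lists: scores[i][j] is a hypothetical
    utility score for placing item i at position j.
    """
    n = len(scores)
    # Subtract the global maximum before exponentiating for numerical stability.
    m = max(max(row) for row in scores)
    P = [[math.exp((s - m) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: each item distributes one unit of mass over positions.
        P = [[v / sum(row) for v in row] for row in P]
        # Column step: each position receives one unit of mass in total.
        col_sums = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return P

# Illustrative scores only (3 items, 3 positions); not data from the paper.
scores = [[2.0, 0.5, 0.1],
          [0.3, 1.5, 0.2],
          [0.1, 0.4, 1.0]]
P = sinkhorn(scores)
for row in P:
    print([round(v, 3) for v in row])
```

The resulting matrix can be read as a relaxed item-to-position matching. How U-rank+ defines the scores and turns such a relaxation into a ranking objective is not specified in the abstract.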

RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling

Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

Beyond Positive History: Re-ranking with List-level Hybrid Feedback

RLRF4Rec: Reinforcement Learning from Recsys Feedback for Enhanced Recommendation Reranking

An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems

Multi-sourced Integrated Ranking with Exposure Fairness

Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

Teach and Explore: A Multiplex Information-guided Effective and Efficient Reinforcement Learning for Sequential Recommendation

Controllable Multi-Objective Re-ranking with Policy Hypernetworks

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Multi-Level Interaction Reranking with User Behavior History

PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce

LIRE: listwise reward enhancement for preference alignment

CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations

Beyond Relevance Ranking: A General Graph Matching Framework for Utility-Oriented Learning to Rank

PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement

Learning to Rank Features for Recommendation over Multiple Categories

EDMF: Efficient Deep Matrix Factorization With Review Feature Learning for Industrial Recommender System

Personalized Re-ranking for Improving Diversity in Live Recommender Systems

Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems