Abstract: A recommender selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it within a reinforcement learning (RL) framework has attracted increasing attention from both academia and industry. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch time in an environment of multi-aspect user preferences through a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed on our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.
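
The session-level MDP described in the abstract can be illustrated with a toy calculation. This is a minimal sketch, not the authors' implementation. It assumes that the action at each request is the top-K videos ranked by a hypothetical policy score, that the per-request reward is the summed watch time of the displayed videos, and that the session objective is the discounted sum of those rewards with an assumed discount factor `gamma`. The names `Request`, `select_top_k`, `step_reward`, and `session_return` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One online request (one MDP step). Hypothetical toy structure."""
    scores: Dict[str, float]      # hypothetical policy score per candidate video
    watch_time: Dict[str, float]  # hypothetical watch time (seconds) per video


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Action at one request: the k highest-scoring candidates (ties broken by id)."""
    return sorted(scores, key=lambda vid: (-scores[vid], vid))[:k]


def step_reward(req: Request, k: int) -> float:
    """Reward for one request: total watch time of the displayed top-k videos only."""
    return sum(req.watch_time.get(vid, 0.0) for vid in select_top_k(req.scores, k))


def session_return(session: List[Request], k: int, gamma: float = 0.9) -> float:
    """Discounted cumulative watch time over the sequential requests of a session."""
    return sum(gamma ** t * step_reward(req, k) for t, req in enumerate(session))


if __name__ == "__main__":
    session = [
        Request({"a": 0.9, "b": 0.5, "c": 0.1}, {"a": 12.0, "b": 3.0, "c": 30.0}),
        Request({"d": 0.2, "e": 0.8}, {"d": 5.0, "e": 7.0}),
    ]
    print(session_return(session, k=2, gamma=0.5))
```

With K = 2 and gamma = 0.5, the first request displays `a` and `b` (15 s) and the second displays `e` and `d` (12 s), so the return is 15 + 0.5 × 12 = 21. The undisplayed candidate `c` never enters the return even though its watch time would be high. In a real system that feedback is simply never observed, which is the root of the sample selection bias the abstract mentions.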

What problem does this paper attempt to address?

This paper proposes a model-based, multi-agent short-video recommender system that aims to maximize user watch time and explicit interaction in an environment where users have multiple preferences. At each online request, the recommender selects and displays the top-K videos to the user, and a series of consecutive requests forms a recommendation session. The authors model the session as a Markov decision process and solve it with a reinforcement learning framework. For short-video recommendation, the paper focuses on long-term rewards, such as watch time and the cumulative satisfaction produced by each recommendation action.

Although previous work has attempted to balance watch time against explicit interactions, not all interactions compete with watch time. The paper therefore proposes a collaborative multi-agent framework in which several agents jointly optimize different user preferences, with the goal of maximizing the cumulative watch time in a session. The model uses an attention mechanism to select useful information from the other, auxiliary preference signals, which supports better action planning.

The paper also addresses sample selection bias, an important but hard-to-solve problem in industrial recommender systems. It introduces non-impression samples (candidates that were never shown to the user) together with a feedback fitting model that simulates user feedback on them, which alleviates the bias.

The experiments include offline and online evaluations, showing that the method outperforms alternative methods on public benchmark datasets and on large-scale industrial datasets. The method has been deployed on a real-world, large-scale short-video sharing platform serving hundreds of millions of users, and online A/B tests confirm improvements in metrics such as watch time, depth, engagement, and comments.
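
Two of the mechanisms summarized above can be illustrated with toy code, although the summary gives no architectural details. The sketch below is an illustration under stated assumptions, not the paper's model. `attend` applies plain scaled dot-product attention of one agent's query vector over embedding vectors for the auxiliary preference signals. `fit_feedback_model` swaps in a one-feature least-squares line, fitted only on impressed samples, as a stand-in for the paper's feedback fitting model. `simulate_feedback` then applies that line to non-impression samples. All function names and the single-feature setup are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one agent's query over auxiliary signals.

    Returns (weights over auxiliary signals, fused context vector).
    """
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(logits)
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, fused


def fit_feedback_model(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in feedback model: least-squares line y = a + b*x,
    fitted only on impressed samples, where real feedback was observed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return my - b * mx, b


def simulate_feedback(model: Tuple[float, float], xs: Sequence[float]) -> List[float]:
    """Predict (simulate) feedback for non-impression samples with the fitted line."""
    a, b = model
    return [a + b * x for x in xs]
```

For instance, fitting on impressed pairs (0, 1), (1, 3), (2, 5) gives the line y = 1 + 2x, so a non-impressed candidate with feature value 3 receives a simulated feedback of 7. A real system would use a learned model over rich user and item features; the linear fit here only keeps the example checkable by hand.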

A Model-based Multi-Agent Personalized Short-Video Recommender System

Constrained Reinforcement Learning for Short Video Recommendation

A Deep Reinforcement Learning Real-Time Recommendation Model Based on Long and Short-Term Preference

Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

Two-Stage Constrained Actor-Critic for Short Video Recommendation

Reinforcing User Retention in a Billion Scale Short Video Recommender System

Deep Reinforcement Learning for List-wise Recommendations

Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Digital Human Interactive Recommendation Decision-Making Based on Reinforcement Learning

Building Effective Short Video Recommendation

Short Video-based Advertisements Evaluation System: Self-Organizing Learning Approach

A Unified Personalized Video Recommendation via Dynamic Recurrent Neural Networks

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

Real-time Short Video Recommendation on Mobile Devices

A stable deep reinforcement learning framework for recommendation

Learning and Optimization of Implicit Negative Feedback for Industrial Short-video Recommender System

Incremental Learning for Personalized Recommender Systems

Whole-Chain Recommendations

Personalized real-time movie recommendation system: Practical prototype and evaluation