A Model-based Multi-Agent Personalized Short-Video Recommender System

Peilun Zhou, Xiaoxiao Xu, Lantao Hu, Han Li, Peng Jiang
2024-05-03
Abstract: A recommender selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it within a reinforcement learning (RL) framework has attracted increasing attention from both the academic and industrial communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch-time in an environment of multi-aspect user preferences through a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed on our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
This paper proposes a model-based multi-agent short-video recommender system that aims to maximize user viewing time and explicit interaction in an environment with multiple user preferences. At each online request, the recommender selects and displays the top-K videos to the user, and a series of consecutive requests forms a recommendation session. The authors model the session as a Markov decision process and solve it with a reinforcement learning framework. For short-video recommendation, the paper focuses on long-term rewards, such as viewing time and the cumulative satisfaction produced by each recommendation action.

Previous work has attempted to balance viewing time against explicit interaction, but not all interactions compete with viewing time. The paper therefore proposes a collaborative multi-agent framework in which multiple agents work together to optimize different user preferences, with the goal of maximizing the accumulated viewing time in the session. The model uses an attention mechanism to select useful information from the auxiliary preference signals and so support better action planning.

The paper also addresses sample selection bias, an important but difficult problem in industrial recommender systems. By introducing non-impression samples (candidates that were never shown to the user) together with a feedback fitting model, the system can simulate user feedback and alleviate this bias.

The experiments include offline and online evaluations, which show that the proposed method outperforms alternative methods on public benchmark datasets and on large-scale industrial datasets. The method has been deployed on a real-world large-scale short-video sharing platform serving hundreds of millions of users. Online A/B tests further confirm improvements in metrics such as viewing time, depth, engagement, and comments.
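To make two of the ideas in this summary concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the session objective can be written as a discounted sum of per-request watch time, and that the attention step is plain scaled dot-product attention in which each auxiliary preference embedding serves as both key and value. The function names, the discount factor `gamma`, and the toy vectors are all hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def session_return(watch_times: Sequence[float], gamma: float = 0.9) -> float:
    """Discounted cumulative watch time over the requests of one session.

    Assumption: reward at request t is the watch time earned by that request.
    """
    return sum((gamma ** t) * r for t, r in enumerate(watch_times))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           aux_signals: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention over auxiliary preference embeddings.

    Assumption: each auxiliary embedding acts as both key and value, and
    `query` is the embedding of the watch-time agent (hypothetical setup).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, sig)) / scale
              for sig in aux_signals]
    weights = softmax(scores)
    dim = len(aux_signals[0])
    # Weighted sum of the auxiliary embeddings, one output per dimension.
    return [sum(w * sig[i] for w, sig in zip(weights, aux_signals))
            for i in range(dim)]


if __name__ == "__main__":
    # Three requests in one session, watch time in seconds (toy numbers).
    print(session_return([10.0, 20.0, 30.0], gamma=0.5))
    # Toy query and two auxiliary signals (e.g. like and comment preferences).
    print(attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch the attended vector leans toward whichever auxiliary signal aligns best with the query, which is one simple way to read "selecting useful information" from other preference signals. The paper's actual network architecture and reward definition may differ.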