RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling.

Jing Wang, Mengchen Zhao, Wei Xia, Zhenhua Dong, Ruiming Tang, Rui Zhang, Jianye Hao, Guangyong Chen, Pheng-Ann Heng
DOI: https://doi.org/10.1007/978-3-031-33380-4_31
2023-01-01
Abstract: Industrial recommender systems have a strong need to output an integrated ranking of items from different categories, such as video and news, to maximize overall user satisfaction. Integrated ranking faces two critical challenges. First, there is no universal metric for evaluating the contribution of each item, because items from different categories differ greatly. Second, a user's short-term preferences may shift quickly between diverse items during interaction with the recommender system. To address these challenges, we propose a reinforcement learning (RL) based framework called RLMixer for the sequential integrated ranking problem. Benefiting from the credit assignment mechanism, RLMixer can decompose overall user satisfaction into contributions from items of different categories, making them comparable. To capture the user's short-term preferences, RLMixer explicitly learns user interest vectors with a carefully designed contrastive loss. In addition, RLMixer is trained fully offline, which is convenient for industrial applications. We show that RLMixer significantly outperforms various baselines on both public PRM datasets and industrial datasets collected from a widely used AppStore. We also conduct online A/B tests on millions of users through the AppStore. The results show that RLMixer brings a significant revenue gain of over 4%.