ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems

Yi Zhang, Ruihong Qiu, Jiajun Liu, Sen Wang
2024-07-18
Abstract: Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made progress in recommendation performance, the effectiveness of model-based offline RL methods is often constrained by the accuracy of the estimation of the reward model and the model uncertainties, primarily due to the extreme discrepancy between offline logged data and real-world data in user interactions with online platforms. To fill this gap, a more accurate reward model and uncertainty estimation are needed for the model-based RL methods. In this paper, a novel model-based Reward Shaping in Offline Reinforcement Learning for Recommender Systems, ROLeR, is proposed for reward and uncertainty estimation in recommendation systems. Specifically, a non-parametric reward shaping method is designed to refine the reward model. In addition, a flexible and more representative uncertainty penalty is designed to fit the needs of recommendation systems. Extensive experiments conducted on four benchmark datasets showcase that ROLeR achieves state-of-the-art performance compared with existing baselines. The source code can be downloaded at https://github.com/ArronDZhang/ROLeR.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two weaknesses of model-based offline reinforcement learning (RL) methods for recommender systems: reward estimation and uncertainty estimation.

1. **Inaccurate reward estimation**: Existing model-based offline RL recommenders learn a world model from offline data and build the recommendation policy by interacting with that model. Their effectiveness is often limited by the accuracy of the estimated reward model, mainly because offline logged data differ sharply from real-world user interactions on online platforms.
2. **Model uncertainty**: The same discrepancy between offline data and real online interactions also limits how well model uncertainty is estimated, which in turn affects recommendation performance.

To address these issues, the authors propose **ROLeR** (Reward Shaping in Offline Reinforcement Learning for Recommender Systems). The method refines the reward model with a non-parametric reward shaping technique and designs a more flexible and representative uncertainty penalty to fit the needs of recommender systems. Experiments on four benchmark datasets show that ROLeR achieves state-of-the-art performance compared with existing baselines.
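
As a rough illustration of what non-parametric reward shaping with an uncertainty penalty can look like, the sketch below uses k-nearest neighbours over logged state vectors. This is an assumption made for illustration only: the summary above does not specify the estimator, and the function name `knn_shaped_reward`, the parameters `k` and `penalty_weight`, and the toy data are all hypothetical rather than taken from the ROLeR code base.

```python
import numpy as np


def knn_shaped_reward(query, logged_states, logged_rewards, k=3, penalty_weight=1.0):
    """Hypothetical kNN sketch: returns (shaped_reward, uncertainty, penalized_reward)."""
    # Euclidean distance from the query state to every logged state.
    dists = np.linalg.norm(logged_states - query, axis=1)
    # Indices of the k closest logged states.
    nearest = np.argsort(dists)[:k]
    # Shaped reward: average logged reward of the neighbours (no learned parameters).
    shaped = float(logged_rewards[nearest].mean())
    # Uncertainty proxy (assumption): mean distance to the neighbours, so states
    # far from the logged data are penalized more heavily.
    uncertainty = float(dists[nearest].mean())
    return shaped, uncertainty, shaped - penalty_weight * uncertainty


# Toy logged data: four 2-D state vectors with observed rewards.
logged_states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
logged_rewards = np.array([1.0, 0.5, 0.0, 0.0])

result = knn_shaped_reward(np.array([0.0, 0.0]), logged_states, logged_rewards, k=2)
print(result)  # (0.75, 0.5, 0.25)
```

In this toy setup, the penalized value is what a policy would be trained on in place of the raw world-model reward; a larger `penalty_weight` makes the policy more conservative about states that are poorly covered by the logs.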