ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems

Yi Zhang, Ruihong Qiu, Jiajun Liu, Sen Wang

2024-07-18

Abstract: Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made progress in the recommendation performance, the effectiveness of model-based offline RL methods is often constrained by the accuracy of the estimation of the reward model and the model uncertainties, primarily due to the extreme discrepancy between offline logged data and real-world data in user interactions with online platforms. To fill this gap, a more accurate reward model and uncertainty estimation are needed for the model-based RL methods. In this paper, a novel model-based Reward Shaping in Offline Reinforcement Learning for Recommender Systems, ROLeR, is proposed for reward and uncertainty estimation in recommendation systems. Specifically, a non-parametric reward shaping method is designed to refine the reward model. In addition, a flexible and more representative uncertainty penalty is designed to fit the needs of recommendation systems. Extensive experiments conducted on four benchmark datasets showcase that ROLeR achieves state-of-the-art performance compared with existing baselines. The source code can be downloaded at https://github.com/ArronDZhang/ROLeR.

Information Retrieval, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses two weaknesses of model-based offline reinforcement learning (RL) methods for recommender systems: reward estimation and uncertainty estimation.

1. **Inaccurate reward estimation**: Existing model-based offline RL recommenders learn a world model from offline data and build the recommendation policy by interacting with that model. Because offline logged data differ sharply from real-world user interactions, the rewards predicted by the world model are often inaccurate.
2. **Unreliable model uncertainty**: The same discrepancy between logged and real-world data makes the estimate of the world model's uncertainty inaccurate as well, which further limits recommendation performance.

To address these issues, the authors propose **ROLeR** (Reward Shaping in Offline Reinforcement Learning for Recommender Systems). The method refines the reward model with a non-parametric reward shaping technique and introduces a more flexible and representative uncertainty penalty suited to recommender systems. Experiments on four benchmark datasets show that ROLeR achieves state-of-the-art performance compared with existing baselines.
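The text above names two ingredients, a non-parametric reward estimate and an uncertainty penalty, without giving their exact form. The sketch below is only one plausible reading, not the authors' implementation: it assumes a k-nearest-neighbour estimator over state embeddings, takes the mean distance to those neighbours as the penalty, and combines the two as `reward - lam * penalty`. The function names, `k`, `lam`, and the toy data are all hypothetical.

```python
import numpy as np


def knn_shaped_reward(query, logged_states, logged_rewards, k=3):
    """Return (shaped_reward, penalty) for one query state embedding.

    Assumption: the non-parametric estimator is kNN in embedding space.
    shaped_reward = mean logged reward of the k nearest logged states.
    penalty       = mean Euclidean distance to those neighbours
                    (a larger value means less support in the logged data).
    """
    dists = np.linalg.norm(logged_states - query, axis=1)
    # A stable sort keeps the neighbour choice deterministic under ties.
    nearest = np.argsort(dists, kind="stable")[:k]
    shaped = float(logged_rewards[nearest].mean())
    penalty = float(dists[nearest].mean())
    return shaped, penalty


def penalized_reward(query, logged_states, logged_rewards, k=3, lam=1.0):
    """Shaped reward minus a distance-based penalty, weighted by lam."""
    shaped, penalty = knn_shaped_reward(query, logged_states, logged_rewards, k)
    return shaped - lam * penalty


# Toy logged data: four 2-D state embeddings with their observed rewards.
states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
rewards = np.array([1.0, 0.0, 1.0, 5.0])
query = np.array([0.0, 0.0])

shaped, penalty = knn_shaped_reward(query, states, rewards, k=2)
print(shaped, penalty)
print(penalized_reward(query, states, rewards, k=2, lam=0.5))
```

In this reading, a policy trained against `penalized_reward` is discouraged from visiting states far from anything in the logs, because the penalty grows with the distance to the nearest logged neighbours.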
