Abstract: Modern recommender systems aim to improve user experience. As reinforcement learning (RL) naturally fits this objective -- maximizing a user's reward per session -- it has become an emerging topic in recommender systems. Developing RL-based recommendation methods, however, is not trivial due to the \emph{offline training challenge}. Specifically, traditional RL relies on training an agent through large amounts of online exploration, making many `errors' in the process. In the recommendation setting, though, we cannot afford the price of making `errors' online. As a result, the agent needs to be trained on offline historical implicit feedback collected under different recommendation policies, and traditional RL algorithms may lead to sub-optimal policies under these offline training settings. Here we propose a new learning paradigm -- namely Prompt-Based Reinforcement Learning (PRL) -- for the offline training of RL-based recommendation agents. While traditional RL algorithms attempt to map state-action input pairs to their expected rewards (e.g., Q-values), PRL directly infers actions (i.e., recommended items) from state-reward inputs. In short, the agents are trained with simple supervised learning to predict a recommended item given the prior interactions and an observed reward value. At deployment time, the historical (training) data acts as a knowledge base, while the state-reward pairs are used as a prompt. The agents are thus used to answer the question: \emph{Which item should be recommended given the prior interactions and the prompted reward value?} We implement PRL with four notable recommendation models and conduct experiments on two real-world e-commerce datasets. Experimental results demonstrate the superior performance of our proposed methods.
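The state-reward-to-item mapping described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the four neural recommendation models are replaced here by a simple count-based lookup, and the log entries, reward values (1.0 for a click, 5.0 for a purchase), and the names `LOGS`, `train`, and `recommend` are all hypothetical, chosen only to show the shape of the training pairs and of the reward prompt at deployment time.

```python
from collections import Counter, defaultdict

# Hypothetical logged feedback: (prior interactions, recommended item, observed reward).
# The reward values are invented for illustration (1.0 = click, 5.0 = purchase).
LOGS = [
    (("shoes",), "socks", 1.0),
    (("shoes",), "laces", 5.0),
    (("shoes",), "laces", 5.0),
    (("shoes",), "socks", 5.0),
    (("phone",), "case", 5.0),
    (("phone",), "charger", 1.0),
]


def train(logs):
    """Fit a count-based stand-in for a supervised model of item given (state, reward)."""
    model = defaultdict(Counter)
    for state, item, reward in logs:
        # The input is the state-reward pair; the logged item is the label.
        model[(state, reward)][item] += 1
    return model


def recommend(model, state, prompt_reward):
    """Return the most frequently logged item for the state and the prompted reward."""
    counts = model.get((state, prompt_reward))
    if not counts:
        return None  # this lookup does not generalize to unseen prompts
    return counts.most_common(1)[0][0]


model = train(LOGS)
high = recommend(model, ("shoes",), 5.0)  # prompt with the high reward
low = recommend(model, ("shoes",), 1.0)   # prompt with the low reward
print(high, low)
```

Prompting the same state with a different reward value changes the recommended item, which is the behaviour the abstract attributes to PRL. A learned model would additionally be expected to generalize to state-reward pairs that never appear in the logs, which this lookup table cannot do.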
