Abstract: Reinforcement learning (RL) has been widely applied in recommendation systems because of its potential for optimizing the long-term engagement of users. From the perspective of RL, recommendation can be formulated as a Markov decision process (MDP), where the recommendation system (agent) interacts with users (environment) and acquires feedback (reward signals). However, conducting online interactions is impractical because of concerns about user experience and implementation complexity, so RL recommenders can only be trained on offline datasets containing limited reward signals and state transitions. As a result, the sparsity of reward signals and state transitions is severe, yet it has long been overlooked by existing RL recommenders. Worse still, RL methods learn by trial and error, but negative feedback cannot be obtained in implicit-feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommenders. To address these challenges, we propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL). On the one hand, we learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem. On the other hand, we construct positive and negative state-action pairs to model the reward function and the state transition function with contrastive learning, exploiting the internal structure information of the MDP. Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods with different representative backbone networks on two real-world datasets.
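
The abstract does not spell out how MCRL's conservative value learning works. As a hedged illustration of the general idea only, the sketch below applies a Conservative Q-Learning (CQL)-style penalty to a tabular Q-function trained purely on logged transitions. Alongside the usual TD error, it adds alpha * (logsumexp over Q(s, ·) minus Q(s, a_logged)), which lowers the values of actions that never appear in the data and so counters overestimation. The tabular setting, the function names `softmax` and `conservative_q_update`, and the hyperparameters `gamma`, `alpha` and `lr` are assumptions for illustration, not the paper's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def conservative_q_update(q, batch, n_actions, gamma=0.9, alpha=1.0, lr=0.1):
    """One gradient step per logged transition on a tabular CQL-style objective.

    Hypothetical helper (not from the paper).
    q:     dict mapping state -> list of Q-values, one per action (mutated in place)
    batch: logged (state, action, reward, next_state, done) tuples
    Per-transition loss:
        0.5 * (Q(s, a) - target)^2 + alpha * (logsumexp_b Q(s, b) - Q(s, a))
    """
    for s, a, r, s_next, done in batch:
        q.setdefault(s, [0.0] * n_actions)
        q.setdefault(s_next, [0.0] * n_actions)
        # TD target bootstraps from the greedy value of the next state (held fixed).
        target = r if done else r + gamma * max(q[s_next])
        # Gradient of the conservative term: softmax(Q(s, .)) pushes every action
        # down, and the -alpha on the logged action pushes that one back up.
        grads = [alpha * p for p in softmax(q[s])]
        grads[a] += (q[s][a] - target) - alpha
        q[s] = [v - lr * g for v, g in zip(q[s], grads)]
    return q


if __name__ == "__main__":
    # Toy log: in state 0 only action 0 was ever recommended (reward 1, episode ends).
    log = [(0, 0, 1.0, 0, True)]
    for alpha in (0.0, 1.0):
        q = {}
        for _ in range(200):
            conservative_q_update(q, log, n_actions=3, alpha=alpha)
        print(alpha, [round(v, 3) for v in q[0]])
```

With `alpha=0.0` the two never-logged actions keep their initial value of 0, which is exactly where function approximation can extrapolate upward in a real offline recommender; with `alpha=1.0` those actions are pushed below zero while the logged action keeps the highest value.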

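The abstract likewise only states that positive and negative state-action pairs are used to model the reward and transition functions with contrastive learning. One common way to realize such an objective is an InfoNCE loss with in-batch negatives, sketched below for a transition model: the embedding of each (state, action) pair should score its observed next state higher than the next states of the other pairs in the batch. The embeddings are toy vectors standing in for the output of some encoder; the function names, the dot-product similarity and the `temperature` value are assumptions, not details taken from the paper.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log(exp(sim(a, p)/t) / sum of exp(sim(a, x)/t) over p and negatives)."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def transition_contrastive_loss(sa_embeddings, next_state_embeddings, temperature=0.1):
    """Mean InfoNCE over a batch using in-batch negatives (hypothetical helper).

    The i-th (state, action) embedding treats the i-th next-state embedding as its
    positive and every other next-state embedding in the batch as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(sa_embeddings):
        negatives = [z for j, z in enumerate(next_state_embeddings) if j != i]
        total += info_nce_loss(anchor, next_state_embeddings[i], negatives, temperature)
    return total / len(sa_embeddings)


if __name__ == "__main__":
    # Hypothetical 2-d embeddings for two logged (state, action) pairs.
    sa = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]  # each pair is aligned with its own next state
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # next states assigned to the wrong pairs
    print(transition_contrastive_loss(sa, matched))
    print(transition_contrastive_loss(sa, swapped))
```

The same loss shape could serve a reward model if, for example, the positive were an item the user actually interacted with and the negatives were sampled items; how MCRL actually constructs its pairs is not stated in the abstract.
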
Hierarchical reinforcement learning with dynamic recurrent mechanism for course recommendation

Hierarchical Reinforcement Learning for Course Recommendation in MOOCs

Efficient Deep Reinforcement Learning-Enabled Recommendation

Multi-scale reinforced profile for personalized recommendation with deep neural networks in MOOCs

Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

Teach and Explore: A Multiplex Information-guided Effective and Efficient Reinforcement Learning for Sequential Recommendation

Hierarchical Reinforcement Learning for Modeling User Novelty-Seeking Intent in Recommender Systems

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

Method of personalized educational resource recommendation based on LDA and learner’s behavior

A Reinforcement Learning Approach to Personalized Learning Recommendation Systems

DRprofiling: Deep Reinforcement User Profiling for Recommendations in Heterogenous Information Networks

Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction

A Deep Reinforcement Learning Real-Time Recommendation Model Based on Long and Short-Term Preference

Deep Reinforcement Learning for List-wise Recommendations

MHRR: MOOCs Recommender Service With Meta Hierarchical Reinforced Ranking

An adaptable and personalized framework for top-N course recommendations in online learning

KERL: A Knowledge-Guided Reinforcement Learning Model for Sequential Recommendation

Deep Reinforcement Learning for Personalized Search Story Recommendation

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Adaptive Learning Recommendation Strategy Based on Deep Q-learning