Abstract: Reinforcement learning (RL), which finds a policy that maximizes long-term rewards, has been shown to be well suited to optimizing long-term user engagement (LTE) in sequential recommender systems (SRS). However, RL has shortcomings; in particular, it requires a large number of online samples for exploration, which is risky in real-world applications. One appealing way to avoid this risk is to build a simulator and learn the optimal recommendation policy in it. In LTE optimization, the simulator models the daily feedback of multiple users to given recommendations. However, building a user simulator with no reality gap, i.e., one that predicts users' feedback exactly, is unrealistic because users' reaction patterns are complex and the historical logs for each user are limited; the resulting simulator errors can mislead a simulator-based recommendation policy. In this paper, we present a practical simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec), to handle the reality-gap problem in LTE optimization. Specifically, Sim2Rec introduces a simulator set to generate a variety of possible user behavior patterns, then trains an environment-parameter extractor to recognize users' behavior patterns in the simulators. Finally, a context-aware policy is trained to make optimal decisions for all of these user variants based on the inferred environment parameters. The policy transfers directly to unseen environments (e.g., the real world) because it has learned to recognize diverse user behavior patterns and to make the correct decisions based on the inferred environment parameters. Experiments are conducted in synthetic environments and on a real-world large-scale ride-hailing platform, DidiChuxing. The results show that Sim2Rec achieves significant performance improvements and produces robust recommendations in unseen environments.
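The three components named in the abstract (a simulator set, an environment-parameter extractor, and a context-aware policy) can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical toy and not the Sim2Rec implementation: the class `ToySimulator`, the scalar user parameter `theta`, the linear feedback model, and the quadratic cost are all assumptions made for illustration. A closed-form least-squares fit stands in for the learned extractor, and an analytic argmax stands in for the RL-trained policy.

```python
import random

PROBE_ACTIONS = [0.25, 0.5, 0.75, 1.0]  # assumed fixed probing recommendations


class ToySimulator:
    """Hypothetical user simulator: feedback = theta * action + Gaussian noise."""

    def __init__(self, theta, noise=0.05, seed=0):
        self.theta = theta            # hidden behavior parameter of this user variant
        self.noise = noise
        self.rng = random.Random(seed)

    def step(self, action):
        return self.theta * action + self.rng.gauss(0.0, self.noise)


def extract_env_param(history):
    """Stand-in for the environment-parameter extractor.

    Least-squares estimate of theta from (action, feedback) pairs.
    """
    num = sum(a * y for a, y in history)
    den = sum(a * a for a, _ in history)
    return num / den if den > 0 else 0.0


def context_aware_policy(theta_hat, cost=0.5):
    """Stand-in for the context-aware policy.

    Maximizes the assumed reward theta * a - cost * a**2 over a in [0, 1],
    whose unconstrained optimum is theta / (2 * cost).
    """
    return min(1.0, max(0.0, theta_hat / (2.0 * cost)))


def run_episode(theta, seed=0):
    """Probe one environment, infer its parameter, then act on the estimate."""
    sim = ToySimulator(theta, seed=seed)
    history = [(a, sim.step(a)) for a in PROBE_ACTIONS]
    theta_hat = extract_env_param(history)
    return theta_hat, context_aware_policy(theta_hat)


if __name__ == "__main__":
    simulator_set = [0.2, 0.5, 0.9]   # user variants covered by the simulator set
    unseen = 0.7                      # held-out variant, playing the role of the real world
    for theta in simulator_set + [unseen]:
        theta_hat, action = run_episode(theta, seed=1)
        print(f"theta={theta:.2f} estimate={theta_hat:.2f} action={action:.2f}")
```

The point of the sketch is only the structure: because the policy conditions on an inferred parameter rather than on the identity of a particular simulator, the same code path handles the held-out `unseen` variant without retraining.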

UserSim: User Simulation via Supervised Generative Adversarial Network

Toward Simulating Environments in Reinforcement Learning Based Recommendations

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

MINDSim: User Simulator for News Recommenders

On Generative Agents in Recommendation

KuaiSim: A Comprehensive Simulator for Recommender Systems

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems

Generative Session-based Recommendation

Session-based Interactive Recommendation Via Deep Reinforcement Learning

Testing Deep Learning Recommender Systems Models on Synthetic GAN-Generated Datasets

Staying or Leaving: A Knowledge-Enhanced User Simulator for Reinforcement Learning Based Short Video Recommendation

DCFGAN: An adversarial deep reinforcement learning framework with improved negative sampling for session-based recommender systems

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation

A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

Recommender Systems Based on Generative Adversarial Networks: A Problem-Driven Perspective

UserSimCRS: A User Simulation Toolkit for Evaluating Conversational Recommender Systems

GAN-based Recommendation with Positive-Unlabeled Sampling

Generative Inverse Deep Reinforcement Learning for Online Recommendation

Learning Interactive Real-World Simulators