Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems

Siyu Wang, Xiaocong Chen, Lina Yao

2024-03-26

Abstract: Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications, from e-commerce platforms to streaming services. Yet they still face challenges, notably in designing reward functions and in exploiting large pre-existing datasets within the RL framework. Recent advances in offline RLRS offer a way to address both challenges. However, existing methods mainly rely on the Transformer architecture, whose computational and training costs grow as sequence length increases. In addition, prevalent methods use fixed-length input trajectories, which restricts their capacity to capture evolving user preferences. In this study, we introduce a new offline RLRS method that addresses these problems. We reframe the RLRS problem by modeling sequential decision-making as an inference task under adaptive masking configurations. This approach selectively masks input tokens, turning the recommendation task into inference over varying token subsets and thereby improving the agent's ability to infer across trajectories of different lengths. Furthermore, we incorporate a multi-scale segmented retention mechanism that models long sequences efficiently and significantly improves computational efficiency. Experiments on both an online simulator and offline datasets demonstrate the advantages of the proposed method.
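The adaptive masking idea can be illustrated with a minimal sketch. It assumes a Decision Transformer-style layout of three tokens per timestep (return-to-go, state, action) and a hypothetical masking rule that hides a random-length prefix of the trajectory, so each training sample exposes the model to a different history length. The function names, the uniform length sampler, and the prefix-only masking pattern are illustrative assumptions, not the paper's implementation; its masking configurations may include other token subsets.

```python
import random

# Assumed token layout (Decision Transformer style): each timestep contributes
# three tokens, (return-to-go, state, action).
TOKENS_PER_STEP = 3


def sample_visible_length(num_steps, rng):
    """Hypothetical adaptive sampler: keep the most recent k steps, k ~ Uniform{1..num_steps}."""
    return rng.randint(1, num_steps)


def token_keep_mask(num_steps, visible_steps):
    """Per-token booleans; False marks a masked (hidden) token.

    The oldest (num_steps - visible_steps) timesteps are hidden, so the model
    has to infer the next action from a shorter history.
    """
    hidden_steps = num_steps - visible_steps
    keep = []
    for t in range(num_steps):
        keep.extend([t >= hidden_steps] * TOKENS_PER_STEP)
    return keep


def attention_mask(keep):
    """Combine causal masking with the token mask.

    Entry [i][j] is True when query token i may attend to key token j, i.e.
    j is not in the future (j <= i) and token j is not hidden. Rows of hidden
    query tokens come out all False; their outputs should be excluded from the loss.
    """
    n = len(keep)
    return [[j <= i and keep[j] for j in range(n)] for i in range(n)]


rng = random.Random(0)
k = sample_visible_length(4, rng)  # a different history length per training sample
mask = attention_mask(token_keep_mask(4, k))
```

For example, `token_keep_mask(3, 2)` hides the three tokens of the first timestep and returns `[False, False, False, True, True, True, True, True, True]`. Resampling `k` per sample is what varies the effective trajectory length during training under this assumed scheme.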

Information Retrieval, Machine Learning

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses several key challenges in Reinforcement Learning-based Recommender Systems (RLRS):

1. **Reward Function Design and Utilization of Large Datasets**: Traditional RL recommender systems struggle to design reward functions and to exploit large volumes of historical interaction data. Recent offline RL recommender systems address both issues, but they mainly rely on the Transformer architecture, whose computational resources and training costs grow as sequence length increases.
2. **Limitations of Fixed-Length Input Trajectories**: Existing methods typically use fixed-length input trajectories, which limits their ability to capture how user preferences evolve over time. A method that handles trajectories of different lengths is needed to adapt to dynamic changes in user interests.
3. **Computational Efficiency and Long Sequence Modeling**: In Transformer-based offline RL for recommendation, the cost of standard self-attention grows quadratically with sequence length, so memory usage, latency, and training costs rise sharply for long user trajectories.

To address these challenges, the paper introduces a new framework, the Mask Retention Decision Transformer (MaskRDT), which uses an adaptive masking mechanism to handle trajectories of varying length and a multi-scale segmented retention mechanism to model long sequences efficiently while reducing training costs. In summary, the paper's main goal is a method that efficiently handles user trajectories of different lengths while balancing computational efficiency and model performance.
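The efficiency argument rests on retention having equivalent parallel, recurrent, and chunkwise (segmented) forms. The sketch below follows the retention formulation from RetNet, which we assume underlies the paper's multi-scale segmented retention; it omits learned projections, positional rotation, gating, and normalisation. The chunkwise form computes scores in parallel inside each segment and carries a fixed-size state across segments, so for a fixed segment size the total cost grows linearly with sequence length. "Multi-scale" is modelled as one decay rate `gamma` per head; the schedule `1 - 2**(-5 - h)` is borrowed from RetNet and is an assumption here. All function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _outer(k, v):
    # k (d_k) outer v (d_v) -> d_k x d_v matrix
    return [[ki * vj for vj in v] for ki in k]


def _vec_mat(q, S):
    # q (d_k) times S (d_k x d_v) -> vector of length d_v
    return [sum(q[i] * S[i][j] for i in range(len(q))) for j in range(len(S[0]))]


def retention_parallel(Q, K, V, gamma):
    """Parallel form: o_n = sum_{m<=n} gamma^(n-m) (q_n . k_m) v_m; O(N^2) work."""
    out = []
    for n in range(len(Q)):
        o = [0.0] * len(V[0])
        for m in range(n + 1):
            w = gamma ** (n - m) * dot(Q[n], K[m])
            o = [oi + w * vi for oi, vi in zip(o, V[m])]
        out.append(o)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n; constant-size state."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        kv = _outer(k, v)
        S = [[gamma * S[i][j] + kv[i][j] for j in range(d_v)] for i in range(d_k)]
        out.append(_vec_mat(q, S))
    return out


def retention_chunkwise(Q, K, V, gamma, chunk):
    """Segmented form: parallel inside each segment, recurrent state across segments."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # summarises every token before the current segment
    out = []
    for c in range(0, len(Q), chunk):
        q_c, k_c, v_c = Q[c:c + chunk], K[c:c + chunk], V[c:c + chunk]
        inner = retention_parallel(q_c, k_c, v_c, gamma)  # within-segment term
        for j, q in enumerate(q_c):
            cross = _vec_mat(q, S)  # earlier segments, decayed by gamma^(j+1)
            out.append([a + gamma ** (j + 1) * b for a, b in zip(inner[j], cross)])
        B = len(q_c)
        # Decay the old state across the whole segment, then add this segment's keys/values.
        S = [[gamma ** B * S[i][jj] for jj in range(d_v)] for i in range(d_k)]
        for j, (k, v) in enumerate(zip(k_c, v_c)):
            kv = _outer(k, v)
            w = gamma ** (B - 1 - j)
            for i in range(d_k):
                for jj in range(d_v):
                    S[i][jj] += w * kv[i][jj]
    return out


def multi_scale_gammas(num_heads):
    """RetNet-style per-head decay rates (assumed): one memory time scale per head."""
    return [1 - 2 ** (-5 - h) for h in range(num_heads)]
```

On a short sequence, `retention_parallel(Q, K, V, 0.9)`, `retention_recurrent(Q, K, V, 0.9)`, and `retention_chunkwise(Q, K, V, 0.9, chunk=3)` should agree up to floating-point error. A head whose `gamma` is close to 1 keeps long-range memory while a smaller `gamma` emphasises recent interactions, which is one plausible reading of why several scales help with evolving user preferences.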
