Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems

Siyu Wang, Xiaocong Chen, Lina Yao
2024-03-26
Abstract: Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications, from e-commerce platforms to streaming services. Yet, they grapple with challenges, notably in crafting reward functions and harnessing large pre-existing datasets within the RL framework. Recent advancements in offline RLRS provide a solution for how to address these two challenges. However, existing methods mainly rely on the transformer architecture, which, as sequence lengths increase, can introduce challenges associated with computational resources and training costs. Additionally, the prevalent methods employ fixed-length input trajectories, restricting their capacity to capture evolving user preferences. In this study, we introduce a new offline RLRS method to deal with the above problems. We reinterpret the RLRS challenge by modeling sequential decision-making as an inference task, leveraging adaptive masking configurations. This adaptive approach selectively masks input tokens, transforming the recommendation task into an inference challenge based on varying token subsets, thereby enhancing the agent's ability to infer across diverse trajectory lengths. Furthermore, we incorporate a multi-scale segmented retention mechanism that facilitates efficient modeling of long sequences, significantly enhancing computational efficiency. Our experimental analysis, conducted on both online simulator and offline datasets, clearly demonstrates the advantages of our proposed method.
Information Retrieval, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses several key challenges in Reinforcement Learning-based Recommender Systems (RLRS):

1. **Reward Function Design and Utilization of Large Datasets**: Traditional RL recommender systems have difficulty designing reward functions and making effective use of large amounts of historical data. Recent offline RLRS methods address both issues, but they rely mainly on the Transformer architecture, whose computational and training costs grow as the sequence length increases.
2. **Limitations of Fixed-Length Input Trajectories**: Existing methods typically use fixed-length input trajectories, which limits their ability to capture how user preferences evolve over time. A method that can handle trajectories of different lengths is therefore needed to adapt to dynamic changes in user interests.
3. **Computational Efficiency and Long Sequence Modeling**: When Transformer-based offline RL is applied to recommender systems, complexity increases with the sequence length, which significantly raises memory usage, latency, and training costs. To address this challenge, the paper introduces a new framework, the Mask Retention Decision Transformer (MaskRDT) with an adaptive masking mechanism, which handles long sequences efficiently through a multi-scale segmented retention mechanism while reducing training costs.

In summary, the main goal of the paper is to design a method that efficiently handles user trajectories of different lengths and balances computational efficiency against model performance.
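
To make the efficiency argument concrete, the sketch below illustrates the basic retention operation that retention-based models build on. It is not the paper's multi-scale segmented variant, whose details are not given here. It shows a single head with a scalar decay `gamma` and omits positional rotation and normalization. All names (`retention_parallel`, `retention_recurrent`, `random_matrix`, `gamma`) are assumptions chosen for illustration. The point is that the same output can be computed either by comparing every pair of positions, or by carrying a fixed-size state forward one step at a time, so the cost per step does not grow with the sequence length.

```python
import random

def retention_parallel(Q, K, V, gamma):
    """Parallel form: out[n] = sum over m <= n of gamma^(n-m) * (Q[n] . K[m]) * V[m]."""
    T, d_v = len(Q), len(V[0])
    out = []
    for n in range(T):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal: only positions m <= n contribute
            score = sum(q * k for q, k in zip(Q[n], K[m])) * gamma ** (n - m)
            for j in range(d_v):
                row[j] += score * V[m][j]
        out.append(row)
    return out

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + K[n]^T V[n]; out[n] = Q[n] S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of T
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = gamma * S[i][j] + k[i] * v[j]
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out

def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T, d = 6, 4  # toy sequence length and feature size (illustrative values)
Q, K, V = (random_matrix(T, d, rng) for _ in range(3))

par = retention_parallel(Q, K, V, gamma=0.9)
rec = retention_recurrent(Q, K, V, gamma=0.9)

# The two forms should agree up to floating-point rounding.
max_diff = max(abs(a - b) for ra, rb in zip(par, rec) for a, b in zip(ra, rb))
print(f"max difference between forms: {max_diff:.2e}")
```

The parallel form does work proportional to the square of the sequence length, as attention does, while the recurrent form keeps only a `d_k` by `d_v` state. That fixed-size state is the general property that makes retention attractive for long user trajectories.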