Large Language Models are Learnable Planners for Long-Term Recommendation

Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng
DOI: https://doi.org/10.1145/3626772.3657683
2024-04-26
Abstract: Planning for both immediate and long-term benefits becomes increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data presents challenges such as instability and susceptibility to overfitting when training RL models from scratch, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities over sparse data of Large Language Models (LLMs) for long-term recommendation. The key to achieving the target lies in formulating a guidance plan following principles of enhancing long-term engagement and grounding the plan to effective and executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at
Information Retrieval, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address the issue of long-term planning in recommendation systems. Specifically, existing recommendation methods primarily focus on optimizing users' immediate responses by maximizing short-term gains (e.g., click-through rate). However, this greedy recommendation strategy often neglects users' long-term engagement and may lead to negative consequences such as the filter bubble effect. Therefore, researchers believe that integrating planning capabilities into the recommendation decision-making process is crucial to develop strategies that consider both immediate benefits and long-term impacts.

However, existing methods typically use Reinforcement Learning (RL) to train models from scratch to acquire planning capabilities. This approach faces issues such as instability and overfitting when dealing with sparse recommendation data, resulting in poor performance. To address these problems, this paper proposes leveraging the planning capabilities of Large Language Models (LLMs) to improve long-term recommendations.

### Main Contributions

1. **Introduced a dual-level planning scheme utilizing LLMs' planning capabilities**: Enhanced the long-term engagement of recommendation systems through a dual-level planning framework.
2. **Proposed a new Bi-level Learnable LLM Planner (BiLLP) framework**: This framework includes four modules that can learn planning capabilities at both macro and micro levels and improve performance through low-variance Q-value estimation.
3. **Conducted extensive experimental validation**: Experiments demonstrated the planning capabilities of LLMs in long-term recommendations and the superiority of the BiLLP framework.

### Method Overview

1. **Macro-learning**:
   - **Planner**: Generates high-level planning schemes and decomposes them into sub-plans to guide subsequent recommendation actions.
   - **Reflector**: Extracts high-level guiding principles from historical interaction records and feeds them back to the planner to improve planning quality.
2. **Micro-learning**:
   - **Actor**: Transforms high-level planning schemes into specific recommendation actions, combining micro-level experiences for personalized recommendations.
   - **Critic**: Evaluates the current level of user satisfaction (i.e., the advantage value of actions) and updates the actor's strategy to enhance personalized recommendation effectiveness.

### Experimental Results

Experimental results show that the BiLLP framework can effectively utilize the planning capabilities of LLMs, significantly enhancing the long-term engagement of recommendation systems. Compared to traditional RL methods, BiLLP demonstrates better stability and performance when dealing with sparse recommendation data.

### Conclusion

By proposing the BiLLP framework, this paper successfully applies the planning capabilities of LLMs to recommendation systems, addressing the shortcomings of existing methods in handling sparse data and providing new insights for optimizing long-term engagement in recommendation systems.
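
To make the interplay of the four modules in the Method Overview more concrete, the following is a minimal Python sketch of one interaction episode. It is not the authors' implementation. Everything named here is a hypothetical placeholder introduced for illustration: the `LLM` callable (a function from prompt text to reply text), the `Memory` class, the prompt wording, the `env_step` and `value_fn` callables, and the discount `gamma`. The critic is assumed to use a one-step advantage, `r + gamma * V(s') - V(s)`, which is a common choice but is not confirmed by the summary above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM call: prompt text in, reply text out.
LLM = Callable[[str], str]


@dataclass
class Memory:
    """Text store; a plain list stands in for any retrieval mechanism."""
    entries: List[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.entries.append(entry)

    def recent(self, k: int = 3) -> List[str]:
        return self.entries[-k:]


def planner(llm: LLM, macro_memory: Memory, history: List[str]) -> str:
    """Macro level: produce a high-level plan guided by stored principles."""
    return llm(f"Principles: {macro_memory.recent()}\nHistory: {history}\nPlan:")


def reflector(llm: LLM, macro_memory: Memory, trajectory: List[str]) -> str:
    """Macro level: distill a guiding principle from a finished trajectory."""
    principle = llm(f"Trajectory: {trajectory}\nReflect:")
    macro_memory.add(principle)
    return principle


def actor(llm: LLM, micro_memory: Memory, plan: str, history: List[str]) -> str:
    """Micro level: ground the plan into one concrete recommendation."""
    return llm(
        f"Plan: {plan}\nExperience: {micro_memory.recent()}\n"
        f"History: {history}\nAct:"
    )


def critic(value_fn: Callable[[List[str]], float], reward: float,
           state: List[str], next_state: List[str], gamma: float = 0.9) -> float:
    """Micro level: one-step advantage estimate (an assumed formulation)."""
    return reward + gamma * value_fn(next_state) - value_fn(state)


def run_episode(llm: LLM,
                env_step: Callable[[str], Tuple[float, bool]],
                value_fn: Callable[[List[str]], float],
                macro_memory: Memory, micro_memory: Memory,
                max_steps: int = 3) -> float:
    """Plan, act, score each action, then reflect once at the end."""
    history: List[str] = []
    total_reward = 0.0
    for _ in range(max_steps):
        plan = planner(llm, macro_memory, history)
        item = actor(llm, micro_memory, plan, history)
        reward, done = env_step(item)
        next_history = history + [item]
        advantage = critic(value_fn, reward, history, next_history)
        # Micro-level experience is kept as text so later prompts can reuse it.
        micro_memory.add(f"item={item} advantage={advantage:.2f}")
        history = next_history
        total_reward += reward
        if done:
            break
    reflector(llm, macro_memory, history)
    return total_reward
```

In this sketch, learning happens by writing text into the two memories and reading it back into later prompts, not by updating model weights. Whether that matches how the framework actually updates the actor's policy should be checked against the paper.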