Abstract: Planning for both immediate and long-term benefits becomes increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data presents challenges such as instability and susceptibility to overfitting when training RL models from scratch, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities over sparse data of Large Language Models (LLMs) for long-term recommendation. The key to achieving the target lies in formulating a guidance plan following principles of enhancing long-term engagement and grounding the plan to effective and executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the issue of long-term planning in recommendation systems. Specifically, existing recommendation methods primarily focus on optimizing users' immediate responses by maximizing short-term gains (e.g., click-through rate). However, this greedy recommendation strategy often neglects users' long-term engagement and may lead to negative consequences such as the filter bubble effect. Therefore, researchers believe that integrating planning capabilities into the recommendation decision-making process is crucial to develop strategies that consider both immediate benefits and long-term impacts.

However, existing methods typically use Reinforcement Learning (RL) to train models from scratch to acquire planning capabilities. This approach faces issues such as instability and overfitting when dealing with sparse recommendation data, resulting in poor performance. To address these problems, this paper proposes leveraging the planning capabilities of Large Language Models (LLMs) to improve long-term recommendations.

### Main Contributions

1. **Introduced a dual-level planning scheme utilizing LLMs' planning capabilities**: Enhanced the long-term engagement of recommendation systems through a dual-level planning framework.
2. **Proposed a new Bi-level Learnable LLM Planner (BiLLP) framework**: This framework includes four modules that can learn planning capabilities at both macro and micro levels and improve performance through low-variance Q-value estimation.
3. **Conducted extensive experimental validation**: Experiments demonstrated the planning capabilities of LLMs in long-term recommendations and the superiority of the BiLLP framework.

### Method Overview

1. **Macro-learning**:
   - **Planner**: Generates high-level planning schemes and decomposes them into sub-plans to guide subsequent recommendation actions.
   - **Reflector**: Extracts high-level guiding principles from historical interaction records and feeds them back to the planner to improve planning quality.
2. **Micro-learning**:
   - **Actor**: Transforms high-level planning schemes into specific recommendation actions, combining micro-level experiences for personalized recommendations.
   - **Critic**: Evaluates the current level of user satisfaction (i.e., the advantage value of actions) and updates the actor's strategy to enhance personalized recommendation effectiveness.

### Experimental Results

Experimental results show that the BiLLP framework can effectively utilize the planning capabilities of LLMs, significantly enhancing the long-term engagement of recommendation systems. Compared to traditional RL methods, BiLLP demonstrates better stability and performance when dealing with sparse recommendation data.

### Conclusion

By proposing the BiLLP framework, this paper successfully applies the planning capabilities of LLMs to recommendation systems, addressing the shortcomings of existing methods in handling sparse data and providing new insights for optimizing long-term engagement in recommendation systems.
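
To make the four roles in the Method Overview concrete, here is a minimal Python sketch of how a Planner, Actor, Critic, and Reflector could interact in one episode. It is an illustration under stated assumptions, not the paper's implementation: the class `BiLevelPlanner`, the function `run_episode`, the prompt formats, and the stub `stub_llm` are all hypothetical, and the Critic uses a simple running-mean baseline in place of the Q-value estimation the summary mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM call: prompt text in, completion text out.
LLM = Callable[[str], str]


@dataclass
class BiLevelPlanner:
    llm: LLM
    macro_memory: List[str] = field(default_factory=list)  # principles (macro level)
    micro_memory: List[str] = field(default_factory=list)  # experiences (micro level)

    def plan(self, history: List[str]) -> str:
        """Planner: draft a high-level plan conditioned on stored principles."""
        return self.llm(f"Principles: {self.macro_memory}\nHistory: {history}\nPlan:")

    def act(self, plan: str, history: List[str]) -> str:
        """Actor: ground the plan into one concrete recommended item."""
        return self.llm(
            f"Plan: {plan}\nExperiences: {self.micro_memory}\n"
            f"History: {history}\nItem:"
        )

    def criticize(self, reward: float, baseline: float) -> float:
        """Critic (assumption): advantage = observed reward minus a baseline."""
        return reward - baseline

    def reflect(self, trajectory: List[str]) -> None:
        """Reflector: distill a guiding principle from a finished trajectory."""
        self.macro_memory.append(self.llm(f"Trajectory: {trajectory}\nPrinciple:"))


def run_episode(agent: BiLevelPlanner,
                env_reward: Callable[[str], float],
                steps: int = 3) -> float:
    """Run one recommendation episode and return the cumulative reward."""
    history: List[str] = []
    total, baseline = 0.0, 0.0
    for t in range(steps):
        plan = agent.plan(history)
        item = agent.act(plan, history)
        reward = env_reward(item)
        advantage = agent.criticize(reward, baseline)
        # Micro-learning: store the experience so later Actor prompts can use it.
        agent.micro_memory.append(f"{item}: advantage={advantage:+.2f}")
        baseline += (reward - baseline) / (t + 1)  # running mean of rewards so far
        history.append(item)
        total += reward
    # Macro-learning: update principles once the episode is over.
    agent.reflect(history)
    return total


def stub_llm(prompt: str) -> str:
    """Deterministic fake LLM that answers based on the final cue line."""
    cue = prompt.rstrip().splitlines()[-1]
    replies = {
        "Plan:": "diversify genres over the session",
        "Item:": "documentary_42",
        "Principle:": "avoid repeating one genre",
    }
    return replies[cue]


agent = BiLevelPlanner(llm=stub_llm)
print(run_episode(agent, env_reward=lambda item: 1.0, steps=3))
print(agent.macro_memory, agent.micro_memory)
```

In this sketch, learning happens only through the two memories that are fed back into later prompts; no model weights are updated. Swapping `stub_llm` for a real model client and `env_reward` for a user simulator would be the natural next step, though the prompts and memory format here are placeholders.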

Large Language Models are Learnable Planners for Long-Term Recommendation
