Leveraging LLM Reasoning Enhances Personalized Recommender Systems

Alicia Y. Tsai, Adam Kraft, Long Jin, Chenwei Cai, Anahita Hosseini, Taibai Xu, Zemin Zhang, Lichan Hong, Ed H. Chi, Xinyang Yi
2024-07-23
Abstract: Recent advancements have showcased the potential of Large Language Models (LLMs) in executing reasoning tasks, particularly facilitated by Chain-of-Thought (CoT) prompting. While tasks like arithmetic reasoning involve clear, definitive answers and logical chains of thought, the application of LLM reasoning in recommendation systems (RecSys) presents a distinct challenge. RecSys tasks revolve around subjectivity and personalized preferences, an under-explored domain in utilizing LLMs' reasoning capabilities. Our study explores several aspects to better understand reasoning for RecSys and demonstrate how task quality improves by utilizing LLM reasoning in both zero-shot and finetuning settings. Additionally, we propose RecSAVER (Recommender Systems Automatic Verification and Evaluation of Reasoning) to automatically assess the quality of LLM reasoning responses without the requirement of curated gold references or human raters. We show that our framework aligns with real human judgment on the coherence and faithfulness of reasoning responses. Overall, our work shows that incorporating reasoning into RecSys can improve personalized tasks, paving the way for further advancements in recommender system methodologies.
Information Retrieval, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper primarily explores how to leverage the reasoning capabilities of large language models (LLMs) to enhance the performance of personalized recommender systems (RecSys). Specifically, the research focuses on the following aspects:

1. **Research Background and Motivation**: Although LLMs have shown great potential in logical reasoning tasks, especially when guided by Chain-of-Thought (CoT) prompting, their application to recommender systems, particularly in handling subjectivity and personalized preferences, remains underexplored.
2. **Research Objectives**: The paper aims to use the reasoning capabilities of LLMs to improve the performance of personalized tasks in recommender systems. It also proposes Rec-SAVER, a framework for automatically evaluating reasoning quality that does not require pre-prepared gold-standard references or human evaluators.
3. **Methodology**:
   - **Reasoning in Zero-Shot Learning**: Guiding LLMs to generate reasoning responses through zero-shot CoT prompting strategies and making predictions based on user history and item metadata.
   - **Fine-Tuning with Reasoning**: Collecting reasoning outputs generated by large language models as training data to fine-tune smaller pre-trained models and enhance their task performance.
4. **Evaluation Method**: The Rec-SAVER framework automatically evaluates the quality of reasoning responses generated by LLMs. It first generates a series of posterior explanations based on user history and recommended items, then uses a self-verification process to retain only the high-quality explanations. These serve as reference standards against which the reasoning responses of other LLMs are scored.
5. **Experimental Results**: Experiments on the Amazon product review dataset demonstrate that introducing reasoning steps significantly improves the performance of recommender system tasks in both the zero-shot and the fine-tuning-with-reasoning settings. The paper also compares different model sizes and analyzes the effects of different filtering methods.

In summary, the main contribution of this paper lies in demonstrating how to effectively use the reasoning capabilities of LLMs to improve the personalized recommendation performance of recommender systems, and in proposing a new method for automatically evaluating reasoning quality.
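
The zero-shot CoT prompting and self-verification steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the task is predicting a 1-5 star rating, and the prompt wording, the `Review` type, and the functions `build_cot_prompt` and `self_verify_filter` are all hypothetical names introduced here. The filtering rule (keep an explanation only if a predictor, reading the explanation alone, recovers the known rating) is one plausible reading of "self-verification"; the summary does not specify the exact criterion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    """One past interaction: item title, star rating, and review text (hypothetical schema)."""
    title: str
    rating: int
    text: str


def build_cot_prompt(history: List[Review], item_title: str, item_description: str) -> str:
    """Assemble a zero-shot CoT prompt from user history and item metadata.

    The wording is an assumption made for illustration, not the paper's prompt.
    """
    lines = ["Here is a user's purchase and review history:"]
    for i, r in enumerate(history, start=1):
        lines.append(f"{i}. {r.title} | rating: {r.rating}/5 | review: {r.text}")
    lines.append("")
    lines.append(f"New item: {item_title}")
    lines.append(f"Description: {item_description}")
    lines.append("")
    lines.append("Predict the rating (1-5) this user would give the new item.")
    lines.append("Let's think step by step, then finish with 'Rating: <number>'.")
    return "\n".join(lines)


def self_verify_filter(
    candidates: List[str],
    true_rating: int,
    predict_from_reasoning: Callable[[str], int],
) -> List[str]:
    """Keep candidate explanations whose reasoning leads back to the known rating.

    `predict_from_reasoning` stands in for an LLM call that reads only the
    explanation and outputs a rating; here it is any callable (assumption).
    """
    return [c for c in candidates if predict_from_reasoning(c) == true_rating]
```

In practice `predict_from_reasoning` would wrap a model call; a stub that parses a number from the text is enough to exercise the filter. The explanations that survive the filter would then act as references when scoring other models' reasoning.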