Improving Sequential Recommendations with LLMs

Artun Boz, Wouter Zorgdrager, Zoe Kotti, Jesse Harte, Panos Louridas, Dietmar Jannach, Marios Fragkoulis
2024-02-02
Abstract: The sequential recommendation problem has attracted considerable research attention in the past few years, leading to the rise of numerous recommendation models. In this work, we explore how Large Language Models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we design three orthogonal approaches and hybrids of those to leverage the power of LLMs in different ways. In addition, we investigate the potential of each approach by focusing on its comprising technical aspects and determining an array of alternative choices for each one. We conduct extensive experiments on three datasets and explore a large variety of configurations, including different language models and baseline recommendation models, to obtain a comprehensive picture of the performance of each approach. Among other observations, we highlight that initializing state-of-the-art sequential recommendation models such as BERT4Rec or SASRec with embeddings obtained from an LLM can lead to substantial performance gains in terms of accuracy. Furthermore, we find that fine-tuning an LLM for recommendation tasks enables it to learn not only the tasks, but also concepts of a domain to some extent. We also show that fine-tuning OpenAI GPT leads to considerably better performance than fine-tuning Google PaLM 2. Overall, our extensive experiments indicate a huge potential value of leveraging LLMs in future recommendation approaches. We publicly share the code and data of our experiments to ensure reproducibility.
Information Retrieval
What problem does this paper attempt to address?
The paper addresses the question of how large language models (LLMs) can be used to improve sequential recommendation systems. Specifically, the authors explore several ways in which LLMs can strengthen sequential recommendation methods and evaluate each of them experimentally.

### Problem Background

Sequential recommendation aims to predict a user's next interest or action from the sequence of their past interactions. The problem has many applications, such as next-purchase prediction in e-commerce, next-track recommendation in music streaming, and next-place recommendation in tourism. The rapid progress of LLMs in natural language processing in recent years has opened up new ideas and methods for sequential recommendation.

### Main Contributions of the Paper

1. **Design of Three Orthogonal Approaches and Their Hybrids**:
   - **LLMSeqSim**: Obtain semantically rich embeddings for every item in a session from an existing LLM, optionally reduce them to a target dimension, aggregate them into a session embedding, and recommend the items whose embeddings are most similar to it.
   - **LLMSeqPrompt**: Fine-tune an LLM on dataset-specific information in the form of prompt-completion pairs, then prompt the model to generate the next-item recommendation for each test session.
   - **LLM2Sequential**: Initialize existing sequential models such as BERT4Rec or SASRec with item embeddings obtained from an LLM to improve their performance.
2. **Experimental Evaluation**:
   - Extensive experiments on three datasets with different domains and characteristics: Amazon Beauty, a proprietary Delivery Hero dataset, and a Steam games dataset.
   - The results show that LLM embeddings can substantially improve the accuracy of different categories of sequential models.
   For example, LLM2SASRec and LLM2BERT4Rec improve NDCG@20 by 45% on average on the Amazon Beauty dataset and by 9% on the Delivery Hero dataset.
3. **Other Observations**:
   - The fine-tuned OpenAI GPT and Google PaLM models show a significant improvement in task performance, with an average gain of 16% in NDCG@20, nearly half as many hallucinations, and higher semantic similarity. As the abstract notes, fine-tuning OpenAI GPT performs considerably better than fine-tuning Google PaLM 2.
   - On some datasets, such as Amazon Beauty, the semantic item recommender based on LLM embeddings (LLMSeqSim) ranks third among the compared models and achieves the best MRR, while the fine-tuned model (LLMSeqPrompt) outperforms GRU4Rec and SASRec.

### Key Formulas

The main formulas and computation steps behind the methods above are:
- **Embedding Dimensionality Reduction**: Let the original embedding dimension be \( d \) and the target dimension \( k < d \). The reduction can be performed with PCA, LDA, an autoencoder, or random projection. With PCA, for example,
  \[ X_{\text{reduced}} = XW \]
  where \( X \in \mathbb{R}^{m \times d} \) is the mean-centered matrix of the \( m \) item embeddings and \( W \in \mathbb{R}^{d \times k} \) contains the top \( k \) principal components as columns, so \( X_{\text{reduced}} \in \mathbb{R}^{m \times k} \).
- **Session Embedding Calculation**: For a session containing \( n \) items, the session embedding aggregates the item embeddings, for example with the plain average
  \[ E_{\text{session}}=\frac{1}{n}\sum_{i = 1}^{n}E_i \]
  where \( E_i \) is the embedding of the \( i \)-th item, or with a weighted average \( E_{\text{session}} = \sum_{i=1}^{n} w_i E_i \big/ \sum_{i=1}^{n} w_i \), whose weights \( w_i \) can, for instance, favor more recent items.
- **Similarity Calculation**: At recommendation time, the session embedding is compared with each candidate item embedding using cosine similarity, Euclidean distance, or dot-product similarity, and the most similar items are recommended. Cosine similarity, for example, is
  \[ \text{Cosine Similarity}(A, B)=\frac{A\cdot B}{\|A\|\|B\|} \]
  With Euclidean distance, smaller values indicate more similar items.

In summary, by introducing LLMs and combining them with established sequential recommendation models, the paper demonstrates considerable potential for LLM-based sequential recommendation and offers useful reference points for future research.
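The LLMSeqSim steps described above (dimensionality reduction, session aggregation, similarity ranking) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: PCA is computed via an SVD of the mean-centered embedding matrix, the random matrix is a synthetic stand-in for real LLM item embeddings, the function names `pca_reduce`, `session_embedding`, and `recommend` are hypothetical, cosine similarity is the chosen measure out of the three listed, and items already in the session are excluded from the ranking (an assumption, not something the summary states).

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Reduce m x d item embeddings to m x k via PCA: X_reduced = X_centered @ W."""
    X_centered = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:k].T  # d x k matrix holding the top-k principal components as columns
    return X_centered @ W


def session_embedding(item_embs: np.ndarray, weights=None) -> np.ndarray:
    """Aggregate the n x k embeddings of a session's items into one k-vector."""
    if weights is None:
        return item_embs.mean(axis=0)  # plain average (1/n) * sum_i E_i
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * item_embs).sum(axis=0) / w.sum()  # weighted average


def recommend(session_items, item_matrix, top_n=20, weights=None):
    """Return indices of the top_n items most cosine-similar to the session."""
    s = session_embedding(item_matrix[session_items], weights)
    norms = np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(s)
    scores = (item_matrix @ s) / np.maximum(norms, 1e-12)  # guard against zero norms
    # Assumption: items already in the session are not recommended again.
    scores[session_items] = -np.inf
    return np.argsort(-scores)[:top_n]


# Usage on synthetic data: random vectors stand in for real LLM item embeddings.
rng = np.random.default_rng(0)
llm_embeddings = rng.normal(size=(100, 64))  # 100 items, d = 64 (hypothetical sizes)
items = pca_reduce(llm_embeddings, k=16)
print(recommend([3, 17, 42], items, top_n=5))
```

Passing weights, e.g. `weights=[1, 2, 3]` for a three-item session, switches to the weighted-average session embedding. Replacing the cosine score with a raw dot product would favor items whose embeddings have large norms, which is one reason to normalize.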
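The NDCG@20 and MRR figures quoted above can be made concrete with a small sketch. It assumes the common next-item protocol in which each test session has exactly one held-out target item and both metrics are truncated at the cutoff \( k \); the summary does not spell out the paper's exact evaluation protocol, and the function names `ndcg_at_k` and `mrr_at_k` are hypothetical.

```python
import math


def ndcg_at_k(ranked_items, target, k=20):
    """NDCG@k for a single relevant item (the true next item).

    With one relevant item the ideal DCG is 1, so NDCG@k reduces to
    1 / log2(rank + 1) if the target is in the top k, and 0 otherwise.
    """
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


def mrr_at_k(ranked_items, target, k=20):
    """Reciprocal rank of the true next item, or 0 if it is not in the top k."""
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    return 1.0 / (top_k.index(target) + 1)


# Toy example: two test sessions with their ranked recommendation lists.
rankings = [[5, 9, 2], [7, 1, 4]]
targets = [9, 3]  # true next items; item 3 was never recommended
mean_ndcg = sum(ndcg_at_k(r, t) for r, t in zip(rankings, targets)) / len(targets)
mean_mrr = sum(mrr_at_k(r, t) for r, t in zip(rankings, targets)) / len(targets)
print(f"NDCG@20 = {mean_ndcg:.4f}, MRR@20 = {mean_mrr:.4f}")
# prints: NDCG@20 = 0.3155, MRR@20 = 0.2500
```

Both metrics reward placing the true next item near the top of the list; NDCG discounts lower ranks logarithmically, while MRR discounts them by the reciprocal of the rank, so MRR penalizes a drop from rank 1 to rank 2 more sharply.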