HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling

Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan
2024-09-19
Abstract: Large Language Models (LLMs) have achieved remarkable success in various fields, prompting several studies to explore their potential in recommendation systems. However, these attempts have so far resulted in only modest improvements over traditional recommendation models. Moreover, three critical questions remain under-explored: firstly, the real value of LLMs' pre-trained weights, often considered to encapsulate world knowledge; secondly, the necessity of fine-tuning for recommendation tasks; lastly, whether LLMs can exhibit the same scalability benefits in recommendation systems as they do in other domains. In this paper, we propose a novel Hierarchical Large Language Model (HLLM) architecture designed to enhance sequential recommendation systems. Our approach employs a two-tier model: the first Item LLM extracts rich content features from the detailed text description of the item, while the second User LLM utilizes these features to predict users' future interests based on their interaction history. Extensive experiments demonstrate that our method effectively leverages the pre-trained capabilities of open-source LLMs, and further fine-tuning leads to significant performance boosts. Additionally, HLLM achieves excellent scalability, with the largest configuration utilizing 7B parameters for both item feature extraction and user interest modeling. Moreover, HLLM offers excellent training and serving efficiency, making it practical in real-world applications. Evaluations on two large-scale datasets, PixelRec and Amazon Reviews, show that HLLM achieves state-of-the-art results, outperforming traditional ID-based models by a wide margin. In online A/B testing, HLLM showcases notable gains, validating its practical impact in real-world recommendation scenarios. Codes are available at https://github.com/bytedance/HLLM.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper attempts to enhance sequential recommendation systems by proposing a new Hierarchical Large Language Model (HLLM) architecture. Specifically, it addresses the following three key issues:

1. **The Practical Value of Pre-trained LLM Weights**:
   - Although LLMs have achieved significant success in other fields, the practical value of their pre-trained weights in recommendation systems still needs validation. The paper explores whether these pre-trained weights actually improve recommendation performance.
2. **The Necessity of Fine-tuning**:
   - LLMs acquire strong world knowledge from pre-training on large-scale corpora, but whether fine-tuning is still necessary for recommendation tasks remains an open question. The paper investigates whether fine-tuning further improves performance on these tasks.
3. **Scalability of LLMs in Recommendation Systems**:
   - The paper examines whether recommendation performance keeps improving as the number of model parameters increases, as it does for LLMs in other domains.

### Solution

To address these questions, the paper proposes the HLLM architecture, which consists of two levels of LLMs:

1. **Item LLM**:
   - Extracts rich content features from the detailed text description of an item. A special token [ITEM] is appended to the end of the item description, the description is fed into the Item LLM, and the hidden state at the [ITEM] position is taken as the item's feature representation.
2. **User LLM**:
   - Uses the item features extracted by the Item LLM to predict the user's future interests from their interaction history. The User LLM takes the sequence of item features from the user's interaction history as input and predicts the next item.
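The two-tier data flow described above can be illustrated with a small, runnable sketch. This is not the authors' implementation: the hash-based `token_vector`, the running-mean `causal_hidden_states` layer, the cosine-similarity ranking in `recommend`, and all function names are hypothetical stand-ins chosen so the example runs with only the Python standard library. What it does mirror from the summary is the shape of the pipeline: the item description gets an `[ITEM]` token appended, the hidden state at that position becomes the item feature, and the user-level model consumes a sequence of item features to predict the next item.

```python
import hashlib
import math

DIM = 8                # toy embedding width (assumption, not from the paper)
ITEM_TOKEN = "[ITEM]"  # special token appended to every item description


def token_vector(token, dim=DIM):
    """Deterministic pseudo-embedding; a stand-in for a learned embedding table."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 255.0 - 0.5 for i in range(dim)]


def causal_hidden_states(vectors):
    """Toy causal layer: the state at position t is the mean of inputs 0..t.

    A real LLM would use causal self-attention here; the running mean only
    preserves the property that position t sees nothing after t.
    """
    states, running = [], [0.0] * len(vectors[0])
    for t, vec in enumerate(vectors, start=1):
        running = [r + v for r, v in zip(running, vec)]
        states.append([r / t for r in running])
    return states


def item_llm(description):
    """Append [ITEM] and return the hidden state at the [ITEM] position."""
    tokens = description.lower().split() + [ITEM_TOKEN]
    return causal_hidden_states([token_vector(tok) for tok in tokens])[-1]


def user_llm(item_embeddings):
    """Map a history of item features to one predicted next-item feature per position."""
    return causal_hidden_states(item_embeddings)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(history, catalog, top_k=2):
    """Rank unseen catalog items by similarity to the predicted next-item feature."""
    predicted = user_llm([item_llm(d) for d in history])[-1]
    candidates = [d for d in catalog if d not in history]
    ranked = sorted(candidates, key=lambda d: cosine(predicted, item_llm(d)), reverse=True)
    return ranked[:top_k]


# Usage with made-up item descriptions.
history = ["red running shoes", "blue trail shoes"]
catalog = history + ["wool hiking socks", "cast iron skillet", "running shorts"]
print(recommend(history, catalog))
```

Because the stand-in layers are untrained, the ranking itself carries no meaning; the point is only that the user-level model never sees raw text or item IDs, just the feature vectors produced by the item-level model.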
### Experimental Results

- **Effect of Pre-training and Fine-tuning**:
  - Pre-trained weights contribute significantly to HLLM's performance, for both item feature extraction and user interest modeling. Fine-tuning on the recommendation task is also crucial for reaching the best performance.
- **Scalability**:
  - HLLM's performance continues to improve as the number of model parameters increases, indicating that the architecture scales well. Experiments on large-scale datasets support this finding.
- **Comparison with Existing Methods**:
  - On multiple academic datasets, HLLM significantly outperforms traditional ID-based models, and it also performs well in practical industrial applications.

### Main Contributions

1. **Proposed a New Hierarchical LLM Framework (HLLM)**:
   - The framework significantly outperforms classic ID-based models on large-scale academic datasets and has been validated in real industrial scenarios, with excellent training and inference efficiency.
2. **Effectively Transferred World Knowledge from the Pre-training Stage of LLMs**:
   - The transfer benefits both item feature extraction and user interest modeling, although task-specific fine-tuning remains indispensable.
3. **Demonstrated Good Scalability**:
   - Performance continues to improve as data volume and model parameters increase, suggesting that the method has potential for larger datasets and model sizes.

Through these contributions, the paper addresses key open questions about LLM-based recommendation systems and provides new directions for future research.