Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap

Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu
2025-01-04
Abstract: Cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large language models (LLMs) have achieved tremendous success and possess strong capabilities in modeling user and item information, providing new potential for cold-start recommendations. However, the research community on CSR still lacks a comprehensive review and reflection in this field. Based on this, in this paper, we stand in the context of the era of large language models and provide a comprehensive review and discussion on the roadmap, related literature, and future directions of CSR. Specifically, we have conducted an exploration of the development path of how existing CSR utilizes information, from content features, graph relations, and domain information, to the world knowledge possessed by large language models, aiming to provide new insights for both the research and industrial communities on CSR. Related resources of cold-start recommendations are collected and continuously updated for the community in <a class="link-external link-https" href="https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation" rel="external noopener nofollow">this https URL</a>.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the cold-start problem in recommender systems, known as cold-start recommendation (CSR). The problem is a long-standing challenge: for new users or new items, there is too little historical interaction data for the system to recommend accurately. As Internet platforms diversify and the numbers of users and items grow exponentially, CSR is becoming increasingly important.

### Specific manifestations of the cold-start problem:
1. **Cold-start for new users**: When a new user first uses a platform, the recommender system has no historical interaction data about that user.
2. **Cold-start for new items**: When new products, content, or other items appear on the platform, the recommender system has no user feedback on them.
3. **Cold-start for user-item**: New users and new items must be handled at the same time.

### Main contributions of the paper:
1. **Comprehensive review**: The paper systematically reviews existing cold-start recommendation methods, covering CSR tasks organized by knowledge source (content features, graph relations, domain information, and the world knowledge of large language models).
2. **Innovative taxonomy**: A new taxonomy organizes methods by the external knowledge source they use to compensate for data sparsity and interaction scarcity.
3. **Clear definition of the cold-start problem**: For the first time, a clear and comprehensive definition of the cold-start problem is proposed, covering long-tail, user cold-start, item cold-start, user-item cold-start, zero-shot, few-shot, and strict cold-start settings.
4. **Future roadmap**: Building on the survey and taxonomy, a forward-looking roadmap connects current progress to future research directions, aiming to guide the research community in this challenging field.

### Formula representation:
The sets of users and items and their interactions can be written as follows:
- Let \( U=\{u_1, u_2,\ldots, u_m\} \) be the set of users, and \( V = \{v_1, v_2,\ldots, v_n\} \) be the set of items.
- Each user \( u\in U \) is associated with a profile \( S_u \), including interaction history \( I_u \) and contextual features \( C_u \).
- Each item \( v\in V \) has a similar profile \( S_v \), including interaction history \( I_v \) and features \( C_v \).

During the training phase, a set of warm-start users and items is known, and their interaction data are fully observable; these sets are written here as \( U^{w}\subseteq U \) and \( V^{w}\subseteq V \) to distinguish them from the full sets. During the tuning and testing phases, a new set of cold-start users \( \tilde{U} \) and items \( \tilde{V} \) may be encountered that were not observed during training. By definition, \( U^{w}\cap\tilde{U}=\emptyset \) and \( U^{w}\cup\tilde{U}=U \), and similarly \( V^{w}\cap\tilde{V}=\emptyset \) and \( V^{w}\cup\tilde{V}=V \).

In this way, the paper both defines the different cold-start problems in detail and systematically classifies and discusses existing cold-start recommendation models, providing a reference and guidance for future research.
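
The warm/cold split and the three cold-start scenarios described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `split_warm_cold` and `scenario` and the toy identifiers are hypothetical, and it assumes the strict setting in which a user or item counts as cold only if it has no training interactions at all (few-shot or long-tail thresholds are not modeled).

```python
def split_warm_cold(all_ids, seen_ids):
    """Partition all_ids into a warm set (seen in training) and a cold set (unseen)."""
    all_ids = set(all_ids)
    warm = all_ids & set(seen_ids)
    cold = all_ids - warm
    return warm, cold


def scenario(user, item, warm_users, warm_items):
    """Name the cold-start scenario for one (user, item) pair."""
    user_cold = user not in warm_users
    item_cold = item not in warm_items
    if user_cold and item_cold:
        return "user-item cold-start"
    if user_cold:
        return "user cold-start"
    if item_cold:
        return "item cold-start"
    return "warm-start"


# Toy data (assumed for illustration): full user and item sets, plus the
# (user, item) interactions observed during training.
users = {"u1", "u2", "u3"}
items = {"v1", "v2", "v3"}
train_pairs = [("u1", "v1"), ("u2", "v1"), ("u1", "v2")]

warm_users, cold_users = split_warm_cold(users, (u for u, _ in train_pairs))
warm_items, cold_items = split_warm_cold(items, (v for _, v in train_pairs))

# The warm and cold sets are disjoint and together cover the full set.
assert warm_users.isdisjoint(cold_users) and warm_users | cold_users == users
assert warm_items.isdisjoint(cold_items) and warm_items | cold_items == items

print(scenario("u3", "v1", warm_users, warm_items))  # user cold-start
print(scenario("u1", "v3", warm_users, warm_items))  # item cold-start
print(scenario("u3", "v3", warm_users, warm_items))  # user-item cold-start
```

Here `u3` and `v3` never appear in `train_pairs`, so they fall into the cold sets. A real pipeline would more likely split by timestamp or by an interaction-count threshold, which this sketch leaves out.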