Exploring the Upper Limits of Text-Based Collaborative Filtering Using Large Language Models: Discoveries and Insights

Ruyu Li, Wenhao Deng, Yu Cheng, Zheng Yuan, Jiaqi Zhang, Fajie Yuan
2023-05-19
Abstract: Text-based collaborative filtering (TCF) has become the mainstream approach for text and news recommendation, utilizing text encoders, also known as language models (LMs), to represent items. However, existing TCF models primarily focus on using small or medium-sized LMs. It remains uncertain what impact replacing the item encoder with one of the largest and most powerful LMs, such as the 175-billion parameter GPT-3 model, would have on recommendation performance. Can we expect unprecedented results? To this end, we conduct an extensive series of experiments aimed at exploring the performance limits of the TCF paradigm. Specifically, we increase the size of item encoders from one hundred million to one hundred billion to reveal the scaling limits of the TCF paradigm. We then examine whether these extremely large LMs could enable a universal item representation for the recommendation task. Furthermore, we compare the performance of the TCF paradigm utilizing the most powerful LMs to the currently dominant ID embedding-based paradigm and investigate the transferability of this TCF paradigm. Finally, we compare TCF with the recently popularized prompt-based recommendation using ChatGPT. Our research findings have not only yielded positive results but also uncovered some surprising and previously unknown negative outcomes, which can inspire deeper reflection and innovative thinking regarding text-based recommender systems. Codes and datasets will be released for further research.
Information Retrieval
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores the performance ceiling of Text-based Collaborative Filtering (TCF) when Large Language Models (LLMs) serve as the item text encoder. It focuses on five key questions:

1. **Performance Response to Scaling**:
   - How does recommendation performance change as the text encoder grows from 100 million to 100 billion parameters? Is the performance limit reached at the 100-billion-parameter scale?
2. **Generality of Ultra-Large Language Models**:
   - Can ultra-large language models (such as the 175-billion-parameter GPT-3) generate general-purpose text representations suitable for recommendation tasks?
3. **Comparison with ID Embedding Models**:
   - Can a TCF model using a 175-billion-parameter LM as its text encoder easily surpass ID embedding-based recommendation models (IDCF), especially in popular item recommendation tasks?
4. **Cross-Domain Recommendation Capability**:
   - Is the TCF paradigm close to a universal recommendation model? Specifically, how well does it transfer across domains when using a 175-billion-parameter text encoder?
5. **Comparison with Prompt Engineering-Based Recommendation Methods**:
   - Can the recently popular ChatGPT-based recommendation method (ChatGPT4Rec) surpass the traditional TCF paradigm in typical recommendation settings?

Through these research questions, the paper aims to map the potential and limitations of the TCF paradigm with ultra-large language models and to provide guidance for future recommender system research.
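To make the TCF versus IDCF distinction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dot-product scorer and a mean-pooled user representation, both chosen here only for brevity. `toy_text_encoder` is a hypothetical hashed bag-of-words stand-in for a real language model, and the item texts are invented for illustration. The only point it shows is where the item vector comes from: a per-ID table in IDCF, the item's text in TCF.

```python
import hashlib
import random

DIM = 8  # illustrative embedding size (assumption)


def toy_text_encoder(text, dim=DIM):
    """Hypothetical stand-in for an LM item encoder: hashed bag-of-words,
    L2-normalised. A real TCF model would call a language model here."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = digest[0] % dim
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[index] += sign
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else vec


def make_id_table(item_ids, dim=DIM, seed=0):
    """IDCF: one vector per item ID (randomly initialised; trained in practice)."""
    rng = random.Random(seed)
    return {i: [rng.gauss(0.0, 0.1) for _ in range(dim)] for i in item_ids}


def mean_pool(vectors):
    """Average a list of equal-length vectors (simplified user encoder)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_items(history, candidates, item_vector):
    """Rank candidates by dot product with the pooled history vector."""
    user = mean_pool([item_vector(i) for i in history])
    return sorted(candidates, key=lambda i: dot(user, item_vector(i)), reverse=True)


# Invented example items (not from the paper's datasets).
ITEM_TEXT = {
    "n1": "Central bank raises interest rates",
    "n2": "Interest rates and inflation outlook",
    "n3": "Local team wins football final",
}

id_table = make_id_table(ITEM_TEXT)                                  # IDCF item vectors
tcf_vectors = {i: toy_text_encoder(t) for i, t in ITEM_TEXT.items()}  # TCF item vectors

idcf_ranking = rank_items(["n1"], ["n2", "n3"], id_table.__getitem__)
tcf_ranking = rank_items(["n1"], ["n2", "n3"], tcf_vectors.__getitem__)

# An item with no ID entry can still be represented from its text.
unseen_vector = toy_text_encoder("Inflation slows as rates hold")
```

Because the scorer is identical in both cases, swapping `id_table` for `tcf_vectors` changes only the source of item representations, which is the substitution that the questions above compare at different encoder sizes.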