Item-Language Model for Conversational Recommendation

Li Yang, Anushya Subbiah, Hardik Patel, Judith Yue Li, Yanwei Song, Reza Mirghaderi, Vikram Aggarwal
2024-06-05
Abstract: Large language models (LLMs) have been extremely successful at tasks like complex dialogue understanding, reasoning and coding due to their emergent abilities. These emergent abilities have been extended with multi-modality to include image, audio, and video capabilities. Recommender systems, on the other hand, have been critical for information seeking and item discovery needs. Recently, there have been attempts to apply LLMs for recommendations. One difficulty of current attempts is that the underlying LLM is usually not trained on the recommender system data, which largely contains user interaction signals and is often not publicly available. Another difficulty is that user interaction signals often have a different pattern from natural language text, and it is currently unclear if the LLM training setup can learn more non-trivial knowledge from interaction signals compared with traditional recommender system methods. Finally, it is difficult to train multiple LLMs for different use-cases, and to retain the original language and reasoning abilities when learning from recommender system data. To address these three limitations, we propose an Item-Language Model (ILM), which is composed of an item encoder to produce text-aligned item representations that encode user interaction signals, and a frozen LLM that can understand those item representations with preserved pretrained knowledge. We conduct extensive experiments which demonstrate both the importance of the language-alignment and of user interaction knowledge in the item encoder.
Information Retrieval, Computation and Language
What problem does this paper attempt to address?
This paper explores how to apply large language models (LLMs) to conversational recommendation. It identifies three challenges: the underlying LLM is usually not trained on recommender system data, which consists largely of user interaction signals and is often not publicly available; user interaction signals follow different patterns from natural language text, so it is unclear whether an LLM training setup can learn more non-trivial knowledge from them than traditional recommender system methods can; and it is difficult to train multiple LLMs for different use cases while retaining their original language and reasoning abilities when learning from recommender system data. To address these issues, the paper proposes the Item-Language Model (ILM). ILM consists of an item encoder and a frozen LLM: the encoder produces text-aligned item representations that encode user interaction signals, and the LLM interprets these representations while keeping its pretrained knowledge. The item encoder learns from collaborative filtering embeddings through contrastive learning tasks, including item-item contrast. During multi-task fine-tuning, only the encoder and adapter parameters are updated, so the LLM's pretrained abilities are retained. Experiments show that ILM outperforms existing methods on various conversational recommendation tasks and demonstrate the importance of both language alignment and user interaction knowledge in the item encoder. The paper also discusses related work (LLMs in recommender systems, item representation methods, and multimodal LLM approaches) and describes the model architecture, training process, and experimental results in detail.
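To make the idea of item-item contrast more concrete, here is a minimal sketch of a contrastive objective in plain Python. It is an illustration only, not the paper's implementation: the InfoNCE form with in-batch negatives, the temperature value, the toy vectors, and the names `l2_normalize`, `dot`, and `info_nce_loss` are all assumptions introduced here, and the summary above does not specify the exact loss ILM uses.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical stand-in objective).

    anchors[i] should score highest against positives[i]; every other
    positives[j] in the batch acts as a negative for anchors[i].
    """
    anchors = [l2_normalize(v) for v in anchors]
    positives = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch: three item embeddings and a second "view" of each item.
item_views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# Rotating the positives pairs each anchor with the wrong item.
shuffled = [aligned[1], aligned[2], aligned[0]]

aligned_loss = info_nce_loss(item_views, aligned)
shuffled_loss = info_nce_loss(item_views, shuffled)
print(f"aligned pairs:  {aligned_loss:.4f}")
print(f"shuffled pairs: {shuffled_loss:.4f}")
```

Correctly paired items yield a lower loss than mismatched ones, which is the signal a contrastive objective uses to pull related item representations together. In a setup like the one described above, such a loss would update only the item encoder (and adapter), with the LLM weights left frozen.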