ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation

Jizheng Chen, Kounianhua Du, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang
2024-06-27
Abstract: Large language models (LLMs) have flourished in the natural language processing (NLP) domain, and their potential for recommendation has attracted much attention. Despite the intelligence shown by recommendation-oriented fine-tuned models, LLMs struggle to fully understand user behavior patterns because of their innate weakness in interpreting numerical features and the overhead of long contexts: the temporal relations among user behaviors, the subtle quantitative signals among different ratings, and the various side features of items are not well explored. Existing works only fine-tune a single LLM on the given text data without introducing this important information to it, leaving these problems unsolved. In this paper, we propose ELCoRec to Enhance Language understanding with Co-Propagation of numerical and categorical features for Recommendation. Concretely, we inject preference-understanding capability into the LLM via a GAT expert model, which better encodes user preference by propagating the temporal relations, rating signals, and various side information of historical items in parallel. The parallel propagation mechanism can stabilize heterogeneous features and offers an informative user preference encoding, which is then injected into the language model via soft prompting at the cost of a single token embedding. To further capture the user's recent interests, we propose a novel Recent interaction Augmented Prompt (RAP) template. Experimental results on three datasets against strong baselines validate the effectiveness of ELCoRec. The code is available at https://anonymous.4open.science/r/CIKM_Code_Repo-E6F5/README.md.
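The GAT expert and the single-token soft prompt described in the abstract can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation. The history graph links chronologically adjacent items (plus self-loops), so temporal relations are carried by the graph structure. Each node concatenates numerical features (a normalized rating and a normalized timestamp) with a categorical genre embedding, so both kinds of feature are aggregated by the same attention weights; the paper's "parallel" propagation may instead use separate channels. The layer is a standard single-head graph attention layer. The dimensions, genre vocabulary, mean pooling, and linear projection into the LLM embedding space are all hypothetical choices, and the weights are random rather than trained.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: node features, GAT hidden width, LLM embedding width.
D_IN, D_GAT, D_LLM = 5, 4, 8
GENRES = ["action", "comedy", "drama"]  # hypothetical categorical side feature
GENRE_EMB = {g: [random.uniform(-1, 1) for _ in range(3)] for g in GENRES}


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def node_features(rating, timestamp, t_max, genre):
    """Concatenate numerical features with a categorical (genre) embedding."""
    numerical = [rating / 5.0, timestamp / t_max]
    return numerical + GENRE_EMB[genre]


def gat_layer(H, adj, W, a):
    """Standard single-head graph attention layer with ELU output."""
    Wh = [matvec(W, h) for h in H]
    out = []
    for i in range(len(H)):
        nbrs = [j for j in range(len(H)) if adj[i][j]]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]), then softmax over neighbors j.
        scores = []
        for j in nbrs:
            e = sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j]))
            scores.append(e if e > 0 else 0.2 * e)
        alpha = softmax(scores)
        agg = [sum(alpha[k] * Wh[j][d] for k, j in enumerate(nbrs)) for d in range(D_GAT)]
        out.append([v if v > 0 else math.exp(v) - 1 for v in agg])  # ELU
    return out


def soft_prompt(user_vec, W_proj, token_embs):
    """Project the user encoding into the LLM space as ONE extra input token."""
    return [matvec(W_proj, user_vec)] + token_embs


# Hypothetical history: (rating, timestamp, genre), in chronological order.
history = [(4, 100, "action"), (2, 200, "comedy"), (5, 300, "drama"), (3, 400, "action")]
t_max = max(t for _, t, _ in history)
H = [node_features(r, t, t_max, g) for r, t, g in history]
n = len(H)
# Temporal edges: each item links to its chronological neighbors and itself.
adj = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

W = rand_matrix(D_GAT, D_IN)
a = [random.uniform(-0.5, 0.5) for _ in range(2 * D_GAT)]
H_out = gat_layer(H, adj, W, a)
user_vec = [sum(col) / n for col in zip(*H_out)]  # mean pooling over items

W_proj = rand_matrix(D_LLM, D_GAT)
token_embs = [[0.0] * D_LLM for _ in range(5)]  # stand-in for 5 embedded text tokens
inputs = soft_prompt(user_vec, W_proj, token_embs)
print(len(inputs), len(inputs[0]))  # 6 8
```

On the LLM side, the only added cost in this sketch is the one row prepended to the prompt's token embeddings, which stays a single token no matter how many side features the graph encoder consumes.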
Information Retrieval
What problem does this paper attempt to address?
The paper attempts to address two main challenges that large language models (LLMs) face in recommender systems:

1. **Numerical Insensitivity**: LLMs have difficulty understanding the numerical information in prompt templates, so they fail to accurately capture the temporal relationships among user behaviors, the subtle quantitative signals between different ratings, and the various side features of items. For example, an LLM may ignore the temporal order of user behaviors or overlook the precise quantitative information carried by numerical ratings, treating such information as ordinary text strings.
2. **Encoding Overhead**: LLMs have high inference latency and training costs, and their context window is limited. This makes it impractical to build prompt templates that include all the rich side features (such as genres and producers). In addition, to keep the input at an acceptable length, long-term user interaction histories must be filtered by user history retrieval, which discards valuable time-series information.

Because of these issues, much of the information in the raw recommendation data that would benefit recommendation performance is lost when prompt templates are constructed. The paper therefore proposes ELCoRec, a framework that enhances language understanding through the co-propagation of numerical and categorical features to improve recommendation performance. Specifically, ELCoRec addresses the issues above as follows:

- **Numerical Insensitivity Issue**: numerical and categorical features are propagated in parallel through a Graph Attention Network (GAT) expert network, producing a user preference encoding that compensates for the LLM's weak understanding of numerical features.
- **Encoding Overhead Issue**: the preference encoding is injected into the LLM's semantic space with soft prompting, at the cost of only a single token embedding.
- **Recent interaction Augmented Prompt (RAP) Template**: building on user history retrieval, the template supplements the prompt with the user's recent interaction sequence, restoring the continuous time-series information that retrieval alone discards.

Through these methods, ELCoRec models user interests more comprehensively and improves recommendation performance.
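The RAP idea of combining retrieved history with the most recent interactions can be sketched as a prompt builder. Everything here is a hypothetical illustration rather than the paper's template: the `Interaction` record, genre-overlap (Jaccard) similarity as the retrieval score (the paper's retriever may differ, for example by using semantic embeddings), the choice to retrieve only from items outside the recent window so nothing is listed twice, and the prompt wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record of one historical user-item interaction."""
    title: str
    genres: frozenset
    rating: int      # e.g. 1-5 stars
    timestamp: int   # larger means more recent


def jaccard(a, b):
    """Genre-overlap relevance score (a stand-in for the real retriever)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_rap_prompt(history, target_title, target_genres, k_retrieved=2, n_recent=2):
    """Build a prompt from retrieved relevant items plus the most recent items."""
    ordered = sorted(history, key=lambda x: x.timestamp)
    split = max(len(ordered) - n_recent, 0)
    older, recent = ordered[:split], ordered[split:]
    # Retrieve the k most relevant older items; ties favor more recent items.
    ranked = sorted(older, key=lambda x: (-jaccard(x.genres, target_genres), -x.timestamp))
    retrieved = sorted(ranked[:k_retrieved], key=lambda x: x.timestamp)

    def fmt(item):
        return f"- {item.title} (rated {item.rating}/5)"

    lines = ["The user's most relevant past interactions:"]
    lines += [fmt(it) for it in retrieved]
    lines.append("The user's most recent interactions:")
    lines += [fmt(it) for it in recent]
    lines.append(f"Will the user like {target_title}? Answer Yes or No.")
    return "\n".join(lines)


history = [
    Interaction("Alien", frozenset({"sci-fi", "horror"}), 5, 1),
    Interaction("Notting Hill", frozenset({"romance", "comedy"}), 3, 2),
    Interaction("Blade Runner", frozenset({"sci-fi", "noir"}), 4, 3),
    Interaction("Toy Story", frozenset({"animation", "comedy"}), 4, 4),
    Interaction("Up", frozenset({"animation", "adventure"}), 5, 5),
]
prompt = build_rap_prompt(history, "Aliens", frozenset({"sci-fi", "action"}))
print(prompt)
```

In this example, retrieval keeps Alien and Blade Runner because they share a genre with the target, Toy Story and Up are kept purely because they are the latest interactions, and Notting Hill, which is neither relevant nor recent, is dropped. A retrieval-only prompt would have lost the recent pair, which is the gap RAP is meant to close.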