Uncertainty-Aware Explainable Recommendation with Large Language Models

Yicui Peng, Hao Chen, Chingsheng Lin, Guo Huang, Jinrong Hu, Hui Guo, Bin Kong, Shu Hu, Xi Wu, Xin Wang
2024-01-31
Abstract: Providing explanations within a recommendation system boosts user satisfaction and fosters trust, especially by elaborating on the reasons for selecting recommended items tailored to the user. The predominant approach in this domain is to generate text-based explanations, with a notable emphasis on applying large language models (LLMs). However, fine-tuning LLMs for explainable recommendation proves impractical due to time constraints and computing resource limitations. As an alternative, the current approach involves training the prompt rather than the LLM. In this study, we developed a model that uses the ID vectors of user and item inputs as prompts for GPT-2. We employed a joint training mechanism within a multi-task learning framework to optimize both the recommendation task and the explanation task. This strategy enables a more effective exploration of users' interests, improving recommendation effectiveness and user satisfaction. In our experiments, the method achieves 1.59 DIV, 0.57 USR, and 0.41 FCR on the Yelp, TripAdvisor, and Amazon datasets, respectively, outperforming four SOTA methods on the explainability evaluation metrics. In addition, we found that the proposed model maintains stable textual quality on the three public datasets.
Information Retrieval, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper aims to improve the explainability of recommendation systems (explainable recommendation). Specifically, the authors focus on generating natural-language explanations that help users understand why a particular item is recommended to them. Such explanations can improve user satisfaction and trust, and they also make the recommendation results more transparent and persuasive.

Existing research mainly uses large language models (LLMs) to generate text explanations, but directly training these models for explainable-recommendation tasks is limited by time and computational resources. The paper therefore generates recommendation explanations by training prompts instead of training the entire LLM. The authors developed a model that feeds user and item ID vectors to GPT-2 as prompt inputs and jointly trains the recommendation task and the explanation task within a multi-task learning framework. This allows a more effective exploration of user interests, which in turn improves recommendation effectiveness and the user experience. Experimental results show that the method achieved 1.59 DIV, 0.57 USR, and 0.41 FCR on the Yelp, TripAdvisor, and Amazon datasets respectively, outperforming four state-of-the-art methods.

### Core contributions of the paper:

1. **Generate recommendation explanations using prompt learning**: Generate natural-language explanations by feeding user and item IDs into a large language model as continuous prompts. Users and items are treated as two special tokens, each with its own vector representation.
2. **Dynamic learning weights**: Enforce positive-value regularization by adjusting the regularization term, so that the generated explanations effectively convey user interests and item attributes, thereby improving the overall quality of recommendations.

### Explainability evaluation metrics:

- **Unique Sentence Ratio (USR)**: Measures the proportion of generated sentences that are unique.
- **Feature Coverage Ratio (FCR)**: Measures the share of the dataset's distinct features that appear in the generated explanations.
- **Feature Diversity (DIV)**: Measures the degree of feature intersection among the generated explanations.

### Text-quality evaluation metrics:

- **BLEU**: Commonly used in machine translation; measures n-gram-level overlap.
- **ROUGE**: Commonly used in text summarization; also measures n-gram-level overlap.

With these metrics, the authors show that their method performs strongly on explainability while remaining competitive on text quality.
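The three explainability metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code. It assumes the commonly used definitions: USR is the number of distinct sentences divided by the number of generated sentences, FCR is the number of distinct features mentioned divided by the size of the feature vocabulary, and DIV is the average number of shared features over all pairs of explanations. The function names, the toy whole-word feature matcher, and the example data are all hypothetical.

```python
from itertools import combinations


def extract_features(text, all_features):
    """Toy matcher (an assumption): whole-word lookup in a known feature list."""
    return set(text.lower().split()) & set(all_features)


def unique_sentence_ratio(explanations):
    """USR: distinct generated sentences / total generated sentences."""
    return len(set(explanations)) / len(explanations)


def feature_coverage_ratio(feature_sets, all_features):
    """FCR: distinct features mentioned anywhere / size of the feature vocabulary."""
    covered = set().union(*feature_sets) & set(all_features)
    return len(covered) / len(all_features)


def feature_diversity(feature_sets):
    """DIV: mean number of features shared by each pair of explanations."""
    pairs = list(combinations(feature_sets, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)


# Hypothetical data: a four-word feature vocabulary and three generated explanations.
vocab = ["food", "service", "room", "price"]
explanations = [
    "the food is great",
    "the food is great",
    "the room is clean and the service is friendly",
]
feature_sets = [extract_features(e, vocab) for e in explanations]

usr = unique_sentence_ratio(explanations)
fcr = feature_coverage_ratio(feature_sets, vocab)
div = feature_diversity(feature_sets)
print(f"USR={usr:.2f} FCR={fcr:.2f} DIV={div:.2f}")
```

In this toy example the duplicated sentence lowers USR, while the third explanation raises FCR by mentioning two additional features. A real evaluation would need a proper feature extractor rather than whole-word matching.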