A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation

Bowen Zheng, Zihan Lin, Enze Liu, Chen Yang, Enyang Bai, Cheng Ling, Wayne Xin Zhao, Ji-Rong Wen
2024-03-20
Abstract: In online video platforms, reading or writing comments on interesting videos has become an essential part of the video watching experience. However, existing video recommender systems mainly model users' interaction behaviors with videos, lacking consideration of comments in user behavior modeling. In this paper, we propose a novel recommendation approach called LSVCR by leveraging user interaction histories with both videos and comments, so as to jointly conduct personalized video and comment recommendation. Specifically, our approach consists of two key components, namely sequential recommendation (SR) model and supplemental large language model (LLM) recommender. The SR model serves as the primary recommendation backbone (retained in deployment) of our approach, allowing for efficient user preference modeling. Meanwhile, we leverage the LLM recommender as a supplemental component (discarded in deployment) to better capture underlying user preferences from heterogeneous interaction behaviors. In order to integrate the merits of the SR model and the supplemental LLM recommender, we design a two-stage training paradigm. The first stage is personalized preference alignment, which aims to align the preference representations from both components, thereby enhancing the semantics of the SR model. The second stage is recommendation-oriented fine-tuning, in which the alignment-enhanced SR model is fine-tuned according to specific objectives. Extensive experiments in both video and comment recommendation tasks demonstrate the effectiveness of LSVCR. Additionally, online A/B testing on the KuaiShou platform verifies the actual benefits brought by our approach. In particular, we achieve a significant overall gain of 4.13% in comment watch time.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is that on online video platforms, the existing video recommendation systems mainly focus on the interaction behaviors between users and videos, while ignoring the role of comments in user behavior modeling. With the growth of online video communities, users' comments on videos are becoming more and more important, because these comments not only provide supplementary information but also enhance the users' viewing experience. Therefore, this research aims to improve the recommendation quality and enhance user participation by integrating video and comment data.

Specifically, the paper proposes a novel recommendation method named LSVCR (Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation). This method utilizes the historical interaction records of users with videos and comments to jointly perform personalized video and comment recommendations. To achieve this goal, LSVCR contains two key components:

1. **Sequential Recommendation model (SR model)**: As the main recommendation framework, it is retained in deployment and is used to efficiently model user preferences.
2. **Supplemental Large Language Model recommender (LLM recommender)**: It is used in the training stage to capture the potential preferences of users from different interaction behaviors and is discarded during deployment.

To integrate the advantages of these two components, the paper designs a two-stage training paradigm:

- **Stage 1: Personalized Preference Alignment**: The purpose is to align the preference representations from the two components, thereby enhancing the semantic understanding ability of the SR model.
- **Stage 2: Recommendation-Oriented Fine-tuning**: Fine-tune the aligned SR model according to specific goals to improve the recommendation performance.
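To make the Stage 1 alignment idea concrete, the following is a minimal sketch of how preference representations from the SR model and the LLM recommender could be pulled together with a contrastive objective. The symmetric InfoNCE form, in-batch negatives, cosine similarity, the temperature value, and every function name here are assumptions made for illustration; this is not the paper's implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Row i of `positives` is the positive for row i of `anchors`;
    all other rows act as in-batch negatives (an assumed setup).
    """
    anchors = [l2_normalize(a) for a in anchors]
    positives = [l2_normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def symmetric_alignment_loss(sr_prefs, llm_prefs, temperature=0.1):
    """Average of the SR-to-LLM and LLM-to-SR contrastive losses."""
    return 0.5 * (
        info_nce(sr_prefs, llm_prefs, temperature)
        + info_nce(llm_prefs, sr_prefs, temperature)
    )


# Toy batch of two users with 2-dimensional preference vectors.
sr_prefs = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_alignment_loss(sr_prefs, [[0.9, 0.1], [0.1, 0.9]])
shuffled = symmetric_alignment_loss(sr_prefs, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

In this toy batch the loss is lower when each user's SR vector is paired with a similar LLM vector than when the pairs are swapped, which is the behavior an alignment objective needs. Under the two-stage paradigm described above, a loss of this kind would be used only during training, since the LLM recommender is discarded at deployment.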
The experimental results show that LSVCR is effective on both video and comment recommendation tasks, and the online A/B test verifies its benefit in a real industrial recommender system. In particular, for comments, LSVCR achieves a 4.13% increase in watch time and a 1.36% increase in the number of interactions.

### Formula Summary

- **Text Feature Embedding**:
\[ z_{v_i} = [\text{LLM}(t_i)\,\|\,\text{LLM}(c_i)]W_1, \]
\[ z_{c_j} = [\text{LLM}(t_j)\,\|\,\text{MEAN}(\text{LLM}(c^1_j),\ldots,\text{LLM}(c^k_j))]W_1. \]
- **Sequence Representation Learning**:
\[ H_v=\text{Transformer}_v(E_v + \widetilde{P}_v), \]
\[ H_c=\text{Transformer}_c(E_c + \widetilde{P}_c). \]
- **Preference Extraction**:
\[ s_v = F_v(\widehat{H}_v)=\sum_{i = 1}^n\alpha_i h^v_i,\quad\alpha_i=\frac{\exp(f(h^v_i))}{\sum_{k = 1}^n\exp(f(h^v_k))}, \]
\[ s_c = F_c(\widehat{H}_c)=\sum_{j = 1}^m\beta_j h^c_j,\quad\beta_j=\frac{\exp(g(e_{t_{m + 1}},h^c_j))}{\sum_{k = 1}^m\exp(g(e_{t_{m + 1}},h^c_k))}. \]
- **Contrastive Loss Function**: \[ L_{SSC}=\frac{1}{2}(\text{InfoN}