DeBaTeR: Denoising Bipartite Temporal Graph for Recommendation

Xinyu He, Jose Sepulveda, Mostafa Rahmani, Alyssa Woo, Fei Wang, Hanghang Tong
2024-11-14
Abstract: Due to the difficulty of acquiring large-scale explicit user feedback, implicit feedback (e.g., clicks or other interactions) is widely applied as an alternative source of data, where user-item interactions can be modeled as a bipartite graph. Due to the noisy and biased nature of implicit real-world user-item interactions, identifying and rectifying noisy interactions are vital to enhance model performance and robustness. Previous works on purifying user-item interactions in collaborative filtering mainly focus on mining the correlation between user/item embeddings and noisy interactions, neglecting the benefit of temporal patterns in determining noisy interactions. Time information, while enhancing the model utility, also bears its natural advantage in helping to determine noisy edges, e.g., if someone usually watches horror movies at night and talk shows in the morning, a record of watching a horror movie in the morning is more likely to be a noisy interaction. Armed with this observation, we introduce a simple yet effective mechanism for generating time-aware user/item embeddings and propose two strategies for denoising bipartite temporal graphs in recommender systems (DeBaTeR): the first is through reweighting the adjacency matrix (DeBaTeR-A), where a reliability score is defined to reweight the edges through both soft assignment and hard assignment; the second is through reweighting the loss function (DeBaTeR-L), where weights are generated to reweight user-item samples in the losses. Extensive experiments have been conducted to demonstrate the efficacy of our methods and illustrate how time information indeed helps identify noisy edges.
Information Retrieval, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to use temporal information to denoise noisy interaction data in the user-item bipartite graph of a recommender system. Specifically, it focuses on identifying and correcting noisy interactions in recommender systems built on implicit feedback (such as clicks and views), where the data is noisy and biased, in order to improve model performance and robustness.

### Main problems:

1. **Impact of noisy data**: Implicit feedback data (e.g., click and view records) usually contains a large amount of noise, which can mislead the training of the recommender system and lead to inaccurate recommendations.
2. **Importance of temporal information**: Previous works mainly focused on mining the correlation between user or item embeddings and noisy interactions, while ignoring the value of temporal information for identifying noisy interactions. For example, some user behavior has clear temporal patterns (such as watching horror movies at night and talk shows in the morning), and these patterns can help identify abnormal interaction records more accurately.

### Solutions:

The paper proposes a framework named DeBaTeR (Denoising Bipartite Temporal Graph for Recommendation), which aims to improve the denoising ability and prediction performance of the recommender system by introducing time-aware user and item embeddings. The specific methods are:

1. **Time-aware embedding generation**: Timestamp information is encoded to generate time-aware user and item embeddings, so that the embeddings reflect not only the global preferences of users but also preference changes at specific points in time.
2. **Two denoising strategies**:
   - **Re-weighted adjacency matrix (DeBaTeR-A)**: A reliability scoring function is defined and used to re-weight or prune the edges of the bipartite graph, reducing the impact of noisy interactions.
   - **Re-weighted loss function (DeBaTeR-L)**: The samples in the loss function are re-weighted, reducing the impact of noisy samples on model training.

### Formula summary:

- Time-aware preference formula:
  \[ \mathcal{P}_{u,i}^t = (\mathbf{e}_u^t)^T \mathbf{e}_i^t = \mathbf{e}_u^T \mathbf{e}_i + \mathbf{e}_{u,i}^T \mathbf{e}_t \]
  where \(\mathbf{e}_u\) and \(\mathbf{e}_i\) are the original embeddings of the user and the item respectively, \(\mathbf{e}_t\) is the embedding of the timestamp, and \(\mathbf{e}_{u,i}\) is the joint embedding of the user and the item.
- Reliability scoring function:
  \[ R_{u,i}^t = \frac{\cos(\mathbf{e}_u^{(0)} + \mathbf{e}_t, \mathbf{e}_i^{(0)} + \mathbf{e}_t) + 1}{2} \]
- Re-weighted BPR loss function:
  \[ \mathcal{L}_{\text{BPR}} = \frac{1}{|O|} \sum_{(u,i,j,t) \in O} -\log \sigma((\mathbf{e}_u + \mathbf{e}_t)^T (\mathbf{e}_i + \mathbf{e}_t) - (\mathbf{e}_u + \mathbf{e}_t)^T (\mathbf{e}_j + \mathbf{e}_t)) \]

Through these methods, the paper shows how temporal information can be used to improve the denoising and prediction performance of the recommender system.
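
The formulas above can be made concrete with a small sketch. This is a minimal illustration in plain Python (standard library only), not the authors' implementation: all function names are hypothetical, embeddings are plain lists of floats, and the time-aware embeddings are formed by adding the timestamp embedding to the user and item embeddings, as the reliability and BPR formulas do. The `reweight_edge` rule (keep the score as the edge weight, or zero the edge below a threshold) is only one assumed way to realize "re-weight or prune"; the summary does not give the exact rule or threshold. The loss function follows the BPR formula exactly as written above, which carries no explicit per-sample weight.

```python
import math


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def time_aware_preference(e_u, e_i, e_t):
    """Preference score (e_u + e_t)^T (e_i + e_t)."""
    return dot(add(e_u, e_t), add(e_i, e_t))


def reliability(e_u0, e_i0, e_t):
    """R = (cos(e_u0 + e_t, e_i0 + e_t) + 1) / 2, which lies in [0, 1]."""
    return (cosine(add(e_u0, e_t), add(e_i0, e_t)) + 1.0) / 2.0


def reweight_edge(score, threshold=0.5):
    """Assumed rule: prune edges scoring below the (hypothetical)
    threshold, otherwise use the score itself as the edge weight."""
    return score if score >= threshold else 0.0


def bpr_loss(samples):
    """Mean of -log(sigmoid(P_ui - P_uj)) over (e_u, e_i, e_j, e_t) tuples,
    where i is the observed item and j the sampled negative item."""
    total = 0.0
    for e_u, e_i, e_j, e_t in samples:
        diff = (time_aware_preference(e_u, e_i, e_t)
                - time_aware_preference(e_u, e_j, e_t))
        # -log(sigmoid(x)) = log(1 + exp(-x)), computed in a stable form.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(samples)


# Toy 2-dimensional embeddings, chosen only for illustration.
e_u, e_i, e_j, e_t = [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]
edge_weight = reweight_edge(reliability(e_u, e_i, e_t))
loss = bpr_loss([(e_u, e_i, e_j, e_t)])
```

In this toy case the observed item points the same way as the user once the timestamp embedding is added, so its edge gets a high reliability score and survives pruning, while the negative item scores lower. A real system would presumably operate on batched tensors and learn the embeddings jointly.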