Cross-Domain Latent Factors Sharing via Implicit Matrix Factorization

Abdulaziz Samra, Evgeney Frolov, Alexey Vasilev, Alexander Grigorievskiy, Anton Vakhrushev
DOI: https://doi.org/10.1145/3640457.3688143
2024-09-24
Abstract: Data sparsity has been one of the long-standing problems for recommender systems. One of the solutions to mitigate this issue is to exploit knowledge available in other source domains. However, many cross-domain recommender systems introduce a complex architecture that makes them less scalable in practice. On the other hand, matrix factorization methods are still considered to be strong baselines for single-domain recommendations. In this paper, we introduce the CDIMF, a model that extends the standard implicit matrix factorization with ALS to cross-domain scenarios. We apply the Alternating Direction Method of Multipliers to learn shared latent factors for overlapped users while factorizing the interaction matrix. In a dual-domain setting, experiments on industrial datasets demonstrate a competing performance of CDIMF for both cold-start and warm-start. The proposed model can outperform most other recent cross-domain and single-domain models. We also provide the code to reproduce experiments on GitHub.
Information Retrieval, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the issue of data sparsity in recommendation systems, particularly in cross-domain recommendation scenarios. Specifically, the paper proposes a model named CDIMF (Cross-Domain Implicit Matrix Factorization), which extends the standard Implicit Matrix Factorization (IMF) method by sharing users' latent factors across multiple domains using the Alternating Direction Method of Multipliers (ADMM).

### Background and Motivation

1. **Data Sparsity**: A common problem in recommendation systems is data sparsity, where most entries in the user-item interaction matrix are missing. This leads to a decline in recommendation quality.
2. **Cold Start Problem**: The recommendation performance for new users or new items is poor due to the lack of sufficient historical data.
3. **Cross-Domain Recommendation**: Many companies have multiple related domains of products or services, and user data from these domains can complement each other to improve the performance of recommendation systems. However, traditional cross-domain recommendation systems are often complex in architecture and difficult to deploy on a large scale in practical applications.

### Solution

1. **CDIMF Model**: This model alleviates data sparsity and cold start problems by sharing users' latent factors across multiple domains. Specifically, the model uses the ADMM algorithm to learn shared latent factors across different domains while factorizing the interaction matrix for each domain.
2. **Alternating Least Squares (ALS)**: CDIMF is based on the ALS method, known for its simple implementation and fast convergence. By introducing ADMM, the model can effectively share information across multiple domains without needing to centralize all data in one place.
3. **Privacy Protection**: The model design considers privacy protection by exchanging perturbed user embedding representations instead of raw user historical data, thus preventing user data leakage.
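
To illustrate how ALS and ADMM can fit together as described in the solution above, here is a minimal sketch. It is not the authors' implementation. It assumes a dense toy setting in which every user appears in both domains, the common implicit-feedback confidence weighting `c = 1 + alpha * r`, and scaled-form consensus ADMM. The names `als_half_step` and `cdimf_sketch` and the hyperparameters `lam`, `alpha`, and `rho` are hypothetical, and the embedding perturbation used for privacy is omitted.

```python
import numpy as np

def als_half_step(R, F, lam, alpha, rho=0.0, target=None):
    """Solve for one side's factors while the other side's factors F are fixed.

    R: (n_rows, n_cols) dense interaction counts; F: (n_cols, k).
    With rho > 0, each row is also pulled toward target[r] (the ADMM proximal term).
    """
    n_rows, k = R.shape[0], F.shape[1]
    out = np.zeros((n_rows, k))
    FtF = F.T @ F
    for r in range(n_rows):
        conf = alpha * R[r]                  # confidence minus one (assumed c = 1 + alpha * r)
        pref = (R[r] > 0).astype(float)      # binary preference
        A = FtF + (F.T * conf) @ F + (lam + rho) * np.eye(k)
        b = F.T @ ((1.0 + conf) * pref)
        if rho > 0.0:
            b = b + rho * target[r]
        out[r] = np.linalg.solve(A, b)
    return out

def cdimf_sketch(R_domains, k=8, lam=0.1, alpha=10.0, rho=1.0, n_iters=10, seed=0):
    """Consensus-ADMM sketch: per-domain ALS with user factors tied to a shared Z."""
    rng = np.random.default_rng(seed)
    n_users = R_domains[0].shape[0]          # assumption: all users overlap across domains
    Y = [rng.normal(scale=0.1, size=(R.shape[1], k)) for R in R_domains]
    X = [np.zeros((n_users, k)) for _ in R_domains]
    U = [np.zeros((n_users, k)) for _ in R_domains]   # scaled dual variables
    Z = np.zeros((n_users, k))                         # shared user factors
    for _ in range(n_iters):
        for d, R in enumerate(R_domains):
            # Local user update, regularized toward the shared factors.
            X[d] = als_half_step(R, Y[d], lam, alpha, rho, Z - U[d])
            # Local item update: a plain implicit-ALS half step.
            Y[d] = als_half_step(R.T, X[d], lam, alpha)
        # Consensus update, then dual ascent on the disagreement.
        Z = np.mean([X[d] + U[d] for d in range(len(R_domains))], axis=0)
        for d in range(len(R_domains)):
            U[d] += X[d] - Z
    return Z, X, Y

# Toy usage: two domains, 20 shared users, random binary interactions.
toy_rng = np.random.default_rng(1)
R_a = (toy_rng.random((20, 15)) < 0.2).astype(float)
R_b = (toy_rng.random((20, 12)) < 0.2).astype(float)
Z, X, Y = cdimf_sketch([R_a, R_b])
scores_a = Z @ Y[0].T                        # relevance scores for domain A items
```

In this sketch only the factor matrices `X[d] + U[d]` are combined across domains, which mirrors the idea of sharing information without centralizing raw interactions. A practical version would use sparse matrices and handle users who appear in only one domain.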
### Experimental Results

1. **Warm Start Scenario**: Experiments on multiple datasets show that CDIMF significantly improves recommendation performance in most cases, most notably on the NDCG@10 metric.
2. **Cold Start Scenario**: In cold start scenarios, CDIMF also performs well, although the improvement is not as pronounced as in warm start scenarios.
3. **Comparison with Baseline Models**: CDIMF outperforms existing single-domain and cross-domain recommendation models in multiple experiments, including BPRMF, NeuMF, NGCF, and LightGCN.

### Conclusion

The paper proposes a cross-domain recommendation model, CDIMF, which alleviates data sparsity and cold start problems by sharing users' latent factors across multiple domains. In a dual-domain setting, experiments on industrial datasets show that CDIMF is competitive in both warm start and cold start scenarios and can outperform most other recent cross-domain and single-domain models.
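
For reference, the NDCG@10 metric cited in the experimental results can be computed along the following lines. This is a minimal sketch assuming binary relevance and a single ranked list per user; the paper's exact evaluation protocol may differ, and the function name `ndcg_at_k` is hypothetical.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG@k for one user's ranked recommendation list."""
    relevant = set(relevant_items)
    # Discounted gain of each hit; rank is 0-based, so the discount is log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (up to k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

A held-out item ranked first gives a score of 1.0, and the score decays logarithmically as that item moves down the list.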