Contrastive Learning for Debiased Candidate Generation in Large-Scale Recommender Systems

Chang Zhou, Jianxin Ma, Jianwei Zhang, Jingren Zhou, Hongxia Yang
DOI: https://doi.org/10.48550/arXiv.2005.12964
2021-06-05
Abstract: Deep candidate generation (DCG) that narrows down the collection of relevant items from billions to hundreds via representation learning has become prevalent in industrial recommender systems. Standard approaches approximate maximum likelihood estimation (MLE) through sampling for better scalability and address the problem of DCG in a way similar to language modeling. However, live recommender systems face severe exposure bias and have a vocabulary several orders of magnitude larger than that of natural language, implying that MLE will preserve and even exacerbate the exposure bias in the long run in order to faithfully fit the observed samples. In this paper, we theoretically prove that a popular choice of contrastive loss is equivalent to reducing the exposure bias via inverse propensity weighting, which provides a new perspective for understanding the effectiveness of contrastive learning. Based on the theoretical discovery, we design CLRec, a contrastive learning method to improve DCG in terms of fairness, effectiveness and efficiency in recommender systems with extremely large candidate size. We further improve upon CLRec and propose Multi-CLRec, for accurate multi-intention aware bias reduction. Our methods have been successfully deployed in Taobao, where at least four-month online A/B tests and offline analyses demonstrate its substantial improvements, including a dramatic reduction in the Matthew effect.
Information Retrieval, Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the exposure bias in the candidate generation stage of large-scale recommendation systems. Specifically, traditional methods based on maximum likelihood estimation (MLE) are affected by the biases of existing recommendation systems during the training process, making it difficult to recommend high-quality but less-clicked items. This bias not only affects the fairness of recommendations but may also exacerbate the "Matthew effect" (that is, popular items become more popular, while unpopular items become even more unpopular), thus limiting the diversity and effectiveness of the recommendation system.

The paper proposes a method based on contrastive learning, CLRec, and its improved version, Multi-CLRec, which reduce exposure bias by introducing a contrastive loss function. Theoretical analysis shows that the contrastive loss function has a similar effect to the inverse propensity weighting (IPW) method in reducing bias, but the contrastive learning method avoids the two-stage training and numerical instability problems of the IPW method.

### Main contributions:

1. **Theoretical connection**: Establish a theoretical connection between contrastive learning and inverse propensity weighting, and prove that the contrastive loss function can reduce exposure bias.
2. **Method design**: Propose two methods, CLRec and Multi-CLRec, which efficiently implement contrastive learning through a queuing mechanism and reduce computational costs.
3. **Practical application**: These methods have been deployed on Alibaba's Taobao platform, and their effectiveness and superiority have been verified through online A/B testing and offline analysis.

### Core technologies of the solution:

- **Contrastive loss function**: By constructing contrast tasks between positive and negative samples, optimize the model to distinguish between relevant and irrelevant items.
- **Queuing mechanism**: Use a fixed-size first-in-first-out (FIFO) queue to store positive samples and their representations as negative samples for subsequent batches, ensuring that all items have the opportunity to be sampled.
- **Multi-intent awareness**: Multi-CLRec further improves the accuracy of bias reduction by using multiple queues corresponding to different user intents.

### Experimental results:

- **Diversity improvement**: CLRec significantly increases the diversity of recommended items, with the number of recommended items increasing from 10,780,111 to 21,905,318.
- **Online performance**: CLRec outperforms traditional sampling methods such as sampled-softmax in terms of click-through rate (CTR) and user dwell time.
- **Multi-intent advantage**: When the user distribution changes, Multi-CLRec performs better on the offline evaluation metric HitRate@50, further improving the overall performance of the recommendation system.

In conclusion, through theoretical analysis and experimental verification, this paper proposes an effective solution to the exposure bias problem in the candidate generation stage of large-scale recommendation systems, improving the fairness and diversity of recommendations.
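
The queuing mechanism and contrastive loss described above can be illustrated with a small sketch. This is not the authors' implementation: the names `NegativeQueue`, `contrastive_loss`, and `train_step`, the dot-product scoring, and the `temperature` parameter are all assumptions made for illustration, and embeddings are plain Python lists so the example runs with the standard library alone.

```python
import math
from collections import deque


class NegativeQueue:
    """Hypothetical fixed-size FIFO queue of item embeddings reused as negatives."""

    def __init__(self, max_size):
        # deque(maxlen=...) evicts the oldest entries once the queue is full.
        self.items = deque(maxlen=max_size)

    def negatives(self):
        return list(self.items)

    def enqueue(self, embeddings):
        self.items.extend(embeddings)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(user_vec, pos_vec, neg_vecs, temperature=1.0):
    """Softmax cross-entropy of the positive item against the queued negatives."""
    logits = [dot(user_vec, pos_vec) / temperature]
    logits += [dot(user_vec, n) / temperature for n in neg_vecs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[0]


def train_step(queue, batch):
    """batch is a list of (user_vec, clicked_item_vec) pairs; returns the mean loss."""
    negs = queue.negatives()  # positives from earlier batches act as negatives now
    losses = [contrastive_loss(u, p, negs) for u, p in batch]
    # Only after computing the loss do this batch's positives enter the queue.
    queue.enqueue([p for _, p in batch])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    queue = NegativeQueue(max_size=3)
    batch = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
    print(train_step(queue, batch))  # queue is empty, so there is nothing to contrast
    print(train_step(queue, batch))  # now the queued items serve as negatives
```

In this sketch, items that are clicked often enter the queue often and are therefore also drawn as negatives often, which is one informal way to see why such a loss can counteract popularity-driven exposure. A real system would additionally handle the case where a queued negative is the same item as the current positive, which the sketch ignores.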