Abstract: Recommender systems commonly suffer from popularity bias. From the perspective of popularity distribution shift, the standard paradigm, trained on exposed items (most of which are hot items), learns that recommending popular items more frequently achieves lower loss, and thus injects popularity information into the item property embedding, e.g., the ID embedding. From the perspective of long-tail distribution shift, the sparse interactions of long-tail items lead to insufficient learning of these items. The resulting distribution discrepancy between hot and long-tail items not only inherits the bias but also amplifies it. Existing work addresses this issue with inverse propensity scoring (IPS) or causal embeddings. However, we argue that not all popularity bias is harmful: some items are more popular because they are of better quality or conform to current trends, and they deserve more recommendations. Blindly pursuing unbiased learning may suppress high-quality or fashionable items. To make better use of popularity bias, we propose a co-training disentangled domain adaptation network (CD$^2$AN), which co-trains biased and unbiased models. Specifically, for popularity distribution shift, CD$^2$AN disentangles the item property representation and the popularity representation from the item property embedding. For long-tail distribution shift, we introduce additional unexposed items (most of which are long-tail items) to align the distributions of hot and long-tail item property representations. Further, from the instance perspective, we design an item similarity regularization to learn comprehensive item representations, which encourages item pairs with more effective co-occurrence patterns to have more similar item property representations. Offline evaluations and online A/B tests show that CD$^2$AN outperforms existing debiasing solutions. CD$^2$AN has been deployed in the Mobile Taobao App, where it handles major online traffic.
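
To make the IPS baseline mentioned in the abstract concrete, the following is a minimal sketch of an inverse-propensity-weighted log loss. It is not the CD$^2$AN method and not any specific prior implementation: the function names, the power-law propensity estimate, the exponent, and the clipping threshold are all illustrative assumptions.

```python
import math


def popularity_propensities(interaction_counts, power=0.5):
    """Estimate one propensity per item as (count / max_count) ** power.

    The power-law form and the default exponent are assumptions made
    for illustration; other propensity estimators are possible.
    """
    max_count = max(interaction_counts.values())
    return {
        item: (count / max_count) ** power
        for item, count in interaction_counts.items()
    }


def ips_weighted_log_loss(samples, propensities, clip=0.05):
    """Average binary log loss with each sample weighted by 1 / propensity.

    samples: list of (item_id, label, predicted_probability) tuples.
    clip: lower bound on the propensity, which caps the largest weight
          (a hypothetical default, not a recommended value).
    """
    total = 0.0
    for item, label, prob in samples:
        # Keep the probability away from 0 and 1 so the logs stay finite.
        prob = min(max(prob, 1e-7), 1.0 - 1e-7)
        loss = -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
        weight = 1.0 / max(propensities[item], clip)
        total += weight * loss
    return total / len(samples)


# Toy data: one hot item and one long-tail item with identical predictions.
counts = {"hot_item": 10000, "tail_item": 100}
propensities = popularity_propensities(counts)
samples = [("hot_item", 1, 0.9), ("tail_item", 1, 0.9)]
loss = ips_weighted_log_loss(samples, propensities)
```

In this toy setup the long-tail item receives ten times the weight of the hot item regardless of why the hot item is popular, which is the kind of uniform down-weighting of popular items that the abstract argues can suppress high-quality or trending items.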

Counteracting User Attention Bias in Music Streaming Recommendation via Reward Modification

Recommending More Suitable Music Based on Users' Real Context

Counterfactual Reward Modification for Streaming Recommendation with Delayed Feedback

Co-training Disentangled Domain Adaptation Network for Leveraging Popularity Bias in Recommenders

Neural Dueling Bandits

Exploration in Interactive Personalized Music Recommendation: A Reinforcement Learning Approach

Counterfactual contextual bandit for recommendation under delayed feedback

Fairness Through Domain Awareness: Mitigating Popularity Bias For Music Discovery

Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits

Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments

Deconfounded Recommendation for Alleviating Bias Amplification

Counterfactual Adversarial Learning for Recommendation

The Nah Bandit: Modeling User Non-compliance in Recommendation Systems

Biased Dueling Bandits with Stochastic Delayed Feedback

LCD: Adaptive Label Correction for Denoising Music Recommendation

A Bias Study and an Unbiased Deep Neural Network for Recommender Systems

Fatigue-aware Bandits for Dependent Click Models

Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation

Denoising Implicit Feedback for Recommendation

Meta Clustering of Neural Bandits

Achieving User-Side Fairness in Contextual Bandits