Abstract: Recommender Systems (RSs) are pivotal in diverse domains such as e-commerce, music streaming, and social media. This paper conducts a comparative analysis of prevalent loss functions in RSs: Binary Cross-Entropy (BCE), Categorical Cross-Entropy (CCE), and Bayesian Personalized Ranking (BPR). Exploring the behaviour of these loss functions across varying negative sampling settings, we reveal that BPR and CCE are equivalent when one negative sample is used. Additionally, we demonstrate that all losses share a common global minimum. Evaluation of RSs mainly relies on ranking metrics known as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR). We produce bounds of the different losses for negative sampling settings to establish a probabilistic lower bound for NDCG. We show that the BPR bound on NDCG is weaker than that of BCE, contradicting the common assumption that BPR is superior to BCE in RSs training. Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at https://anonymous.4open.science/r/recsys_losses.

What problem does this paper attempt to address?

This paper studies how different loss functions in recommender systems (RSs) behave and what they optimize under negative sampling. It analyzes three commonly used loss functions theoretically: Binary Cross-Entropy (BCE), Categorical Cross-Entropy (CCE), and Bayesian Personalized Ranking (BPR). It examines their behavior under different negative sampling settings and reports the following key findings:

1. **Equivalence of loss functions**: When only one negative sample is used, BPR and CCE are equivalent. In addition, all three loss functions share the same global minimum under certain conditions.
2. **Relationship between loss functions and ranking metrics**: By relating the loss functions to ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR), the paper shows that minimizing these losses amounts to maximizing a lower bound on NDCG or MRR. Under negative sampling, however, this bound holds only probabilistically.
3. **Comparison of bounds across loss functions**: The paper derives a probabilistic lower bound for each loss function and, by comparing them, finds that BCE is more conducive to improving NDCG than BPR and CCE in some cases. Specifically, in extreme cases, the CCE bound on NDCG is the weakest, followed by BPR, and the BCE bound is the strongest.
4. **Experimental verification**: Experiments on five datasets and four models show that the theoretical analysis is consistent with actual training results. In particular, in the later stages of training the loss functions optimize a meaningful lower bound on NDCG.
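To see why the single-negative equivalence in point 1 holds, the following is a sketch of the standard argument, not necessarily the paper's own proof. It assumes one sampled negative item, written here as \(i^-\) (a label introduced only for this illustration), with \(s_{u, i^+}\) the score of the positive item and \(\sigma\) the logistic sigmoid. Dividing the numerator and denominator of the softmax by \(e^{s_{u, i^+}}\) turns the per-user CCE term into the per-user BPR term:

\[
\begin{aligned}
\ell_u^{\text{CCE}}
&= \log \left(\frac{e^{s_{u, i^+}}}{e^{s_{u, i^+}} + e^{s_{u, i^-}}}\right) \\
&= \log \left(\frac{1}{1 + e^{-(s_{u, i^+} - s_{u, i^-})}}\right) \\
&= \log \sigma(s_{u, i^+} - s_{u, i^-}) = \ell_u^{\text{BPR}}
\end{aligned}
\]

With more than one negative the softmax denominator no longer factors this way, so the identity is specific to the one-negative setting.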
### Formula summary

- **BCE loss function**:
  \[ L_{\text{BCE}} = -\sum_{u = 1}^U \ell_u^{\text{BCE}} \]
  \[ \ell_u^{\text{BCE}} = \log \sigma(s_{u, i^+}) + \sum_{i \in I^-_u} \log (1 - \sigma(s_{u, i})) \]
- **CCE loss function**:
  \[ L_{\text{CCE}} = -\sum_{u = 1}^U \ell_u^{\text{CCE}} \]
  \[ \ell_u^{\text{CCE}} = \log \left(\frac{e^{s_{u, i^+}}}{e^{s_{u, i^+}} + \sum_{i \in I^-_u} e^{s_{u, i}}}\right) \]
- **BPR loss function**:
  \[ L_{\text{BPR}} = -\sum_{u = 1}^U \ell_u^{\text{BPR}} \]
  \[ \ell_u^{\text{BPR}} = \sum_{i \in I^-_u} \log \sigma(s_{u, i^+} - s_{u, i}) \]
- **NDCG metric**:
  \[ \text{NDCG}(r^+) = \frac{1}{\log_2(1 + r^+)} \]
- **MRR metric**:
  \[ \text{MRR}(r^+) = \frac{1}{r^+} \]

### Conclusion

Through theoretical analysis and experiments under negative sampling, this paper reveals the performance differences between BCE, CCE, and BPR in recommender systems. In particular, it shows that in some cases BCE may be better suited to improving NDCG than BPR and CCE, which provides a theoretical basis for selecting an appropriate loss function.
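The per-user terms and metrics from the formula summary can be checked numerically. The snippet below is a minimal sketch in plain Python, not the authors' code: the function names (`bce_term`, `cce_term`, `bpr_term`, `ndcg`, `mrr`) and the example scores are assumptions made for illustration. It evaluates each term for one user, given the positive item's score and a list of sampled negative scores, and confirms that the CCE and BPR terms coincide when exactly one negative is sampled.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_term(s_pos: float, s_negs: list[float]) -> float:
    """Per-user BCE term: log sigma(s+) + sum_i log(1 - sigma(s_i)).

    Uses the identity 1 - sigma(s) = sigma(-s).
    """
    return math.log(sigmoid(s_pos)) + sum(math.log(sigmoid(-s)) for s in s_negs)


def cce_term(s_pos: float, s_negs: list[float]) -> float:
    """Per-user CCE term: log-softmax of the positive over positive + negatives."""
    scores = [s_pos] + list(s_negs)
    m = max(scores)  # subtract the max before exponentiating for stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return s_pos - log_sum_exp


def bpr_term(s_pos: float, s_negs: list[float]) -> float:
    """Per-user BPR term: sum_i log sigma(s+ - s_i)."""
    return sum(math.log(sigmoid(s_pos - s)) for s in s_negs)


def ndcg(rank: int) -> float:
    """NDCG for a single relevant item at 1-based rank r+."""
    return 1.0 / math.log2(1 + rank)


def mrr(rank: int) -> float:
    """Reciprocal rank for a single relevant item at 1-based rank r+."""
    return 1.0 / rank


# Hypothetical scores for one user: one positive, one sampled negative.
s_pos, s_negs = 2.0, [0.5]
gap = abs(cce_term(s_pos, s_negs) - bpr_term(s_pos, s_negs))
print("CCE and BPR terms agree with one negative:", gap < 1e-12)

# Rank of the positive among the scored items (1 = best).
rank = 1 + sum(1 for s in s_negs if s > s_pos)
print("rank:", rank, "NDCG:", ndcg(rank), "MRR:", mrr(rank))
```

Each loss is the negated sum of these terms over users, so a larger term means a smaller loss. Passing more than one negative score to `cce_term` and `bpr_term` generally gives different values, which matches the claim that the equivalence is specific to the one-negative setting.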

A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling

ABNS: Association-based negative sampling for collaborative filtering

Personalized Ranking with Importance Sampling.

BSL: Understanding and Improving Softmax Loss for Recommendation

Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems

On the Theories Behind Hard Negative Sampling for Recommendation

Understanding the Ranking Loss for Recommendation with Sparse User Feedback

Bayesian Negative Sampling for Recommendation

Evaluating Performance and Bias of Negative Sampling in Large-Scale Sequential Recommendation Models

Revisiting Negative Sampling Vs. Non-sampling in Implicit Recommendation

Negative Sampling in Recommendation: A Survey and Future Directions

Loss Aversion in Recommender Systems: Utilizing Negative User Preference to Improve Recommendation Quality

Improved Estimation of Ranks for Learning Item Recommenders with Negative Sampling

Sampler Design for Bayesian Personalized Ranking by Leveraging View Data

Learning Recommenders for Implicit Feedback with Importance Resampling

Reducing Popularity Bias in Recommender Systems through AUC-Optimal Negative Sampling

Generating Negative Samples for Sequential Recommendation

RankMBPR: Rank-Aware Mutual Bayesian Personalized Ranking for Item Recommendation

Personalized Negative Reservoir for Incremental Learning in Recommender Systems

Reinforced Negative Sampling for Recommendation with Exposure Data

gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling