Revisiting BPR: A Replicability Study of a Common Recommender System Baseline

Aleksandr Milogradskii, Oleg Lashinin, Alexander P, Marina Ananyeva, Sergey Kolesnikov
DOI: https://doi.org/10.1145/3640457.3688073
2024-10-18
Abstract: Bayesian Personalized Ranking (BPR), a collaborative filtering approach based on matrix factorization, frequently serves as a benchmark for recommender systems research. However, numerous studies often overlook the nuances of BPR implementation, claiming that it performs worse than newly proposed methods across various tasks. In this paper, we thoroughly examine the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. Furthermore, through extensive experiments on real-world datasets under modern evaluation settings, we demonstrate that with proper tuning of its hyperparameters, the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets. Specifically, on the Million Song Dataset, the BPR model with hyperparameters tuning statistically significantly outperforms Mult-VAE by 10% in NDCG@100 with binary relevance function.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses how the Bayesian Personalized Ranking (BPR) model is evaluated in recommender systems research. Specifically:

1. **Consistency of open-source implementations**: The authors find that many open-source BPR implementations are inconsistent with the model described in the original BPR paper, and that these inconsistencies reduce performance by up to 50% for specific implementations. They run detailed experiments to measure the differences between these open-source implementations and the original model.
2. **Influence of model features**: The authors examine how individual features of the BPR model (such as regularization, optimizer choice, negative sampling method, and item bias) affect its performance, with the goal of identifying which features matter most.
3. **Effect of hyperparameter tuning**: Through extensive experiments, the paper shows that a carefully tuned BPR model can perform close to state-of-the-art methods on top-n recommendation tasks and outperform them on specific datasets. In particular, on the Million Song Dataset, the tuned BPR model outperforms Mult-VAE by 10% in NDCG@100, a statistically significant difference.

In summary, the paper is a replicability study that re-examines the BPR model in order to correct the underestimation of its performance in the current literature, and to show that with appropriate tuning BPR remains competitive on some datasets. This clarifies the real performance of BPR and gives recommender systems researchers a more reliable baseline to compare against.
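To make the model features named above concrete (regularization, the optimizer, negative sampling, and item bias), here is a minimal sketch of one BPR training step in NumPy. It is not the authors' code or any particular library's implementation. The function names `sample_triple` and `bpr_sgd_step`, the toy data, and the choices of plain SGD, uniform negative sampling, and a single L2 coefficient shared by all parameters are assumptions made for illustration. The sketch also assumes every user has at least one positive item and at least one non-interacted item.

```python
import numpy as np


def sample_triple(rng, user_pos, n_items):
    """Uniformly sample (user, positive item, negative item).

    Assumption: uniform sampling over users and over non-interacted items.
    """
    u = int(rng.integers(len(user_pos)))
    i = int(rng.choice(sorted(user_pos[u])))
    while True:
        j = int(rng.integers(n_items))
        if j not in user_pos[u]:
            return u, i, j


def bpr_sgd_step(P, Q, b, u, i, j, lr=0.05, reg=0.01):
    """One in-place SGD ascent step on ln sigmoid(x_ui - x_uj) minus an L2 penalty.

    Scores are x_ui = P[u] @ Q[i] + b[i], so b is the item bias term.
    """
    p_u = P[u].copy()  # keep the old user vector for the item updates
    x_uij = p_u @ (Q[i] - Q[j]) + b[i] - b[j]
    g = 1.0 / (1.0 + np.exp(x_uij))  # equals 1 - sigmoid(x_uij)
    P[u] += lr * (g * (Q[i] - Q[j]) - reg * p_u)
    Q[i] += lr * (g * p_u - reg * Q[i])
    Q[j] += lr * (-g * p_u - reg * Q[j])
    b[i] += lr * (g - reg * b[i])
    b[j] += lr * (-g - reg * b[j])
    return x_uij


# Toy data (hypothetical): user 0 interacted with items 0 and 1, user 1 with items 2 and 3.
rng = np.random.default_rng(0)
user_pos = [{0, 1}, {2, 3}]
n_items, dim = 4, 8
P = rng.normal(0, 0.1, (len(user_pos), dim))  # user factors
Q = rng.normal(0, 0.1, (n_items, dim))        # item factors
b = np.zeros(n_items)                         # item biases

for _ in range(2000):
    bpr_sgd_step(P, Q, b, *sample_triple(rng, user_pos, n_items))

scores = P @ Q.T + b  # rank items for each user by descending score
```

Each line of `bpr_sgd_step` maps to one of the features the study examines: dropping the `b` updates removes item bias, changing `reg` or using separate coefficients per parameter group changes the regularization, and replacing the update rule or `sample_triple` changes the optimizer or the negative sampling scheme.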