Recommendations as Treatments: Debiasing Learning and Evaluation

Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims
DOI: https://doi.org/10.48550/arXiv.1602.05352
2016-05-27
Abstract: Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, finding that it is highly practical and scalable.
Machine Learning, Artificial Intelligence, Information Retrieval
What problem does this paper attempt to address?
This paper addresses the inaccurate evaluation and training of recommendation systems caused by selection bias. Users tend to rate the movies they like and rarely rate the ones they dislike; similarly, a recommendation system tends to display the advertisements or products it predicts users will be interested in and is less likely to display other content. As a result, the observed data is not missing at random but Missing Not At Random (MNAR), so traditional evaluation methods applied to this data cannot accurately reflect the performance of the recommendation system. To solve this problem, the authors propose an approach based on causal inference that treats a recommendation as analogous to a treatment intervention in medical research. With this approach, they obtain unbiased performance estimates from biased data and derive a matrix factorization method that substantially improves prediction performance on real-world data. The authors also explore how to estimate propensity scores in observational settings and analyze the robustness of the framework when propensity scores are mis-estimated.

### Main Contributions

1. **Unbiased Performance Estimation**: Using propensity weighting from causal inference, the authors propose unbiased estimators for multiple performance metrics, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and Discounted Cumulative Gain (DCG).
2. **Empirical Risk Minimization (ERM) Framework**: Based on these estimators, an ERM framework for learning recommendation systems is proposed, and a generalization error bound is derived.
3. **Matrix Factorization Method**: Using the ERM framework, a matrix factorization method that can handle selection biases is derived. This method is conceptually simple and highly scalable.
4. **Propensity Score Estimation**: The authors explore how to estimate propensity scores when selection bias is caused by user self-selection, and they analyze the robustness of the framework to mis-specified propensity scores.

### Experimental Verification

The authors demonstrate the effectiveness and robustness of the proposed method through extensive experiments. For the evaluation task, their performance estimators are more accurate than traditional methods; for the learning task, the new matrix factorization method significantly outperforms both methods that ignore selection biases and existing state-of-the-art methods.

### Formula Examples

- **Mean Squared Error (MSE)**: \[ \text{MSE}: \delta_{u,i}(Y, \hat{Y}) = (Y_{u,i} - \hat{Y}_{u,i})^2 \]
- **Inverse Propensity Score Estimator (IPS Estimator)**: \[ \hat{R}_{\text{IPS}}(\hat{Y} | P) = \frac{1}{U \cdot I} \sum_{(u,i):O_{u,i} = 1} \frac{\delta_{u,i}(Y, \hat{Y})}{P_{u,i}} \]

Here \(U\) and \(I\) are the numbers of users and items, \(O_{u,i} = 1\) indicates that the rating of user \(u\) for item \(i\) was observed, and \(P_{u,i} = P(O_{u,i} = 1)\) is the propensity of observing that rating. Dividing each observed loss by its propensity makes the estimator unbiased for the average loss over all user-item pairs, provided every \(P_{u,i}\) is positive.

Through these methods, the authors provide an effective and practical solution to the selection bias problem in recommendation systems.
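The IPS estimator from the formula examples above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`ips_mse`, `naive_mse`, `simulate`) are hypothetical, and the synthetic data assumes that the propensities are known exactly and that higher ratings are more likely to be observed, which is only one possible self-selection model.

```python
import numpy as np


def ips_mse(y_true, y_pred, observed, propensity):
    """IPS estimate of the MSE averaged over all U*I user-item pairs.

    observed is a boolean U x I mask (O_{u,i}); propensity holds P_{u,i} > 0.
    Entries of y_true outside the mask are never used.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    loss = (y_true - y_pred) ** 2                      # delta_{u,i}(Y, Y_hat)
    weighted = np.where(observed, loss / propensity, 0.0)
    return weighted.sum() / observed.size              # divide by U * I


def naive_mse(y_true, y_pred, observed):
    """Conventional estimate: plain average of the loss over observed entries."""
    observed = np.asarray(observed, dtype=bool)
    loss = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    return loss[observed].mean()


def simulate(seed=0, n_users=200, n_items=100):
    """Synthetic MNAR data (assumed model): high ratings are observed more often."""
    rng = np.random.default_rng(seed)
    y_true = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
    y_pred = np.full_like(y_true, 4.0)                 # a constant predictor
    propensity = 0.05 + 0.15 * (y_true - 1.0)          # ranges from 0.05 to 0.65
    observed = rng.random(y_true.shape) < propensity
    true_mse = ((y_true - y_pred) ** 2).mean()         # needs the full matrix
    return (true_mse,
            naive_mse(y_true, y_pred, observed),
            ips_mse(y_true, y_pred, observed, propensity))


if __name__ == "__main__":
    true_mse, naive, ips = simulate()
    print(f"true MSE: {true_mse:.3f}  naive: {naive:.3f}  IPS: {ips:.3f}")
```

In this setup the naive average is dominated by well-predicted high ratings and therefore understates the error, while the IPS estimate stays close to the true value. In practice the propensities would have to be estimated, and small propensities inflate the variance of the estimate.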