Uncovering the Propensity Identification Problem in Debiased Recommendations

Honglei Zhang, Shuyi Wang, Haoxuan Li, Chunyuan Zheng, Xu Chen, Li Liu, Shanshan Luo, Peng Wu
DOI: https://doi.org/10.1109/icde60146.2024.00056
2024-01-01
Abstract: In the databases of recommender systems, users' ratings for most items are missing, and selection bias arises because users selectively choose which items to rate. To address this problem, propensity-based methods, e.g., inverse propensity scoring and doubly robust estimation, have been widely studied and applied to missing rating prediction and post-click conversion rate prediction tasks. However, have we completely eliminated the selection bias? Under what missing data mechanism can previous studies completely eliminate the selection bias and achieve unbiased learning? In this paper, following the statistics literature, we first formally define three missing data mechanisms, i.e., missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), and discuss the widespread prevalence of MNAR in recommender systems. Next, we theoretically show that previous propensity-based debiasing methods are unbiased only when data are MCAR or MAR, and that they lead to biased predictions when data are MNAR. To close this gap, we propose to disentangle user and item embeddings into a primary latent vector for rating prediction and an auxiliary latent vector for missing mechanism modeling. We prove identifiability results and show that the proposed method can achieve unbiased learning under MNAR with imposed constraints. Extensive experiments are conducted on a semi-synthetic dataset and three real-world datasets, validating the effectiveness of our proposed method.
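The MAR versus MNAR distinction in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's method and does not implement the disentangled embeddings; it only assumes a single binary feature `x`, a binary rating `r`, and made-up observation probabilities (`p_mar`, `p_mnar`, and all numeric constants are hypothetical). It compares an inverse propensity scoring (IPS) estimate of the mean rating when the propensity is fitted from `x` alone, first under MAR (observation depends on `x` only) and then under MNAR (observation also depends on the rating itself).

```python
import random
from collections import defaultdict

def ips_mean(rows, propensity):
    """IPS estimate of the mean rating over ALL pairs, using observed rows only.

    rows: list of (x, r, o); r contributes only when o == 1.
    propensity: function (x, r) -> assumed probability of being observed.
    """
    total = sum(r / propensity(x, r) for x, r, o in rows if o == 1)
    return total / len(rows)

def fit_propensity_on_features(rows):
    """Estimate P(o=1 | x) as the observed rate within each feature value."""
    seen, count = defaultdict(int), defaultdict(int)
    for x, _, o in rows:
        seen[x] += o
        count[x] += 1
    rates = {x: seen[x] / count[x] for x in count}
    return lambda x, r: rates[x]  # ignores r: ratings are not fully observed

def draw(n, observe_prob, rng):
    """Sample (feature, rating, observed-indicator) triples."""
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        r = 1 if rng.random() < 0.3 + 0.4 * x else 0   # assumed rating model
        o = 1 if rng.random() < observe_prob(x, r) else 0
        rows.append((x, r, o))
    return rows

def true_mean(rows):
    """Mean rating over all pairs, including the unobserved ones."""
    return sum(r for _, r, _ in rows) / len(rows)

def p_mar(x, r):
    return 0.2 + 0.6 * x             # MAR: depends on the feature only

def p_mnar(x, r):
    return 0.1 + 0.3 * x + 0.4 * r   # MNAR: also depends on the rating

rng = random.Random(0)
mar_rows = draw(200_000, p_mar, rng)
mnar_rows = draw(200_000, p_mnar, rng)

results = {
    "mar_true": true_mean(mar_rows),
    "mar_ips": ips_mean(mar_rows, fit_propensity_on_features(mar_rows)),
    "mnar_true": true_mean(mnar_rows),
    "mnar_ips": ips_mean(mnar_rows, fit_propensity_on_features(mnar_rows)),
    # Oracle propensity uses the true mechanism, which is unknown in practice.
    "mnar_oracle_ips": ips_mean(mnar_rows, p_mnar),
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the feature-only propensity recovers the true mean under MAR but overestimates it under MNAR, while the oracle propensity that conditions on the rating stays close to the truth. The oracle is not available from observed data alone, which is one way to read the identification problem the abstract describes.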