How robust is MovieLens? A dataset analysis for recommender systems

Anne-Marie Tousch
DOI: https://doi.org/10.48550/arXiv.1909.12799
2019-09-12
Abstract: Research publication requires public datasets. In recommender systems, some datasets are widely used to compare algorithms against a supposedly common benchmark. The problem is that, for various reasons, these datasets are heavily preprocessed, which makes comparing results across papers difficult. This paper makes explicit the variety of preprocessing and evaluation protocols in order to test the robustness of a dataset (or its lack of flexibility). While robustness is good for comparing results across papers, for flexible datasets we propose a method to select a preprocessing protocol and share results more transparently.
Information Retrieval, Machine Learning