Bias in Search and Recommender Systems
Ricardo Baeza-Yates
DOI: https://doi.org/10.1145/3383313.3418435
2020-09-22
Abstract: We explore the vicious cycle of bias on the Web as it relates to search and recommender systems. The first bias is activity bias [1], which Nielsen calls participation inequality on the Internet [3]. This means that sampled content will carry many different types of bias, such as gender, language, and topic bias. During the design and implementation of the system, further biases might be added; we call these the true algorithmic bias. In addition, the evaluation of the system may itself be biased, and that may also be reflected in algorithmic bias. The design of the interaction also matters, adding several other biases. All of them affect the data that is gathered for optimizing and personalizing the system [1]. Finally, our own cognitive biases, including confirmation bias and other behavioral biases, also taint the interaction data. In all these cases we may need to debias the input data, use techniques that can handle biased data (e.g., machine learning algorithms tailored for that purpose), or debias the output (when information has already been lost). Most systems are optimized using implicit user feedback, that is, clicks or other trackable user interactions. However, those interactions are biased toward the choices that such systems offer to their users, as we can only click on things that are shown to us. Hence, these feedback loops are tainted by presentation or exposure bias [1]. The best-known problem related to this bias is the filter bubble [4], or echo chamber effect. Solutions to this problem include the explore-and-exploit paradigm (e.g., learning the world), diversity, novelty, and serendipity. Depending on the exploration technique, the bubble is smaller or bigger [2]. On the other hand, too much exploration also reduces short-term revenue, and hence it is usually bounded.
However, we believe that recommender systems could improve their long-term revenue if significantly more exploration were performed, probably diminishing at the same time the tension between user experience and monetization. This is good not only for the recommender system; it also creates fairer and healthier digital markets for everyone, users and publishers/sellers alike.
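
The exposure-bias feedback loop and the bounded exploration described above can be illustrated with a small simulation. This is a minimal sketch and not a method taken from the talk: it assumes an epsilon-greedy policy as one possible explore-and-exploit technique, and the names `pick_item`, `simulate`, `epsilon`, and the click-through rates are hypothetical choices made for illustration.

```python
import random


def pick_item(clicks, shows, epsilon, rng):
    """Choose one item to show: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: show a uniformly random item, regardless of past clicks.
        return rng.randrange(len(shows))
    # Exploitation: show the item with the best observed click rate so far.
    # Items that were never shown have an observed rate of 0.0.
    rates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
    return max(range(len(rates)), key=rates.__getitem__)


def simulate(true_ctr, epsilon, rounds=5000, seed=0):
    """Run the feedback loop and return (shows, clicks) per item.

    true_ctr holds hypothetical click probabilities, unknown to the system.
    epsilon bounds how much traffic is spent on exploration.
    """
    rng = random.Random(seed)
    clicks = [0] * len(true_ctr)
    shows = [0] * len(true_ctr)
    for _ in range(rounds):
        i = pick_item(clicks, shows, epsilon, rng)
        shows[i] += 1
        # A click can only happen on the item that was actually shown.
        if rng.random() < true_ctr[i]:
            clicks[i] += 1
    return shows, clicks


if __name__ == "__main__":
    ctr = [0.05, 0.10, 0.30, 0.08]  # assumed values, for illustration only
    for eps in (0.0, 0.2):
        shows, clicks = simulate(ctr, epsilon=eps)
        print(f"epsilon={eps}: shows={shows} clicks={clicks}")
```

With `epsilon = 0` the first item receives every impression, because items that are never shown can never collect clicks; the system never finds out that another item is better. Any positive `epsilon` gives every item some exposure, at the cost of spending a bounded share of impressions on items that currently look worse.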