Bandit Algorithms in Information Retrieval

Dorota Glowacka
DOI: https://doi.org/10.1561/1500000067
2019-01-01
Foundations and Trends® in Information Retrieval
Abstract: Bandit algorithms, named after casino slot machines sometimes known as "one-armed bandits", fall into a broad category of stochastic scheduling problems. In the setting with multiple arms, each arm generates a reward with a given probability. The gambler's aim is to find the arm producing the highest payoff and then continue playing in order to accumulate the maximum reward possible. However, having only a limited number of plays, the gambler is faced with a dilemma: should he play the arm currently known to produce the highest reward or should he keep on trying other arms in the hope of finding a better paying one? This problem formulation is easily applicable to many real-life scenarios, hence in recent years there has been an increased interest in developing bandit algorithms for a range of applications. In information retrieval and recommender systems, bandit algorithms, which are simple to implement and do not require any training data, have been particularly popular in online personalization, online ranker evaluation and search engine optimization. This survey provides a brief overview of bandit algorithms designed to tackle specific issues in information retrieval and recommendation and, where applicable, it describes how they were applied in practice.

Suggested Citation: Dorota Glowacka (2019), "Bandit Algorithms in Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 13: No. 4, pp 299-424. http://dx.doi.org/10.1561/1500000067
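
The explore-or-exploit dilemma described in the abstract can be illustrated with a small simulation. The sketch below uses epsilon-greedy, one common textbook strategy chosen here purely for illustration; the abstract does not single out any particular algorithm. The function name `epsilon_greedy`, its parameters, and the Bernoulli (0/1) reward model are all assumptions made for this example.

```python
import random


def epsilon_greedy(arm_probs, n_plays, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy gambler on Bernoulli arms (illustrative sketch).

    arm_probs: hypothetical payout probability of each arm (unknown to the gambler)
    n_plays:   the limited number of plays available
    epsilon:   fraction of plays spent trying a uniformly random arm
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    pulls = [0] * n_arms    # how often each arm has been played
    means = [0.0] * n_arms  # running average reward observed per arm
    total = 0

    for _ in range(n_plays):
        if rng.random() < epsilon:
            # Explore: try an arm at random in the hope of finding a better one.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: play the arm with the highest observed average so far
            # (ties are broken in favour of the lowest index).
            arm = max(range(n_arms), key=lambda a: means[a])

        # The chosen arm pays 1 with its hidden probability, otherwise 0.
        reward = 1 if rng.random() < arm_probs[arm] else 0

        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]  # incremental mean update
        total += reward

    return total, pulls, means


if __name__ == "__main__":
    total, pulls, means = epsilon_greedy([0.2, 0.5, 0.8], n_plays=5000)
    print("total reward:", total)
    print("pulls per arm:", pulls)
```

Setting `epsilon=0` shows why exploration matters: with all estimates starting at zero, the gambler in this sketch keeps playing the first arm and never discovers a better one.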