A Note on Information-Directed Sampling and Thompson Sampling

Li Zhou
DOI: https://doi.org/10.48550/arXiv.1503.06902
IF: 5.414
2015-03-24
Machine Learning
Abstract: This note introduces three Bayesian-style multi-armed bandit algorithms: Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted from the original papers.