Social Bandit Learning: Strangers Can Help

Jun Zong, Ting Liu, Zhaowei Zhu, Xiliang Luo, Hua Qian
DOI: https://doi.org/10.1109/wcsp49889.2020.9299725
2020-01-01
Abstract: Social learning is commonly used to speed up the learning of novice agents, even when the novice agent is influenced only by external observations. In this paper, we analyze an online bandit learning problem that arises in social learning. Specifically, under the multi-armed bandit (MAB) framework, multiple agents (one learner and several targets) interact with an unknown environment and make online decisions. To better handle the well-known exploration-exploitation tradeoff in bandit problems and to maximize the learner's rewards, we design an online learning algorithm that benefits from others simply by observing the targets' decisions. The advantage of leveraging these observations is also reflected in the derived performance bound. The proposed learning algorithm is further evaluated through numerical simulations.
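The setting described above can be illustrated with a small simulation. The sketch below is NOT the authors' algorithm, which the abstract does not specify. It assumes Bernoulli arms, targets that each run plain UCB1 on their own copy of the environment, and a hypothetical learner that sees only which arms the targets chose (never their rewards) and adds a "popularity" bonus, scaled by a hypothetical `weight` parameter, to its own UCB1 index. All names (`UCB1Agent`, `SocialLearner`, `simulate`, `weight`) are illustrative assumptions.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    # Standard UCB1 index; an arm never pulled gets infinite priority.
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


class UCB1Agent:
    """Plain UCB1 agent (assumed behaviour of the targets, and a baseline learner)."""

    def __init__(self, n_arms):
        self.pulls = [0] * n_arms
        self.means = [0.0] * n_arms

    def choose(self, t):
        scores = [ucb1_index(m, n, t) for m, n in zip(self.means, self.pulls)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the empirical mean reward of the pulled arm.
        self.pulls[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.pulls[arm]

    def observe(self, arms):
        pass  # a plain UCB1 agent ignores the targets


class SocialLearner(UCB1Agent):
    """Hypothetical learner: UCB1 plus a bonus for arms the targets pick often."""

    def __init__(self, n_arms, weight=1.0):
        super().__init__(n_arms)
        self.observed = [0] * n_arms  # how often the targets chose each arm
        self.weight = weight          # hypothetical trust in the targets

    def observe(self, arms):
        for a in arms:
            self.observed[a] += 1

    def choose(self, t):
        total = sum(self.observed) or 1
        scores = [
            ucb1_index(m, n, t) + self.weight * obs / total
            for m, n, obs in zip(self.means, self.pulls, self.observed)
        ]
        return max(range(len(scores)), key=scores.__getitem__)


def simulate(means, n_targets, horizon, learner, seed=0):
    """Run targets and learner for `horizon` rounds; return the learner's pseudo-regret."""
    rng = random.Random(seed)
    targets = [UCB1Agent(len(means)) for _ in range(n_targets)]
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        choices = []
        for agent in targets:
            arm = agent.choose(t)
            agent.update(arm, 1.0 if rng.random() < means[arm] else 0.0)
            choices.append(arm)
        arm = learner.choose(t)
        learner.update(arm, 1.0 if rng.random() < means[arm] else 0.0)
        regret += best - means[arm]
        learner.observe(choices)  # learner sees decisions only, not rewards
    return regret


if __name__ == "__main__":
    means = [0.2, 0.5, 0.7]
    plain = simulate(means, 3, 2000, UCB1Agent(3))
    social = simulate(means, 3, 2000, SocialLearner(3))
    print(f"plain UCB1 regret: {plain:.1f}, social learner regret: {social:.1f}")
```

Because each round consumes the same number of random draws regardless of the learner's choices, the targets follow identical trajectories in both runs, so the two regret figures differ only through how the learner uses the observed decisions. The additive bonus is just one simple way to exploit those observations; the paper's own method and bound may differ.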