Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
Yifan Lin, Yuhao Wang, Enlu Zhou
DOI: https://doi.org/10.1007/s11518-022-5541-9
2022-11-04
Journal of Systems Science and Systems Engineering
Abstract: In this paper we consider the contextual multi-armed bandit problem with linear payoffs under a risk-averse criterion. At each round, a context is revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we take mean-variance as the risk criterion, so the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm to the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound that holds with probability 1 − δ under the mean-variance criterion with risk tolerance ρ. The empirical performance of the proposed algorithms is demonstrated on a portfolio selection problem.
operations research & management science
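
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class name `MeanVarianceLinTS`, the posterior scale `v`, the score `rho * mean - variance`, and the residual-based variance estimate are all assumptions made for illustration. The abstract only states that Thompson sampling is applied to the disjoint model (one parameter vector per arm) under a mean-variance criterion with risk tolerance ρ.

```python
import numpy as np


class MeanVarianceLinTS:
    """Illustrative Thompson sampling for a disjoint linear bandit.

    Assumption: an arm is scored as rho * (sampled mean) - (estimated variance).
    """

    def __init__(self, n_arms, dim, rho=1.0, v=1.0, seed=0):
        self.rho = rho  # risk tolerance: weight placed on the mean reward
        self.v = v      # posterior scale (hypothetical tuning parameter)
        self.rng = np.random.default_rng(seed)
        # Per-arm ridge statistics: B = I + sum(x x^T), f = sum(r x).
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.f = np.zeros((n_arms, dim))
        # Per-arm pull counts and squared prediction errors, used as a
        # crude estimate of the reward variance (an assumption).
        self.n = np.zeros(n_arms)
        self.sq_resid = np.zeros(n_arms)

    def select(self, contexts):
        """contexts has shape (n_arms, dim); returns the index of the arm to pull."""
        scores = np.empty(len(contexts))
        for a, x in enumerate(contexts):
            cov = np.linalg.inv(self.B[a])
            cov = (cov + cov.T) / 2.0  # symmetrize against round-off
            theta_hat = cov @ self.f[a]
            # Draw a parameter vector from the Gaussian posterior of arm a.
            theta = self.rng.multivariate_normal(theta_hat, self.v ** 2 * cov)
            var_hat = self.sq_resid[a] / self.n[a] if self.n[a] > 0 else 0.0
            scores[a] = self.rho * (x @ theta) - var_hat
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Record the reward observed after pulling `arm` with context `x`."""
        # Prediction error is measured with the estimate held before this update.
        theta_hat = np.linalg.solve(self.B[arm], self.f[arm])
        self.sq_resid[arm] += (reward - x @ theta_hat) ** 2
        self.n[arm] += 1
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
```

In use, each round calls `select` with one context row per arm and then `update` with the chosen arm, its context, and the observed reward. Keeping separate `B` and `f` statistics per arm is what makes the model disjoint in this sketch; a larger `rho` weights the mean more heavily relative to the variance penalty.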