Variance-Aware Sparse Linear Bandits.

Yan Dai, Ruosong Wang, Simon S. Du
DOI: https://doi.org/10.48550/arxiv.2205.13450
2022-01-01
Abstract: It is well known that for sparse linear bandits, ignoring the dependence on the sparsity, which is much smaller than the ambient dimension, the worst-case minimax regret is Θ(√(dT)), where d is the ambient dimension and T is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve 𝒪(1) regret, which is (nearly) independent of d and T. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: 𝒪(√(d ∑_{t=1}^T σ_t^2) + 1), where σ_t^2 is the variance of the noise at the t-th round. This bound naturally interpolates between the regret bounds for the worst-case constant-variance regime (i.e., σ_t ≡ Ω(1)) and the benign deterministic regime (i.e., σ_t ≡ 0). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm into a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold: the first algorithm can handle unknown-variance cases, and the second is more efficient.
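The interpolation claim can be checked by substituting the two extreme regimes into the stated bound. The following is only a sketch: it suppresses constants, logarithmic factors, and the dependence on sparsity, as the abstract does, and the constant noise level σ is a symbol introduced here for illustration.

```latex
% Constant-variance regime: \sigma_t \equiv \sigma with \sigma = \Theta(1)
\sqrt{d \sum_{t=1}^{T} \sigma_t^2} + 1
  = \sqrt{d \, T \, \sigma^2} + 1
  = \sigma \sqrt{dT} + 1
  = \mathcal{O}\bigl(\sqrt{dT}\bigr)

% Deterministic regime: \sigma_t \equiv 0
\sqrt{d \sum_{t=1}^{T} \sigma_t^2} + 1
  = \sqrt{d \cdot 0} + 1
  = 1
  = \mathcal{O}(1)
```

The first case matches the Θ(√(dT)) worst-case rate and the second matches the 𝒪(1) regret of the noiseless setting.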