A Simple Approach For Non-Stationary Linear Bandits

Peng Zhao,Lijun Zhang,Yuan Jiang,Zhi-Hua Zhou
DOI: https://doi.org/10.48550/arxiv.2103.05324
2020-01-01
Abstract:This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter is evolving over time. Previous studies have adopted sophisticated mechanisms, such as sliding window or weighted penalty to achieve near-optimal dynamic regret. In this paper, we demonstrate that a simple restarted strategy is sufficient to attain the same regret guarantee. Specifically, we design an UCB-type algorithm to balance exploitation and exploration, and restart it periodically to handle the drift of unknown parameters. Let T be the time horizon, d be the dimension, and P-T be the path-length that measures the fluctuation of the evolving unknown parameter, our approach enjoys an (O) over tilde (d(2/3)(1 + P-T)T-1/3(2/3)) dynamic regret, which is nearly optimal, matching the Omega(d(2/3)(1+P-T)T-1/3(2/3)) minimax lower bound up to logarithmic factors. Empirical studies also validate the efficacy of our approach.
What problem does this paper attempt to address?