Tight Memory-Regret Lower Bounds for Streaming Bandits.

Shaoang Li, Lan Zhang, Junhao Wang, Xiang-Yang Li
DOI: https://doi.org/10.48550/arxiv.2306.07903
2023-01-01
Abstract: In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret while handling arms that arrive online and using sublinear arm memory. We establish the tight worst-case regret lower bound of $\Omega\left((TB)^{\alpha} K^{1-\alpha}\right)$, where $\alpha = 2^{B} / (2^{B+1}-1)$, for any algorithm with time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory. Furthermore, we establish the first instance-dependent lower bound of $\Omega\left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arms identification tasks, which may be of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O}\left((TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
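To make the exponent in the worst-case bound concrete, the following is a minimal sketch that evaluates alpha = 2^B / (2^(B+1) - 1) and the rate (TB)^alpha K^(1 - alpha) for a few pass counts. The function names `alpha` and `bound_rate`, and the values of T and K, are illustrative assumptions and do not come from the paper; the constant hidden by the Ω notation is dropped, so the numbers indicate growth rates only.

```python
from fractions import Fraction


def alpha(num_passes: int) -> Fraction:
    """Exponent alpha = 2^B / (2^(B+1) - 1) as stated in the abstract."""
    if num_passes < 1:
        raise ValueError("number of passes B must be at least 1")
    return Fraction(2 ** num_passes, 2 ** (num_passes + 1) - 1)


def bound_rate(horizon: int, num_arms: int, num_passes: int) -> float:
    """Rate (T*B)^alpha * K^(1 - alpha), with the hidden constant dropped."""
    a = float(alpha(num_passes))
    return (horizon * num_passes) ** a * num_arms ** (1.0 - a)


if __name__ == "__main__":
    T, K = 10**6, 100  # illustrative values, not taken from the paper
    for B in (1, 2, 3, 5, 10):
        print(f"B={B:2d}  alpha={alpha(B)}  rate={bound_rate(T, K, B):.1f}")
```

With a single pass (B = 1) the exponent is 2/3, giving the rate T^(2/3) K^(1/3). Since alpha equals 1 / (2 - 2^(-B)), it decreases toward 1/2 as B grows, so the T and K exponents approach those of the classical √(KT) bound.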