Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Jiatai Huang, Yan Dai, Longbo Huang
2022-01-01
Abstract: In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1 < \alpha \le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. Specifically, we design an algorithm, HTINF: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When $\alpha$ and $\sigma$ are unknown, HTINF achieves a $\log T$-style instance-dependent regret in stochastic cases and an $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm, AdaTINF, achieving $\mathcal{O}(\sigma K^{1-1/\alpha} T^{1/\alpha})$ minimax optimal regret even in adversarial settings, without prior knowledge of $\alpha$ and $\sigma$. This result matches the known regret lower bound (Bubeck et al., 2013), which assumes a stochastic environment with both $\alpha$ and $\sigma$ known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
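To get a feel for how the minimax rate stated for AdaTINF scales, the following minimal sketch evaluates the order of the bound with all constants and logarithmic factors dropped. The function name `minimax_rate` and its parameter names are illustrative assumptions, not part of the paper; the code only evaluates the formula and does not implement HTINF or AdaTINF.

```python
import math


def minimax_rate(sigma: float, num_arms: int, horizon: int, alpha: float) -> float:
    """Order of sigma * K^(1 - 1/alpha) * T^(1/alpha), constants dropped.

    Hypothetical helper for illustration only: sigma is the moment scale,
    num_arms is K, horizon is T, and alpha is the heavy-tail index.
    """
    if not 1.0 < alpha <= 2.0:
        raise ValueError("alpha must satisfy 1 < alpha <= 2")
    return sigma * num_arms ** (1.0 - 1.0 / alpha) * horizon ** (1.0 / alpha)


if __name__ == "__main__":
    # Heavier tails (smaller alpha) give a larger rate when T exceeds K.
    for alpha in (1.2, 1.5, 2.0):
        print(alpha, minimax_rate(1.0, 10, 10_000, alpha))
```

At alpha = 2 the expression reduces to sigma * sqrt(K * T), the familiar rate for bounded-variance bandits, and it grows toward linear in T as alpha approaches 1.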