Abstract: We consider finite-horizon restless bandits with multiple pulls per period, which play an important role in recommender systems, active learning, revenue management, and many other areas. While an optimal policy can be computed, in principle, using dynamic programming, the computation required scales exponentially in the number of arms $N$. Thus, there is substantial value in understanding the performance of index policies and other policies that can be computed efficiently for large $N$. We study the growth of the optimality gap, i.e., the loss in expected performance compared to an optimal policy, for such policies in a classical asymptotic regime proposed by Whittle in which $N$ grows while the fraction of arms that can be pulled per period is held constant. Intuition from the Central Limit Theorem and the tightest previous theoretical bounds suggest that this optimality gap should grow like $O(\sqrt{N})$. Surprisingly, we show that it is possible to outperform this bound. We characterize a non-degeneracy condition and a wide class of novel, practically computable policies, called fluid-priority policies, for which the optimality gap is $O(1)$. These include most widely used index policies. When this non-degeneracy condition does not hold, we show that fluid-priority policies nevertheless have an optimality gap that is $O(\sqrt{N})$, significantly generalizing the class of policies for which convergence rates are known. We demonstrate in numerical experiments that fluid-priority policies offer state-of-the-art performance on a collection of restless bandit problems.
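
As a reading aid, the quantities discussed in the abstract can be written out as follows. This is only a sketch of the notation: the symbols $\alpha$, $T$, $\pi$, $V_N^{*}$, and $V_N^{\pi}$ are assumptions introduced here for illustration and do not appear in the abstract itself. Suppose $\alpha \in (0,1)$ is the fixed fraction of arms that can be pulled per period (so the budget is $\alpha N$ pulls, assumed to be an integer), $T$ is the horizon, $V_N^{*}$ is the expected total reward of an optimal policy, and $V_N^{\pi}$ is that of a policy $\pi$.

$$
\mathrm{gap}_N(\pi) \;=\; V_N^{*} - V_N^{\pi} \;\ge\; 0, \qquad N \to \infty \ \text{with } \alpha \text{ and } T \text{ held fixed.}
$$

$$
\mathrm{gap}_N(\pi) = O(\sqrt{N}) \ \text{for fluid-priority } \pi \text{ in general}, \qquad
\mathrm{gap}_N(\pi) = O(1) \ \text{under the non-degeneracy condition.}
$$

Since total reward typically grows linearly in $N$, an $O(1)$ gap would correspond to a per-arm loss of order $1/N$, compared with order $1/\sqrt{N}$ for an $O(\sqrt{N})$ gap.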

An Optimal-Control Approach to Infinite-Horizon Restless Bandits: Achieving Asymptotic Optimality with Minimal Assumptions

Model Predictive Control is Almost Optimal for Restless Bandit

Achieving Exponential Asymptotic Optimality in Average-Reward Restless Bandits without Global Attractor Assumption

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

LP-based policies for restless bandits: necessary and sufficient conditions for (exponentially fast) asymptotic optimality

Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Near-optimality for infinite-horizon restless bandits with many arms

Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

Low-Complexity Algorithm for Restless Bandits with Imperfect Observations

Optimality of Myopic Policy for Restless Multiarmed Bandit with Imperfect Observation

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Optimal Myopic Policy for Restless Bandit: A Perspective of Eigendecomposition

Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits

Non-stationary Bandits with Habituation and Recovery Dynamics and Knapsack Constraints

Optimal Data Driven Resource Allocation under Multi-Armed Bandit Observations

Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control

Asymptotically Optimal Policies for Weakly Coupled Markov Decision Processes

Achieving O(1/N) Optimality Gap in Restless Bandits through Diffusion Approximation

Restless Bandits with Many Arms: Beating the Central Limit Theorem

Optimal Control of a Levy Inventory System: The Optimality of Control Band Policy

Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem