Abstract: We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance, with the objective of minimizing the expected cost for at least k chains to reach their target states. If the player decides to play a different chain, an additional switching cost is incurred. The special case in which there is no switching cost was solved optimally by Dumitriu, Tetali and Winkler [DTW03] by a variant of the celebrated Gittins index for the classical multi-armed bandit (MAB) problem with Markovian rewards [Git74, Git79]. However, for MAB with nontrivial switching costs, even if the switching cost is a constant, the classic paper by Banks and Sundaram [BS94] showed that no index strategy can be optimal.¹ In this paper, we complement their result and show that there is a simple index strategy that achieves a constant approximation factor if the switching cost is constant and k = 1. To the best of our knowledge, this is the first index strategy that achieves a constant approximation factor for a general MAB variant with switching costs. For the general metric, we propose a more involved constant-factor approximation algorithm, via a nontrivial reduction to the stochastic k-TSP problem, in which a Markov chain is approximated by a random variable. Our analysis makes extensive use of various interesting properties of the Gittins index.

∗ Institute for Interdisciplinary Information Sciences, Tsinghua University. Email: lijian83@mail.tsinghua.edu.cn.

† Paul G. Allen School of Computer Science & Engineering, University of Washington. Part of this work was done while visiting Shanghai Qi Zhi Institute. Email: dgliu@cs.washington.edu.

¹ Their proof is for the discounted version of MAB, but can be extended to our setting. See Appendix D for the details.

arXiv:2107.05822v1 [cs.DS] 13 Jul 2021
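To make the cost model in the abstract concrete, the following is a minimal sketch of one run of the game under a constant switching cost. It is not the paper's index strategy. The names `play_game`, `stay_policy` and `example_chains`, the dictionary layout of a chain, the per-state movement cost, and the convention that the very first play incurs no switching cost are all assumptions made for illustration.

```python
import random

def play_game(chains, policy, k, switch_cost, rng):
    """Simulate one run; return total cost (movement plus switching).

    Assumed layout (hypothetical): each chain is a dict with keys
    "start", "target", "cost" (state -> cost of advancing from it) and
    "trans" (state -> {next_state: probability}).
    """
    states = [c["start"] for c in chains]
    total, current = 0.0, None

    def finished():
        return sum(states[i] == c["target"] for i, c in enumerate(chains))

    while finished() < k:
        i = policy(states, current)
        # Assumption: the first play is free of switching cost.
        if current is not None and i != current:
            total += switch_cost
        current = i
        s = states[i]
        total += chains[i]["cost"][s]
        nxt, probs = zip(*chains[i]["trans"][s].items())
        states[i] = rng.choices(nxt, weights=probs)[0]
    return total

def stay_policy(chains):
    """Naive baseline: always play the first chain not yet at its target."""
    def policy(states, current):
        for i, c in enumerate(chains):
            if states[i] != c["target"]:
                return i
    return policy

# Two deterministic chains, so the total cost is the same on every run.
example_chains = [
    {"start": 0, "target": 2, "cost": {0: 1, 1: 1},
     "trans": {0: {1: 1.0}, 1: {2: 1.0}}},
    {"start": 0, "target": 1, "cost": {0: 5},
     "trans": {0: {1: 1.0}}},
]

if __name__ == "__main__":
    cost = play_game(example_chains, stay_policy(example_chains),
                     k=2, switch_cost=3.0, rng=random.Random(0))
    print(cost)  # 2 moves on chain 0, one switch, 1 move on chain 1 -> 10.0
```

With k = 1 the same policy stops as soon as the first chain reaches its target and never pays the switching cost, which is the regime the abstract's simple index strategy addresses.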

A faster index algorithm and a computational study for bandits with switching costs

Multi-token Markov Game with Switching Costs

Markov Game with Switching Costs

A $(2/3)n^3$ fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain

Dynamic priority allocation via restless bandit marginal productivity indices

A unifying computations of Whittle's Index for Markovian bandits

Open Bandit Processes with Uncountable States and Time-Backward Effects

A General Theory of MultiArmed Bandit Processes with Constrained Arm Switches

Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions

Multinomial Logit Bandit with Low Switching Cost

Collapsing Bandits and Their Application to Public Health Interventions

GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits

Bandits with Switching Costs: T^{2/3} Regret

Empirical Gittins Index Strategies with ε-Explorations for Multi-Armed Bandit Problems

Testing Indexability and Computing Whittle and Gittins Index in Subcubic Time

Online Algorithms for the Multi-Armed Bandit Problem with Markovian Rewards

Cost-Aware Cascading Bandits

Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy

Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates