Abstract: Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies mainly focus on finite state settings, and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes (MDPs) governed by an unknown parameter $\theta\in\Theta$, and defined on a countably-infinite state space $\mathcal X=\mathbb{Z}_+^d$, with finite action space $\mathcal A$, and an unbounded cost function. We take a Bayesian perspective with the random unknown parameter $\boldsymbol{\theta}^*$ generated via a given fixed prior distribution on $\Theta$. To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode, the posterior distribution formed via Bayes' rule is used to produce a parameter estimate, which then decides the policy applied during the episode. To ensure the stability of the Markov chain obtained by following the policy chosen for each parameter, we impose ergodicity assumptions. From this condition and using the solution of the average cost Bellman equation, we establish an $\tilde O(dh^d\sqrt{|\mathcal A|T})$ upper bound on the Bayesian regret of our algorithm, where $T$ is the time-horizon. Finally, to elucidate the applicability of our algorithm, we consider two different queuing models with unknown dynamics, and show that our algorithm can be applied to develop approximately optimal control algorithms.
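The episodic structure described in the abstract (sample a parameter from the posterior at the start of each episode, then follow the policy attached to that sample) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy single-queue model with $d=1$, the finite parameter set `THETAS`, the known arrival probability `ARRIVAL_P`, the placeholder threshold policy `policy_for`, and the episode stopping rule (stop once the episode is longer than the previous one, or once a state-action visit count doubles), which is borrowed from finite-state Thompson sampling schemes because the abstract does not state the rule.

```python
import random
from collections import defaultdict

# Hypothetical toy setup (not from the paper): one queue, unknown service
# probability theta drawn from a finite set, known arrival probability.
THETAS = [0.5, 0.7, 0.9]
ARRIVAL_P = 0.4


def step(x, a, theta, rng):
    """One transition of the toy queue; a=1 attempts service, a=0 idles."""
    arrival = rng.random() < ARRIVAL_P
    departure = a == 1 and x > 0 and rng.random() < theta
    return x + int(arrival) - int(departure)


def likelihood(x, a, x_next, theta):
    """P(x_next | x, a; theta) for the toy queue above."""
    s = theta if (a == 1 and x > 0) else 0.0
    probs = {
        1: ARRIVAL_P * (1 - s),
        0: ARRIVAL_P * s + (1 - ARRIVAL_P) * (1 - s),
        -1: (1 - ARRIVAL_P) * s,
    }
    return probs.get(x_next - x, 0.0)


def policy_for(theta):
    """Placeholder threshold policy; stands in for the optimal policy of theta."""
    threshold = 1 if theta < 0.8 else 2
    return lambda x: 1 if x >= threshold else 0


def thompson_sampling(true_theta, horizon, seed=0):
    rng = random.Random(seed)
    posterior = [1.0 / len(THETAS)] * len(THETAS)  # uniform prior on THETAS
    visits = defaultdict(int)
    x, t, prev_len, episodes = 0, 0, 0, 0
    while t < horizon:
        # Episode start: sample a parameter from the current posterior
        # and fix the corresponding policy for the whole episode.
        theta_k = rng.choices(THETAS, weights=posterior)[0]
        policy = policy_for(theta_k)
        visits_at_start = dict(visits)
        ep_len = 0
        episodes += 1
        while t < horizon:
            a = policy(x)
            x_next = step(x, a, true_theta, rng)
            # Bayes' rule on the observed transition.
            posterior = [p * likelihood(x, a, x_next, th)
                         for p, th in zip(posterior, THETAS)]
            total = sum(posterior)
            posterior = [p / total for p in posterior]
            visits[(x, a)] += 1
            doubled = visits[(x, a)] > 2 * max(1, visits_at_start.get((x, a), 0))
            x, t, ep_len = x_next, t + 1, ep_len + 1
            # Assumed stopping rule: the episode outgrows the previous one,
            # or some state-action visit count has doubled.
            if ep_len > prev_len or doubled:
                break
        prev_len = ep_len
    return posterior, episodes


if __name__ == "__main__":
    post, n_episodes = thompson_sampling(true_theta=0.7, horizon=5000)
    print(dict(zip(THETAS, post)), n_episodes)
```

All three values in `THETAS` exceed `ARRIVAL_P`, so the queue is stable under any sampled policy; this loosely mirrors the role of the ergodicity assumptions, but only for this toy model.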

A Note on the Existence of Optimal Stationary Policies for Average Markov Decision Processes with Countable States

A New Condition for the Existence of Optimal Stationary Policies in Denumerable State Average Cost Continuous Time Markov Decision Processes with Unbounded Cost and Transition Rates

Optimal Stationary Policies for a Class of Countable Markov Control Processes

On Average Optimality for Non-Stationary Markov Decision Processes in Borel Spaces

Average Optimality in Markov Decision Processes with Unbounded Rewards

Performance Bounds and Asymptotic Optimality of Modified (r, Q) Policies for Stochastic Distribution Inventory Systems

On the optimality equation for average cost Markov decision processes and its validity for inventory control

Average cost optimal control under weak ergodicity hypotheses: Relative value iterations

Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of $(s,S)$ Inventory Policies

Optimal Policies for a Continuous Time MCP with Compact Action Set

The Finiteness of the Reward Function and the Optimal Value Function in Markov Decision Processes

On Linear Programming for Constrained and Unconstrained Average-Cost Markov Decision Processes with Countable Action Spaces and Strictly Unbounded Costs

Risk-Sensitive Average Markov Decision Processes in General Spaces

Optimal Sample Complexity for Average Reward Markov Decision Processes

Finding Optimal Observation-Based Policies for Constrained POMDPs under the Expected Average Reward Criterion

The minimal hitting probability of continuous-time controlled Markov systems with countable states

Markov Decision Problems with Unbounded Transition Rates under Discounted-Cost Performance Criteria

Optimal Stationary Policies for Semi-Markov Control Processes with Discounted-Cost Criteria

Sufficiency of Markov Policies for Continuous-Time Jump Markov Decision Processes