Abstract: Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state space. Algorithmic and learning procedures developed to produce optimal policies mainly focus on finite-state settings and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time Markov Decision Processes (MDPs) governed by an unknown parameter $\theta\in\Theta$ and defined on the countably infinite state space $\mathcal X=\mathbb{Z}_+^d$, with finite action space $\mathcal A$ and an unbounded cost function. We take a Bayesian perspective in which the random unknown parameter $\boldsymbol{\theta}^*$ is generated via a given fixed prior distribution on $\Theta$. To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode, the posterior distribution formed via Bayes' rule is sampled to produce a parameter estimate, which then decides the policy applied during the episode. To ensure the stability of the Markov chain obtained by following the policy chosen for each parameter, we impose ergodicity assumptions. From these assumptions and using the solution of the average-cost Bellman equation, we establish an $\tilde O(dh^d\sqrt{|\mathcal A|T})$ upper bound on the Bayesian regret of our algorithm, where $T$ is the time horizon. Finally, to elucidate the applicability of our algorithm, we consider two different queuing models with unknown dynamics and show that our algorithm can be applied to develop approximately optimal control algorithms.
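The episode structure described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is a hypothetical assumption, not the paper's setup. The sketch uses a one-dimensional queue ($d=1$) whose unknown arrival probability lies in a finite set `THETA` under a uniform prior, two service actions, and a holding cost $x$ plus an action cost. One policy per parameter is computed by relative value iteration on the average-cost Bellman equation. The iteration runs over a truncated state set, which is only a numerical shortcut; the simulated queue itself is unbounded. Episodes end under a stopping rule of the kind used in Thompson sampling with dynamic episodes (TSDE): an episode ends once it is longer than the previous one or once the visit count of some state-action pair doubles. The paper's exact episode rule and policy computation may differ.

```python
import math
import random

# Hypothetical toy instance (not from the paper): discrete-time single-server queue
# with unknown arrival probability theta taken from a finite set (uniform prior).
THETA = [0.2, 0.4, 0.6]      # candidate parameters
MU = [0.3, 0.8]              # service probability: action 0 = slow, action 1 = fast
ACTION_COST = [0.0, 2.0]     # extra per-slot cost of fast service
N_TRUNC = 60                 # truncation used ONLY when solving the Bellman equation


def cost(x, a):
    # Unbounded holding cost plus an action cost.
    return x + ACTION_COST[a]


def transition_probs(x, a, theta):
    """Return {next_state: probability} for queue length x under action a."""
    mu = MU[a] if x > 0 else 0.0          # no departure from an empty queue
    up, down = theta * (1 - mu), (1 - theta) * mu
    probs = {x + 1: up, x - 1: down, x: 1 - up - down}
    return {y: p for y, p in probs.items() if p > 0}


def solve_policy(theta, iters=1000):
    """Relative value iteration for rho + h(x) = min_a [c(x,a) + sum_y P(y|x,a) h(y)]
    on {0, ..., N_TRUNC} (arrivals at N_TRUNC are folded back); returns a greedy policy."""
    h = [0.0] * (N_TRUNC + 1)

    def q(x, a):
        return cost(x, a) + sum(p * h[min(y, N_TRUNC)]
                                for y, p in transition_probs(x, a, theta).items())

    for _ in range(iters):
        new_h = [min(q(x, a) for a in (0, 1)) for x in range(N_TRUNC + 1)]
        h = [v - new_h[0] for v in new_h]  # normalise at reference state 0
    return [min((0, 1), key=lambda a: q(x, a)) for x in range(N_TRUNC + 1)]


def thompson_sampling(T, true_theta, seed=0):
    """Run T slots; return (average cost, final posterior over THETA)."""
    rng = random.Random(seed)
    policies = {th: solve_policy(th) for th in THETA}  # one policy per parameter
    log_post = [0.0] * len(THETA)                      # uniform prior
    counts = {}                                        # visits to (x, a)
    x, t, prev_len, total_cost = 0, 0, 0, 0.0
    while t < T:
        # Start of an episode: sample a parameter from the current posterior.
        m = max(log_post)
        theta_k = rng.choices(THETA, weights=[math.exp(l - m) for l in log_post])[0]
        policy, start_counts, ep_len = policies[theta_k], dict(counts), 0
        while t < T:
            a = policy[min(x, N_TRUNC)]
            probs = transition_probs(x, a, true_theta)  # true (unknown) dynamics
            y = rng.choices(list(probs), weights=list(probs.values()))[0]
            total_cost += cost(x, a)
            # Bayes' rule: add the log-likelihood of the observed transition.
            for i, th in enumerate(THETA):
                p = transition_probs(x, a, th).get(y, 0.0)
                log_post[i] += math.log(p) if p > 0 else -math.inf
            counts[(x, a)] = counts.get((x, a), 0) + 1
            doubled = counts[(x, a)] > 2 * start_counts.get((x, a), 0)
            x, t, ep_len = y, t + 1, ep_len + 1
            if ep_len > prev_len or doubled:  # hypothetical TSDE-style stopping rule
                break
        prev_len = ep_len
    m = max(log_post)
    post = [math.exp(l - m) for l in log_post]
    return total_cost / T, [p / sum(post) for p in post]
```

For example, `thompson_sampling(5000, true_theta=0.4)` returns the empirical average cost together with the final posterior, which in this toy instance should place nearly all its mass on 0.4. Precomputing one policy per parameter is possible only because the toy `THETA` is finite.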

On solving optimal policies for event-based dynamic programming

On Solving Optimal Policies for Finite-Stage Event-Based Optimization

A rollout method for finite-stage event-based decision processes

On Solving Event-Based Optimization with Average Reward over Infinite Stages

Event-based optimization with lagged state information

Optimal Policies for Quantum Markov Decision Processes

On Multi-Scale Event-Based Optimization

A Potential-Based Method for Finite-Stage Markov Decision Process

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Event-based optimization for finite-horizon total-cost Markov decision processes

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Optimal control of probabilistic discrete event systems on Markov decision processes

Dynamic Programming for Structured Continuous Markov Decision Problems

Online Abstract Dynamic Programming with Contractive Models

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Economic Model Predictive Control as a Solution to Markov Decision Processes

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

Solving Mission-Wide Chance-Constrained Optimal Control Using Dynamic Programming

Event-Based Optimization For Dispatching Policies In Material Handling Systems Of General Assembly Lines

A safe exploration approach to constrained Markov decision processes