Abstract: For infinite-horizon average-cost problems, there exist relatively few rigorous approximation and reinforcement learning results. In this paper, we present several such results for Markov Decision Processes with standard Borel spaces. (i) We first provide a discretization-based approximation method for fully observed Markov Decision Processes (MDPs) with continuous spaces under the average cost criterion. Under certain ergodicity assumptions, we establish error bounds for the approximations in two cases. When the dynamics are only weakly continuous, the errors converge to zero asymptotically as the grid sizes vanish. When the dynamics are Wasserstein continuous, we obtain a rate of convergence as the grid sizes vanish. In particular, we relax the total variation condition given in prior work to weak continuity and Wasserstein continuity conditions. (ii) We provide synchronous and asynchronous Q-learning algorithms for continuous spaces via quantization, in which the quantized state is treated as the actual state in the corresponding Q-learning algorithms, and we establish their convergence. For the former we use a span semi-norm approach, and for the latter we use a direct contraction approach. (iii) Finally, we show that the algorithms converge to the optimal Q values of the finite approximate models constructed via quantization, which implies near optimality of the resulting solution. To our knowledge, our Q-learning convergence results and the near optimality of their limits are new for continuous spaces, and the proof method is new even for finite spaces.
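The following is a minimal sketch of items (ii) and (iii) on a toy problem. It is not the paper's algorithm or setting, and every detail below is an assumption made for illustration. The assumed ingredients are the scalar system x' = clip(0.8x + u + w, 0, 1) with w ~ N(0, 0.1²), the stage cost x² + 0.5u², the three-point action set, the uniform 10-bin grid, and the hypothetical helper names `finite_model`, `relative_value_iteration` and `synchronous_relative_q_learning`. The learner uses a standard relative (RVI-type) Q-learning update that subtracts Q at a fixed reference pair, and it only ever sees quantized states. Next states are simulated from each bin's representative point, so the finite model's transition kernel can be computed exactly and the learned table can be compared with that model's optimal Q values.

```python
import math
import numpy as np

# Hypothetical toy system (an assumption, not from the paper):
#   x' = clip(0.8 x + u + w, 0, 1),  w ~ N(0, SIGMA^2),  cost c(x, u) = x^2 + 0.5 u^2.
A_COEF, SIGMA = 0.8, 0.1
ACTIONS = np.array([-0.2, 0.0, 0.2])           # finite action set (assumed already quantized)
N_BINS = 10                                    # uniform grid on the state space [0, 1]
CENTERS = (np.arange(N_BINS) + 0.5) / N_BINS   # representative point of each bin
REF = (0, 1)                                   # reference pair (bin 0, u = 0.0) for the relative update


def quantize(x):
    """Map states in [0, 1] to bin indices 0..N_BINS-1 (x = 1 goes to the last bin)."""
    return np.minimum((np.asarray(x) * N_BINS).astype(int), N_BINS - 1)


def cost(x, u):
    return x ** 2 + 0.5 * u ** 2


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def finite_model():
    """Stage costs C[i, a] and exact kernel P[i, a, j] of the quantized model:
    the probability that the clipped next state, started at CENTERS[i] under
    action a, falls in bin j."""
    P = np.zeros((N_BINS, len(ACTIONS), N_BINS))
    interior_edges = np.arange(1, N_BINS) / N_BINS
    for i, x in enumerate(CENTERS):
        for a, u in enumerate(ACTIONS):
            mean = A_COEF * x + u
            cdf = [normal_cdf((e - mean) / SIGMA) for e in interior_edges]
            # Clipping sends all mass below 0 to bin 0 and all mass above 1 to the last bin.
            P[i, a] = np.diff(np.array([0.0] + cdf + [1.0]))
    C = cost(CENTERS[:, None], ACTIONS[None, :])
    return C, P


def relative_value_iteration(C, P, tol=1e-10, max_iter=10_000):
    """Iterate Q <- T Q - Q[REF], where (T Q)(i, a) = C[i, a] + sum_j P[i, a, j] min_b Q[j, b].
    At the fixed point, Q[REF] equals the optimal average cost of the finite model."""
    Q = np.zeros_like(C)
    for _ in range(max_iter):
        Q_new = C + P @ Q.min(axis=1) - Q[REF]
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def synchronous_relative_q_learning(n_iter=30_000, seed=0):
    """Synchronous relative Q-learning on quantized states: every (bin, action) pair
    is updated at every step with a fresh next-state sample from the continuous
    dynamics at the bin's representative point, and that sample is then quantized."""
    rng = np.random.default_rng(seed)
    C = cost(CENTERS[:, None], ACTIONS[None, :])
    Q = np.zeros_like(C)
    for k in range(n_iter):
        alpha = 1.0 / (k + 1) ** 0.7            # common decreasing step size
        w = rng.normal(0.0, SIGMA, size=C.shape)
        x_next = np.clip(A_COEF * CENTERS[:, None] + ACTIONS[None, :] + w, 0.0, 1.0)
        target = C + Q.min(axis=1)[quantize(x_next)] - Q[REF]
        Q = Q + alpha * (target - Q)
    return Q


if __name__ == "__main__":
    C, P = finite_model()
    Q_star = relative_value_iteration(C, P)
    Q_learned = synchronous_relative_q_learning()
    print("optimal average cost of the quantized model:", Q_star[REF])
    print("max |Q_learned - Q_star|:", np.abs(Q_learned - Q_star).max())
```

Because the expected target equals C[i, a] + Σ_j P[i, a, j] min_b Q[j, b] - Q[REF], the update's fixed point is that of `relative_value_iteration`. With decreasing step sizes, the learned table should therefore approach the optimal Q values of the quantized model. Whether the resulting policy is near optimal for the original continuous system is a separate question, which the approximation bounds of item (i) are meant to answer.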

Q-Learning for Continuous State and Action MDPs under Average Cost Criteria

Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity

Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States

Approximate Q-Learning for Controlled Diffusion Processes and its Near Optimality

A Q-learning algorithm for Markov decision processes with continuous state spaces

Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs

Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments

Approximation Schemes for POMDPs with Continuous Spaces and Their Near Optimality

Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion

Minimax Optimal Q Learning with Nearest Neighbors

Regularized Q-Learning with Linear Function Approximation

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

Average Cost Optimality of Partially Observed MDPs: Contraction of Non-linear Filters, Optimal Solutions and Approximations

A kind of weighted Q-learning for continuous state and action spaces

How to discretize continuous state-action spaces in Q-learning: A symbolic control approach

Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning

A Novel Q-Learning Approach with Continuous States and Actions

Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes

Continuous-time q-learning for mean-field control problems

Average Cost Optimality of Partially Observed MDPs: Contraction of Nonlinear Filters and Existence of Optimal Solutions and Approximations

On Linear Programming for Constrained and Unconstrained Average-Cost Markov Decision Processes with Countable Action Spaces and Strictly Unbounded Costs