Abstract: We present an alternative view for the study of optimal control of partially observed Markov Decision Processes (POMDPs). We first revisit the traditional (and by now standard) separated-design method of reducing the problem to a fully observed MDP (the belief-MDP), and present conditions for the existence of optimal policies. Then, rather than working with this standard method, we define a Markov chain taking values in an infinite-dimensional product space, with the control actions and the state process causally conditionally independent given the measurement/information process. We provide new sufficient conditions for the existence of optimal control policies. In particular, while in the belief-MDP reduction of POMDPs the weak Feller condition requires total variation continuity of either the system kernel or the measurement kernel, the approach of this paper needs only weak continuity of both the transition kernel and the measurement kernel (total variation continuity is not required), together with regularity conditions related to filter stability. For the average cost setup, we provide a general approach for initializing the randomness, which we show establishes convergence to the optimal cost. For the discounted cost setup, we establish near optimality of finite-window policies via a direct argument involving near optimality of quantized approximations for MDPs under weak Feller continuity, where finite truncations of memory can be viewed as quantizations of infinite memory with a uniform diameter in each finite-window restriction under the product metric. In the control-free case, our analysis leads to new and weak conditions for the existence and uniqueness of invariant probability measures for non-linear filter processes: we show that unique ergodicity of the measurement process, together with a measurability condition related to filter stability, leads to unique ergodicity of the filter process.
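The remark that finite memory truncations act as quantizations with a uniform diameter can be illustrated with a minimal sketch. The abstract does not specify the metric, so this example assumes a weighted product metric d(x, y) = sum_k 2^{-k} min(1, |x_k - y_k|) on real-valued histories, with index 0 holding the most recent entry. The function names `product_metric` and `truncate_to_window` are hypothetical and do not come from the paper. Under these assumptions, any two histories that agree on the most recent N entries lie within 2^{-N} of each other, uniformly over histories.

```python
import random


def product_metric(x, y):
    """Assumed weighted product metric on finite prefixes of two histories.

    The coordinate at position k (0-based) carries weight 2^{-(k+1)}, and each
    per-coordinate distance is capped at 1 so the sum is bounded.
    """
    return sum(
        2.0 ** -(k + 1) * min(1.0, abs(a - b))
        for k, (a, b) in enumerate(zip(x, y))
    )


def truncate_to_window(history, window, fill=0.0):
    """Keep the `window` most recent entries; replace older ones by `fill`.

    All histories sharing the same recent window map to the same point,
    which is the sense in which truncation acts as a quantizer here.
    """
    return list(history[:window]) + [fill] * (len(history) - window)


if __name__ == "__main__":
    random.seed(0)
    history = [random.uniform(-5.0, 5.0) for _ in range(50)]
    for window in (1, 3, 5, 10):
        error = product_metric(history, truncate_to_window(history, window))
        # The discarded tail has total weight below 2^{-window}.
        print(window, error, error <= 2.0 ** -window)
```

The bound does not depend on the particular history, only on the window length, which is the uniform-diameter property in this simplified setting. The paper's actual spaces and metric may differ.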

Algorithms for optimization and stabilization of controlled Markov chains

A policy iteration algorithm for non-Markovian control problems

Average cost optimal control under weak ergodicity hypotheses: Relative value iterations

Maximal reliability of controlled Markov systems

Optimization Algorithms for Semi-Markov Control Processes with Average Criteria

Policy Iteration Approach to the Infinite Horizon Average Optimal Control of Probabilistic Boolean Networks

Stability Analysis of Optimal Adaptive Control Under Value Iteration Using a Stabilizing Initial Policy

Robust Policy Optimization in Continuous-time Mixed $\mathcal{H}_2/\mathcal{H}_\infty$ Stochastic Control

Inverse optimal stabilization of cooperative control in networked multi-agent systems

Stochastic control up to a hitting time: optimality and rolling-horizon implementation

A Multilevel Approach for Stochastic Nonlinear Optimal Control

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

Regress-Later Monte Carlo for optimal control of Markov processes

Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

From Optimization to Control: Quasi Policy Iteration

Another Look at Partially Observed Optimal Stochastic Control: Existence, Ergodicity, and Approximations without Belief-Reduction

Policy iteration for discrete-time systems with discounted costs: stability and near-optimality guarantees

Stability Analysis of Optimal Adaptive Control using Value Iteration with Approximation Errors

Risk-sensitive control of continuous time Markov chains

Optimal control formulation of transition path problems for Markov Jump Processes

Optimal Control of a Stochastic Power System -- Algorithms and Mathematical Analysis