Abstract: We present an alternative view for the study of optimal control of partially observed Markov Decision Processes (POMDPs). We first revisit the traditional (and by now standard) separated-design method of reducing the problem to fully observed MDPs (belief-MDPs), and present conditions for the existence of optimal policies. Then, rather than working with this standard method, we define a Markov chain taking values in an infinite-dimensional product space, with the control actions and the state process causally conditionally independent given the measurement/information process. We provide new sufficient conditions for the existence of optimal control policies. In particular, whereas in the belief-MDP reduction of POMDPs the weak Feller requirement imposes total variation continuity on either the system kernel or the measurement kernel, the approach of this paper needs only weak continuity of both the transition kernel and the measurement kernel (total variation continuity is not needed), together with regularity conditions related to filter stability. For the average cost setup, we provide a general approach for initializing the randomness, which we show establishes convergence to the optimal cost. For the discounted cost setup, we establish near optimality of finite window policies via a direct argument involving near optimality of quantized approximations for MDPs under weak Feller continuity, where finite truncations of memory can be viewed as quantizations of infinite memory with a uniform diameter in each finite window restriction under the product metric. In the control-free case, our analysis leads to new and weak conditions for the existence and uniqueness of invariant probability measures for non-linear filter processes, where we show that unique ergodicity of the measurement process and a measurability condition related to filter stability lead to unique ergodicity.
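The remark that finite memory truncations act as quantizations with a uniform diameter under the product metric can be illustrated with a small sketch. The abstract does not specify the metric, so the sketch assumes one common choice: a geometrically weighted sum of a base metric capped at 1, with histories ordered most recent first. The names `product_metric` and `window_diameter_bound` are hypothetical. Under this assumption, any two histories that agree on their most recent N entries are at distance at most 2^(-N), whatever the older entries are.

```python
from typing import Callable, Sequence


def product_metric(
    x: Sequence[float],
    y: Sequence[float],
    base: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> float:
    """Assumed product metric on histories, ordered most recent first.

    Coordinate k gets weight 2^-(k+1) and the base metric is capped at 1,
    so the total distance never exceeds 1. Finite sequences stand in for
    the infinite histories considered in the paper.
    """
    if len(x) != len(y):
        raise ValueError("histories must have the same length")
    return sum(
        2.0 ** -(k + 1) * min(1.0, base(a, b))
        for k, (a, b) in enumerate(zip(x, y))
    )


def window_diameter_bound(n: int) -> float:
    """Upper bound on the distance between histories sharing the last n entries."""
    # Tail sum of the weights: sum_{k >= n} 2^-(k+1) = 2^-n.
    return 2.0 ** -n


# Two histories with the same 3 most recent entries and very different pasts.
recent = [0.1, 0.2, 0.3]
x = recent + [5.0] * 20
y = recent + [-5.0] * 20
distance = product_metric(x, y)
```

Here `distance` stays below `window_diameter_bound(3)` even though the older entries differ as much as the capped base metric allows, which is the sense in which a window of length N defines a quantization bin of uniform diameter.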

Markov Decision Processes with Incomplete Information and Semi-Uniform Feller Transition Probabilities

Another Look at Partially Observed Optimal Stochastic Control: Existence, Ergodicity, and Approximations without Belief-Reduction

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

Mixed Markov Decision Processes in a Semi-Markov Environment with Discounted Criterion

Control Theory Meets POMDPs: A Hybrid Systems Approach

Optimal control of probabilistic discrete event systems on Markov decision processes

Intermittently Observable Markov Decision Processes

Controlled Markov Processes With Safety State Constraints

Mean Field Markov Decision Processes

Continuous Time Markov Decision Processes with Nonuniformly Bounded Transition Rate: Expected Total Rewards

Continuous time Stochastic optimal control under discrete time partial observations

Markov Decision Problems with Unbounded Transition Rates under Discounted-Cost Performance Criteria

Controlled Diffusions under Full, Partial and Decentralized Information: Existence of Optimal Policies and Discrete-Time Approximations

Cascade Markov Decision Processes: Theory and Applications

Approximate Control for Continuous-Time POMDPs

Relaxed Equilibria for Time-Inconsistent Markov Decision Processes

Sufficiency of Markov Policies for Continuous-Time Jump Markov Decision Processes

Explainable Finite-Memory Policies for Partially Observable Markov Decision Processes

A Possibilistic Model for Qualitative Sequential Decision Problems under Uncertainty in Partially Observable Environments

Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes

Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme