Abstract: We present an alternative view for the study of optimal control of partially observed Markov Decision Processes (POMDPs). We first revisit the traditional (and by now standard) separated-design method of reducing the problem to fully observed MDPs (belief-MDPs), and present conditions for the existence of optimal policies. Then, rather than working with this standard method, we define a Markov chain taking values in an infinite-dimensional product space, with control actions and the state process causally conditionally independent given the measurement/information process. We provide new sufficient conditions for the existence of optimal control policies. In particular, while in the belief-MDP reduction of POMDPs the weak Feller condition requires total variation continuity of either the system kernel or the measurement kernel, the approach of this paper needs only weak continuity of both the transition kernel and the measurement kernel (and not total variation continuity), together with regularity conditions related to filter stability. For the average cost setup, we provide a general approach for initializing the randomness, which we show establishes convergence to the optimal cost. For the discounted cost setup, we establish near optimality of finite window policies via a direct argument involving near optimality of quantized approximations for MDPs under weak Feller continuity, where finite truncations of memory can be viewed as quantizations of infinite memory with a uniform diameter in each finite window restriction under the product metric. In the control-free case, our analysis leads to new and weak conditions for the existence and uniqueness of invariant probability measures for non-linear filter processes, where we show that unique ergodicity of the measurement process and a measurability condition related to filter stability lead to unique ergodicity of the filter process.

Isomorphism Properties of Optimality and Equilibrium Solutions under Equivalent Information Structure Transformations I: Stochastic Dynamic Teams

Isomorphism Properties of Optimality and Equilibrium Solutions under Equivalent Information Structure Transformations: Stochastic Dynamic Games and Teams

Dynamic Team Theory of Stochastic Differential Decision Systems with Decentralized Noisy Information Structures via Girsanov's Measure Transformation

Decentralized Exchangeable Stochastic Dynamic Teams in Continuous-time, their Mean-Field Limits and Optimality of Symmetric Policies

Optimality of Decentralized Symmetric Policies for Stochastic Teams with Mean-Field Information Sharing

Common Information Approach for Static Team Problems with Polish Spaces and Existence of Optimal Policies

Optimal Control of Robust Team Stochastic Games

Controlled Diffusions under Full, Partial and Decentralized Information: Existence of Optimal Policies and Discrete-Time Approximations

A Unified Approach to Dynamic Decision Problems with Asymmetric Information - Part I: Non-Strategic Agents

Information Compression in Dynamic Games

Dynamic Games among Teams with Delayed Intra-Team Information Sharing

Deep Structured Teams with Linear Quadratic Model: Partial Equivariance and Gauge Transformation

Nash Equilibria for Exchangeable Team-Against-Team Games, Their Mean-Field Limit, and the Role of Common Randomness

Information Relaxation and A Duality-Driven Algorithm for Stochastic Dynamic Programs

Another Look at Partially Observed Optimal Stochastic Control: Existence, Ergodicity, and Approximations without Belief-Reduction

Robust optimal policies for team Markov games

Subgame-perfect equilibrium strategies for time-inconsistent recursive stochastic control problems

Zero-Sum Games involving Teams against Teams: Existence of Equilibria, and Comparison and Regularity in Information

On Strategic Measures and Optimality Properties in Discrete-Time Stochastic Control with Universally Measurable Policies

On Borkar and Young Relaxed Control Topologies and Continuous Dependence of Invariant Measures on Control Policy

Opinion Dynamic Games under One Step Ahead Optimal Control