Abstract: In this review/tutorial article, we present recent progress on optimal control of partially observed Markov decision processes (POMDPs). We first present regularity and continuity conditions for POMDPs and their belief-MDP reductions, namely weak Feller and Wasserstein regularity and controlled filter stability. These are then used to obtain existence results for optimal policies under both discounted and average cost criteria, as well as regularity of value functions. Next, we study rigorous approximation results involving quantization-based finite model approximations as well as finite window approximations under controlled filter stability. Finally, we present several recent reinforcement learning results that rigorously establish convergence to near optimality under both criteria.
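
Quantization-based finite model approximations replace the uncountable belief space by finitely many representative beliefs. The Python below is a minimal sketch that assumes a finite hidden-state space with d points, so beliefs are vectors in the probability simplex. It builds a uniform grid of beliefs whose entries are multiples of 1/n and maps each belief to its nearest grid point in L1 distance. The grid, the metric, and the names `simplex_grid` and `quantize` are illustrative assumptions, not the specific construction analyzed in the paper.

```python
import itertools
from typing import List, Sequence, Tuple

# Illustrative sketch only: uniform simplex grid + nearest-point map
# (a hypothetical construction, not the paper's specific quantizer).
Belief = Tuple[float, ...]


def simplex_grid(d: int, n: int) -> List[Belief]:
    """All probability vectors on d points whose entries are multiples of 1/n.

    Stars and bars: choosing d - 1 "bar" positions among n + d - 1 slots
    splits n unit masses into d (possibly empty) bins.
    """
    grid = []
    for bars in itertools.combinations(range(n + d - 1), d - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        counts.append(n + d - 2 - prev)  # stars after the last bar
        grid.append(tuple(k / n for k in counts))
    return grid


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance; for probability vectors this is twice the total variation distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def quantize(belief: Sequence[float], grid: List[Belief]) -> Belief:
    """Map a belief to its nearest grid point (exhaustive search, fine for small grids)."""
    return min(grid, key=lambda q: l1_distance(belief, q))


if __name__ == "__main__":
    grid = simplex_grid(3, 4)
    print(len(grid))                        # 15
    print(quantize((0.2, 0.5, 0.3), grid))  # (0.25, 0.5, 0.25)
```

In a finite model approximation, the quantized beliefs would play the role of the states of a finite MDP; the paper's rigorous results concern when policies computed on such finite models are near optimal for the original problem, and this sketch covers only the quantization step.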

What problem does this paper attempt to address?

This paper addresses optimal stochastic control of partially observed Markov decision processes (POMDPs). It focuses on the following aspects:

1. **Regularity and optimality**:
   - The paper first examines regularity and continuity conditions for POMDPs and their belief-MDP reductions, including the weak Feller property, Wasserstein regularity, and controlled filter stability.
   - These properties are used to establish the existence of optimal policies for the discounted-cost and average-cost problems, as well as the regularity of the value function.
2. **Approximation methods**:
   - The paper studies rigorous approximation results for quantization-based finite model approximations and for finite-window approximations under controlled filter stability.
3. **Reinforcement learning theory**:
   - The paper presents several recent reinforcement learning results that rigorously establish convergence to near-optimal policies under both cost criteria.
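
The finite-window approximation replaces the belief by a window of the most recent observations and actions, which can then serve as the state for standard tabular Q-learning. The Python below is a minimal sketch under loudly stated assumptions: a hypothetical two-state toy POMDP whose kernels and costs are made up for illustration, an approximate state made of the last N+1 observations and last N actions, epsilon-greedy exploration, and 1/n step sizes. These choices are illustrative and are not the exact algorithm or step-size schedule analyzed in the paper.

```python
import random
from collections import defaultdict, deque

# Hypothetical toy POMDP (illustrative numbers): 2 hidden states, 2 actions, 2 observations.
TRANS = {0: [[0.9, 0.1], [0.2, 0.8]],  # TRANS[u][x][x'] for action 0 ("keep")
         1: [[0.5, 0.5], [0.5, 0.5]]}  # action 1 resets the state uniformly
OBS = [[0.8, 0.2], [0.3, 0.7]]         # OBS[x][y] = P(Y = y | X = x)
COST = [[0.0, 0.5], [1.0, 0.5]]        # COST[x][u], values in [0, 1]
ACTIONS = (0, 1)


def sample(probs, rng):
    """Draw an index according to the probability vector `probs`."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def finite_window_q_learning(window=2, beta=0.9, steps=50_000, eps=0.1, seed=0):
    """Tabular Q-learning on the approximate state z = (last window+1 observations,
    last `window` actions). The hidden state x is used only by the simulator."""
    rng = random.Random(seed)
    q = defaultdict(float)     # Q(z, u), initialised to 0
    visits = defaultdict(int)  # visit counts, used for 1/n step sizes
    ys = deque(maxlen=window + 1)
    us = deque(maxlen=window)

    x = rng.randrange(2)       # hidden initial state
    ys.append(sample(OBS[x], rng))
    z = (tuple(ys), tuple(us))  # windows are shorter during the first few steps

    for _ in range(steps):
        # Epsilon-greedy exploration; we minimise cost, so greedy means argmin.
        if rng.random() < eps:
            u = rng.choice(ACTIONS)
        else:
            u = min(ACTIONS, key=lambda a: q[(z, a)])
        cost = COST[x][u]
        x = sample(TRANS[u][x], rng)    # simulator moves the hidden state
        ys.append(sample(OBS[x], rng))  # learner only sees the new observation
        us.append(u)
        z_next = (tuple(ys), tuple(us))

        visits[(z, u)] += 1
        alpha = 1.0 / visits[(z, u)]
        target = cost + beta * min(q[(z_next, a)] for a in ACTIONS)
        q[(z, u)] += alpha * (target - q[(z, u)])
        z = z_next
    return q


def greedy_action(q, z):
    """Window-based policy: pick the action with the smallest learned Q-value."""
    return min(ACTIONS, key=lambda a: q[(z, a)])


if __name__ == "__main__":
    q = finite_window_q_learning()
    # Window: three observations equal to 0, after taking action 0 twice.
    print(greedy_action(q, ((0, 0, 0), (0, 0))))
```

Because costs lie in [0, 1], Q starts at 0, and every step size is at most 1, each learned value stays in [0, 1/(1 - beta)]. Whether the resulting window-based policy is near optimal depends on conditions such as controlled filter stability, which this toy run does not check.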

### Mathematical Formula Summary

- **State transition equation**:
  \[ X_{k+1} = F(X_k, U_k, W_k) \quad (1) \]
- **Measurement equation**:
  \[ Y_k = G(X_k, V_k) \quad (2) \]
- **Initial distribution**:
  \[ X_0 \sim \mu, \quad \mu \in \mathcal{P}(\mathbb{X}), \]
  where \(\mathcal{P}(\mathbb{X})\) is the set of probability measures on the state space \(\mathbb{X}\).
- **Control/decision process**:
  \[ U_k = \gamma_k(I_k), \quad k \in \mathbb{Z}^+, \]
  where \(I_k = \{Y_{[0,k]}, U_{[0,k-1]}\}\) is the information available to the controller at time \(k\).
- **Objective functions**:
  - **Discounted-cost criterion**, with discount factor \(\beta \in (0,1)\):
    \[ J_\beta(\mu,\gamma) = E^\gamma_\mu\left[\sum_{k=0}^{\infty} \beta^k c(X_k,U_k)\right] \quad (5) \]
  - **Average-cost criterion**:
    \[ J_\infty(\mu,\gamma) = \limsup_{N\to\infty} \frac{1}{N} E^\gamma_\mu\left[\sum_{k=0}^{N-1} c(X_k,U_k)\right] \quad (4) \]
- **State transition kernel of the belief MDP**:
  \[ \eta(\cdot \mid \pi,u) = \int_{\mathbb{Y}} 1\{F(\pi,u,y) \in \cdot\}\, H(dy \mid \pi,u) \quad (6) \]
  where \(\pi\) is the belief (the conditional distribution of the state given the available information), \(F(\pi,u,y)\) is the nonlinear filter update giving the next belief after action \(u\) and observation \(y\) (not the dynamics map \(F\) in (1)), and \(H(\cdot \mid \pi,u)\) is the conditional distribution of the next observation.
- **One-stage cost function of the belief MDP**:
  \[ \tilde{c}(\pi,u) = \int_{\mathbb{X}} c(x,u)\,\pi(dx) \quad (7) \]

### Key Contributions

- **Regularity results**: Presents conditions under which the belief MDP is weak Feller and Wasserstein continuous, together with controlled filter stability (contraction) properties of the nonlinear filter.
- **Existence of optimal policies**: Under both the discounted-cost and average-cost criteria, establishes the existence of optimal policies and gives regularity conditions for the value function.
- **Approximation methods**: Covers quantization-based finite model approximations and finite-window approximations and analyzes their performance bounds.
- **Reinforcement learning theory**: Presents rigorous results showing that reinforcement learning algorithms converge to near-optimal policies under both the discounted-cost and average-cost criteria.

In summary, the paper collects rigorous existence, approximation, and reinforcement learning results for optimal control of POMDPs.
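
For finite state, action, and observation spaces, the integrals in (6) and (7) become sums, and the filter update \(F(\pi,u,y)\) is Bayes' rule applied to the one-step predicted state distribution. The Python below is a minimal sketch that assumes hypothetical array layouts for the transition kernel, observation kernel, and cost (documented in the comments). It computes the filter update, the belief kernel \(\eta(\cdot \mid \pi,u)\) as a finite set of next beliefs weighted by \(H(y \mid \pi,u)\), and the belief cost \(\tilde{c}(\pi,u)\). The function names and toy numbers are illustrative, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Belief = Tuple[float, ...]

# Hypothetical array layouts (assumptions for this sketch):
#   T[u][x][x2] = P(X_{k+1} = x2 | X_k = x, U_k = u)
#   Q[x][y]     = P(Y_k = y | X_k = x)   (observation depends on the current state, as in (2))
#   c[x][u]     = one-stage cost c(x, u)


def predict(pi: Sequence[float], u: int, T) -> List[float]:
    """Predicted law of X_{k+1}: sum_x T(x2 | x, u) pi(x)."""
    n = len(pi)
    return [sum(T[u][x][x2] * pi[x] for x in range(n)) for x2 in range(n)]


def filter_update(pi: Sequence[float], u: int, y: int, T, Q) -> Belief:
    """Nonlinear filter F(pi, u, y): Bayes' rule after observing Y_{k+1} = y."""
    prior = predict(pi, u, T)
    unnorm = [Q[x2][y] * p for x2, p in enumerate(prior)]
    z = sum(unnorm)  # equals H(y | pi, u)
    if z == 0:
        raise ValueError("y has zero probability under (pi, u)")
    return tuple(w / z for w in unnorm)


def belief_kernel(pi: Sequence[float], u: int, T, Q) -> Dict[Belief, float]:
    """eta(. | pi, u) from (6): next belief F(pi, u, y) with weight H(y | pi, u)."""
    prior = predict(pi, u, T)
    eta: Dict[Belief, float] = {}
    for y in range(len(Q[0])):
        h = sum(Q[x2][y] * p for x2, p in enumerate(prior))  # H(y | pi, u)
        if h > 0:
            nxt = filter_update(pi, u, y, T, Q)
            eta[nxt] = eta.get(nxt, 0.0) + h  # merge observations that give the same belief
    return eta


def belief_cost(pi: Sequence[float], u: int, c) -> float:
    """c~(pi, u) from (7): expected one-stage cost under the belief pi."""
    return sum(c[x][u] * p for x, p in enumerate(pi))


if __name__ == "__main__":
    T = {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.5, 0.5], [0.5, 0.5]]}
    Q = [[0.8, 0.2], [0.3, 0.7]]
    c = [[0.0, 0.5], [1.0, 0.5]]
    eta = belief_kernel((0.5, 0.5), 0, T, Q)
    print({tuple(round(v, 4) for v in b): round(w, 4) for b, w in eta.items()})
    # {(0.7652, 0.2348): 0.575, (0.2588, 0.7412): 0.425}
    print(belief_cost((0.5, 0.5), 0, c))  # 0.5
```

Representing \(\eta\) as a dictionary keeps the sketch close to (6): each observation contributes one point mass at \(F(\pi,u,y)\), so the belief MDP on finite spaces moves to at most as many next beliefs as there are observations.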

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

Another Look at Partially Observed Optimal Stochastic Control: Existence, Ergodicity, and Approximations without Belief-Reduction

On Stochastic Optimal Control of Partially Observable Nonlinear Quasi Hamiltonian Systems

Average Cost Optimality of Partially Observed MDPs: Contraction of Nonlinear Filters and Existence of Optimal Solutions and Approximations

Control Theory Meets POMDPs: A Hybrid Systems Approach

Discrete-Time Approximation of Stochastic Optimal Control with Partial Observation

Average Cost Optimality of Partially Observed MDPs: Contraction of Non-linear Filters, Optimal Solutions and Approximations

A stochastic optimal control strategy for partially observable nonlinear systems

General Necessary Conditions for Partially Observed Optimal Stochastic Controls

Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

Continuous time Stochastic optimal control under discrete time partial observations

Stochastic Optimal Vibration Control of Partially Observable Nonlinear Quasi Hamiltonian Systems with Actuator Saturation

A Minimax Optimal Control Strategy for Partially Observable Uncertain Quasi-Hamiltonian Systems

Controlled Diffusions under Full, Partial and Decentralized Information: Existence of Optimal Policies and Discrete-Time Approximations

Refined Bounds on Near Optimality Finite Window Policies in POMDPs and Their Reinforcement Learning

Approximate Control for Continuous-Time POMDPs

Stochastic Finite State Control of POMDPs with LTL Specifications

Stochastic maximum principle for hybrid optimal control problems under partial observation

Robustness of Stochastic Optimal Control to Approximate Diffusion Models under Several Cost Evaluation Criteria

Markov Decision Processes with Incomplete Information and Semi-Uniform Feller Transition Probabilities

Robustness to Model Approximation, Empirical Model Learning, and Sample Complexity in Wasserstein Regular MDPs