Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of sequential decision-making problems under uncertainty in artificial intelligence (AI). Since the states of a POMDP are not directly observable, decisions have to be based on the output of a Bayesian filter (continuous beliefs). Hence, POMDPs are often computationally intractable to solve exactly, and researchers resort to approximate methods, often using discretizations of the continuous belief space. These approximate solutions are, however, prone to discretization errors, which has made POMDPs ineffective in applications where guarantees for safety, optimality, or performance are required. To overcome the complexity challenge of POMDPs, we apply notions from control theory. The goal is to determine the reachable belief space of a POMDP, that is, the set of all possible belief evolutions given an initial belief distribution over the states and a set of actions and observations. We begin by casting the problem of analyzing a POMDP as that of analyzing the behavior of a discrete-time switched system. To estimate the reachable belief space, we find over-approximations in terms of sub-level sets of Lyapunov functions. Furthermore, to verify safety and optimality requirements of a given POMDP, we formulate a barrier certificate theorem: if there exists a barrier certificate satisfying a set of inequalities along with the belief update equation of the POMDP, then the safety and optimality properties are guaranteed to hold. In both cases, we show how the calculations can be decomposed into smaller problems that can be solved in parallel. The conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method by addressing two problems in active ad scheduling and machine teaching.
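To make the switched-system view concrete, the following is a minimal sketch of the standard Bayesian belief update for a finite POMDP, in which each (action, observation) pair selects one mode of a discrete-time system acting on the belief vector. The function name `belief_update`, the matrix layout, and the two-state numbers are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def belief_update(belief: List[float], T_a: Matrix, O_a: Matrix, obs: int) -> List[float]:
    """One Bayes-filter step for a fixed action a and observation index obs.

    Assumed layout (illustrative):
      T_a[s][s2] = P(s2 | s, a)   transition probabilities under action a
      O_a[s2][o] = P(o | s2, a)   observation probabilities under action a
    """
    n = len(belief)
    # Prediction: push the current belief through the transition model.
    predicted = [sum(belief[s] * T_a[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = [O_a[s2][obs] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalization keeps the belief on the probability simplex.
    return [u / total for u in unnormalized]


# Hypothetical two-state model with a single action (index 0).
T = {0: [[0.9, 0.1], [0.2, 0.8]]}
O = {0: [[0.8, 0.2], [0.3, 0.7]]}

b0 = [0.5, 0.5]
# The pair (action=0, obs=1) picks which update map (mode) is applied.
b1 = belief_update(b0, T[0], O[0], obs=1)
```

Iterating this map over all admissible sequences of actions and observations traces out the reachable belief space. The Lyapunov and barrier-certificate conditions described in the abstract are aimed at bounding that set without enumerating it.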

Control Theory Meets POMDPs: A Hybrid Systems Approach