Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of scenarios in artificial intelligence (AI) that involve sequential decision making under uncertainty. Since the states of a POMDP are not directly observable, decisions have to be made based on the output of a Bayesian filter (continuous beliefs). Hence, POMDPs are often computationally intractable to solve exactly, and researchers resort to approximate methods, often using discretizations of the continuous belief space. These approximate solutions are, however, prone to discretization errors, which has made POMDPs ineffective in applications that require guarantees for safety, optimality, or performance. To overcome the complexity challenge of POMDPs, we apply notions from control theory. The goal is to determine the reachable belief space of a POMDP, that is, the set of all possible evolutions given an initial belief distribution over the states and a set of actions and observations. We begin by casting the problem of analyzing a POMDP as the problem of analyzing the behavior of a discrete-time switched system. To estimate the reachable belief space, we find over-approximations in terms of sub-level sets of Lyapunov functions. Furthermore, in order to verify safety and optimality requirements of a given POMDP, we formulate a barrier certificate theorem, in which we show that if there exists a barrier certificate satisfying a set of inequalities along with the belief update equation of the POMDP, then the safety and optimality properties are guaranteed to hold. In both cases, we show how the calculations can be decomposed into smaller problems that can be solved in parallel. The conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method by addressing two problems in active ad scheduling and machine teaching.
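
To make the Bayesian filter and the switched-system view mentioned in the abstract concrete, the following is a minimal sketch of the standard belief update for a finite POMDP. It is not taken from the paper: the function name `belief_update`, the table layout `T[action][s][s_next]` and `O[action][s_next][obs]`, and the two-state numbers are all assumptions chosen for illustration. Each action-observation pair selects one update map, which is one way to read the belief dynamics as a discrete-time switched system whose modes are those pairs.

```python
def belief_update(belief, T, O, action, obs):
    """One Bayesian filter step for a finite POMDP (illustrative sketch).

    belief : list of state probabilities b(s)
    T      : T[action][s][s_next], assumed transition probabilities
    O      : O[action][s_next][obs], assumed observation probabilities
    """
    n = len(belief)
    # Prediction through the transition model, then weighting by the
    # likelihood of the received observation in each successor state.
    unnormalized = [
        O[action][s_next][obs]
        * sum(T[action][s][s_next] * belief[s] for s in range(n))
        for s_next in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    # Normalize so the new belief is again a probability distribution.
    return [p / total for p in unnormalized]


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]

b0 = [0.5, 0.5]
# The pair (action=0, obs=1) acts as the active mode for this step.
b1 = belief_update(b0, T, O, action=0, obs=1)
```

Iterating `belief_update` over every admissible action-observation sequence from `b0` enumerates reachable beliefs explicitly. That enumeration grows quickly with the horizon, which is the motivation the abstract gives for over-approximating the reachable set instead.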

Quantum Markov Decision Processes: General Theory, Approximations, and Classes of Policies

Quantum Markov Decision Processes Part II: Optimal Solutions and Algorithms

Optimal Policies for Quantum Markov Decision Processes

On the design and analysis of near-term quantum network protocols using Markov decision processes

Reachability Analysis of Quantum Markov Decision Processes

Decision-Making Of Shortest Path In Quantum State Population Transfer Based On Markov Decision Processes

On Quantum Algorithms for Efficient Solutions of General Classes of Structured Markov Processes

Information Processing by Networks of Quantum Decision Makers

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

On the convergence of projective-simulation-based reinforcement learning in Markov decision processes

Quantum Markov chains: Description of hybrid systems, decidability of equivalence, and model checking linear-time properties

A model of adaptive decision making from representation of information environment by quantum fields

Quantum policy gradient algorithms

Approximation Schemes for POMDPs with Continuous Spaces and Their Near Optimality

Control Theory Meets POMDPs: A Hybrid Systems Approach

Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes

Markov Decision Processes with Incomplete Information and Semi-Uniform Feller Transition Probabilities

The Quantum Advantage in Decentralized Control

Quantum decision theory as quantum theory of measurement

Concepts of quantum non-Markovianity: A hierarchy

Mixed Markov Decision Processes in a Semi-Markov Environment with Discounted Criterion