Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of sequential decision-making problems under uncertainty in artificial intelligence (AI). Since the states of a POMDP are not directly observable, decisions have to be made based on the output of a Bayesian filter (continuous beliefs). Hence, POMDPs are often computationally intractable to solve exactly, and researchers resort to approximate methods, often using discretizations of the continuous belief space. These approximate solutions are, however, prone to discretization errors, which has made POMDPs ineffective in applications where guarantees for safety, optimality, or performance are required. To overcome the complexity challenge of POMDPs, we apply notions from control theory. The goal is to determine the reachable belief space of a POMDP, that is, the set of all possible belief evolutions given an initial belief distribution over the states and a set of actions and observations. We begin by casting the problem of analyzing a POMDP as the problem of analyzing the behavior of a discrete-time switched system. To estimate the reachable belief space, we find over-approximations in terms of sub-level sets of Lyapunov functions. Furthermore, to verify safety and optimality requirements of a given POMDP, we formulate a barrier certificate theorem: if there exists a barrier certificate satisfying a set of inequalities along with the belief update equation of the POMDP, then the safety and optimality properties are guaranteed to hold. In both cases, we show how the calculations can be decomposed into smaller problems that can be solved in parallel. The conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method by addressing two problems in active ad scheduling and machine teaching.
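
To make the belief update mentioned in the abstract concrete, the following is a minimal Python sketch of one Bayesian filter step for a fixed action-observation pair. Each such pair can be read as one mode of a discrete-time switched system acting on the belief vector. The two-state model, the action name `stay`, the observation names `z0` and `z1`, and the function name `belief_update` are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def belief_update(
    belief: List[float],
    transition: List[List[float]],
    obs_likelihood: List[float],
) -> List[float]:
    """One Bayesian filter step for a fixed (action, observation) pair.

    transition[s][s2]  : P(s2 | s, action)
    obs_likelihood[s2] : P(observation | s2, action)
    """
    n = len(belief)
    # Predict with the transition model, then weight by the observation likelihood.
    unnormalized = [
        obs_likelihood[s2] * sum(transition[s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the new belief is again a probability distribution.
    return [x / total for x in unnormalized]


# Hypothetical two-state model (assumption, for illustration only).
T: Dict[str, List[List[float]]] = {"stay": [[0.9, 0.1], [0.2, 0.8]]}
O: Dict[Tuple[str, str], List[float]] = {
    ("stay", "z0"): [0.7, 0.1],
    ("stay", "z1"): [0.3, 0.9],
}

b0 = [0.5, 0.5]
# Choosing the pair ("stay", "z0") selects one mode of the switched system.
b1 = belief_update(b0, T["stay"], O[("stay", "z0")])
print(b1)
```

Iterating `belief_update` over every admissible sequence of action-observation pairs, starting from `b0`, enumerates points of the reachable belief space. The abstract's approach over-approximates that set with sub-level sets of Lyapunov functions instead of enumerating it.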

Lecture Notes on Partially Known MDPs

MDPs with Unawareness

Reinforcement Learning with Partially Known World Dynamics

Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments

Structural Results for Partially Observed Markov Decision Processes

Reinforcement Learning in Partially Observable Markov Decision Processes using Hybrid Probabilistic Logic Programs

Prospective Side Information for Latent MDPs

Robust Anytime Learning of Markov Decision Processes

Planning with Partially Observable Markov Decision Processes: Advances in Exact Solution Method

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

Control Theory Meets POMDPs: A Hybrid Systems Approach

Blackwell Online Learning for Markov Decision Processes

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

Near-Optimal Learning and Planning in Separated Latent MDPs

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

Solving Robust MDPs through No-Regret Dynamics

A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game

Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision Processes

Robust Batch Policy Learning in Markov Decision Processes

Robust Action Selection in Partially Observable Markov Decision Processes with Model Uncertainty