Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of scenarios in artificial intelligence (AI) that involve sequential decision making under uncertainty. Since the states of a POMDP are not directly observable, decisions have to be made based on the output of a Bayesian filter (continuous beliefs). Hence, POMDPs are often computationally intractable to solve exactly, and researchers resort to approximate methods, often based on discretizations of the continuous belief space. These approximate solutions are, however, prone to discretization errors, which has made POMDPs ineffective in applications where guarantees of safety, optimality, or performance are required. To overcome the complexity challenge of POMDPs, we apply notions from control theory. The goal is to determine the reachable belief space of a POMDP, that is, the set of all possible belief evolutions given an initial belief distribution over the states and a set of actions and observations. We begin by casting the problem of analyzing a POMDP as that of analyzing the behavior of a discrete-time switched system. To estimate the reachable belief space, we find over-approximations in terms of sub-level sets of Lyapunov functions. Furthermore, to verify the safety and optimality requirements of a given POMDP, we formulate a barrier certificate theorem, in which we show that if there exists a barrier certificate satisfying a set of inequalities along with the belief update equation of the POMDP, the safety and optimality properties are guaranteed to hold. In both cases, we show how the calculations can be decomposed into smaller problems that can be solved in parallel. The conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method on two problems in active ad scheduling and machine teaching.
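To make the "switched system" view concrete, here is a minimal sketch, assuming a finite POMDP with transition probabilities `T[a][s][s2] = P(s2 | s, a)` and observation probabilities `O[a][s2][z] = P(z | s2, a)`. Each action-observation pair `(a, z)` selects one Bayes-filter update, i.e. one mode of the switched system on the belief simplex. The helper names (`belief_update`, `reachable_beliefs`) and the two-state example are hypothetical illustrations, not the authors' method: the code enumerates the reachable beliefs by brute force over a finite horizon, which grows like (|A|·|Z|)^t and only covers finitely many steps. That limitation is what Lyapunov sub-level sets and barrier certificates avoid, since they bound the reachable set for all time without enumeration.

```python
from itertools import product


def belief_update(b, T, O, a, z):
    """Bayes-filter step for action a and observation z (one switched-system mode).

    Returns None when z has zero probability under belief b and action a.
    """
    n = len(b)
    # Prediction: push the belief through the transition model.
    predicted = [sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Correction: weight by the observation likelihood, then normalize.
    unnormalized = [O[a][s2][z] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        return None
    return tuple(p / total for p in unnormalized)


def _key(b, digits=12):
    # Rounded key so beliefs differing only by float noise are merged.
    return tuple(round(p, digits) for p in b)


def reachable_beliefs(b0, T, O, actions, observations, horizon):
    """Brute-force set of beliefs reachable from b0 within `horizon` steps."""
    reached = {_key(b0): tuple(b0)}
    frontier = dict(reached)
    for _ in range(horizon):
        nxt = {}
        for b in frontier.values():
            for a, z in product(actions, observations):
                b2 = belief_update(b, T, O, a, z)
                if b2 is not None and _key(b2) not in reached:
                    nxt.setdefault(_key(b2), b2)
        reached.update(nxt)
        frontier = nxt  # only expand beliefs discovered at this step
    return list(reached.values())


# Hypothetical toy POMDP: state 1 is "unsafe", one action that leaves the
# state unchanged, and a sensor that reports the true state 80% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.2, 0.8]]]
beliefs = reachable_beliefs((0.9, 0.1), T, O, actions=[0], observations=[0, 1], horizon=3)
worst = max(b[1] for b in beliefs)  # worst-case belief mass on the unsafe state
```

In this toy case the belief depends only on the net count of "unsafe" readings, so three steps reach 7 distinct beliefs, and the worst case after three consecutive "unsafe" readings is 64/73 (about 0.877). A safety requirement such as "belief in the unsafe state stays below 0.5" therefore fails at horizon 3; a barrier certificate would be the tool for proving such a bound over an unbounded horizon instead.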

Belief State Actor-Critic Algorithm from Separation Principle for POMDP

Belief State Separated Reinforcement Learning for Autonomous Vehicle Decision Making under Uncertainty

A Shared Control Approach for Autonomous Vehicles via Driver Behaviors Learning

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

Learning Online Belief Prediction for Efficient POMDP Planning in Autonomous Driving

Control Theory Meets POMDPs: A Hybrid Systems Approach

Learning Belief Representations for Imitation Learning in POMDPs

Flow-based Recurrent Belief State Learning for POMDPs

Optimality Guarantees for Particle Belief Approximation of POMDPs

A Fast Approximation Method for Partially Observable Markov Decision Processes

No Compromise in Solution Quality: Speeding Up Belief-dependent Continuous POMDPs via Adaptive Multilevel Simplification

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

A Strategy-Oriented Bayesian Soft Actor-Critic Model

Solving driving policy for autonomous vehicles via AMDP-Q

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions

Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs

BoT-Drive: Hierarchical Behavior and Trajectory Planning for Autonomous Driving using POMDPs

Situation-aware decision making for autonomous driving on urban road using online POMDP

PAC-Bayesian Soft Actor-Critic Learning