Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of scenarios in artificial intelligence (AI) that involve sequential decision making under uncertainty. Since the states of a POMDP are not directly observable, decisions have to be made based on the output of a Bayesian filter (continuous beliefs). Hence, POMDPs are often computationally intractable to solve exactly, and researchers resort to approximate methods, often using discretizations of the continuous belief space. These approximate solutions are, however, prone to discretization errors, which has made POMDPs ineffective in applications where guarantees on safety, optimality, or performance are required. To overcome the complexity challenge of POMDPs, we apply notions from control theory. The goal is to determine the reachable belief space of a POMDP, that is, the set of all possible belief evolutions given an initial belief distribution over the states and a set of actions and observations. We begin by casting the problem of analyzing a POMDP as the problem of analyzing the behavior of a discrete-time switched system. To estimate the reachable belief space, we find over-approximations in terms of sub-level sets of Lyapunov functions. Furthermore, to verify safety and optimality requirements of a given POMDP, we formulate a barrier certificate theorem: if there exists a barrier certificate satisfying a set of inequalities along with the belief update equation of the POMDP, then the safety and optimality properties are guaranteed to hold. In both cases, we show how the calculations can be decomposed into smaller problems that can be solved in parallel. The conditions we formulate can be implemented computationally as a set of sum-of-squares programs. We illustrate the applicability of our method by addressing two problems in active ad scheduling and machine teaching.
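
The switched-system view described in the abstract can be illustrated with a small sketch. It assumes a finite POMDP and treats each action-observation pair as one mode of a discrete-time system acting on the belief vector. The function names (`belief_update`, `rollout`) and the two-state matrices `T` and `O` are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch covers only the belief update, not the Lyapunov or barrier certificate computations.

```python
import numpy as np

def belief_update(b, T, O, a, o):
    """One Bayesian filter step for a finite POMDP (illustrative sketch).

    b: belief over states, shape (n,)
    T[a][s, s2] = P(s2 | s, a)   (assumed layout)
    O[a][s2, o] = P(o | s2, a)   (assumed layout)
    """
    predicted = b @ T[a]                    # predict next-state distribution
    unnormalized = O[a][:, o] * predicted   # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return unnormalized / total             # renormalize to a valid belief

def rollout(b0, T, O, modes):
    """Apply a sequence of (action, observation) modes, as in a switched system."""
    trajectory = [np.asarray(b0, dtype=float)]
    for a, o in modes:
        trajectory.append(belief_update(trajectory[-1], T, O, a, o))
    return np.array(trajectory)

# Hypothetical two-state, two-action, two-observation model.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.4, 0.6]]])
b0 = np.array([0.5, 0.5])

# Each (action, observation) pair selects which update map is applied.
trajectory = rollout(b0, T, O, [(0, 0), (1, 1), (0, 1)])
```

Every row of `trajectory` is a point in the belief simplex. The reachable belief space the abstract refers to is the set of all such points over all admissible mode sequences, which is the set the over-approximations are meant to contain.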

A POMDP Approach to Token-Based Team Coordination

Expert demonstrations guide reward decomposition for multi-agent cooperation

Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together

Multi-agent Coordination Under Temporal Logic Tasks and Team-Wise Intermittent Communication

An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards

Decentralized Task and Path Planning for Multi-Robot Systems

Payoff Mechanism Design for Coordination in Multi-Agent Task Allocation Games

Combinatorial-hybrid Optimization for Multi-agent Systems under Collaborative Tasks

Planning for Decentralized Control of Multiple Robots Under Uncertainty

Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods

Enhancing Multi-Agent Coordination through Common Operating Picture Integration

Cooperative Multi-Agent Constrained POMDPs: Strong Duality and Primal-Dual Reinforcement Learning with Approximate Information States

Multiagent Q-learning with Sub-Team Coordination

Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning

Robust optimal policies for team Markov games

On the coordination efficiency of strategic multi-agent robotic teams

United We Stand: Decentralized Multi-Agent Planning With Attrition

Control Theory Meets POMDPs: A Hybrid Systems Approach

The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

OPTIMA: Optimized Policy for Intelligent Multi-Agent Systems Enables Coordination-Aware Autonomous Vehicles