Abstract: Many current large-scale multiagent team implementations can be characterized as following the belief-desire-intention (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but finding optimal policies in such models is highly intractable. The central idea of this article is a hybrid BDI-POMDP approach, in which BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our technique gains in tractability by significantly curtailing RMTDP policy search; in particular, BDI team plans provide incomplete RMTDP policies, and the RMTDP policy search fills the gaps in such incomplete policies by searching for the best role allocation. Our second key contribution is a novel decomposition technique to further improve RMTDP policy search efficiency. Even when the search is limited to role allocations, combinatorially many allocations remain, and evaluating each in RMTDP to identify the best is extremely difficult. Our decomposition technique exploits the structure in the BDI team plans to significantly prune the search space of role allocations. Our third key contribution is a significantly faster policy evaluation algorithm suited for our BDI-POMDP hybrid approach. Finally, we present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation.
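To make the role allocation problem concrete, the following is a minimal sketch of searching over allocations while accounting for uncertainty in role execution. It is not the RMTDP model or the article's algorithm: the agents, roles, success probabilities, and rewards are all hypothetical, and a closed-form expected reward stands in for RMTDP policy evaluation. It also enumerates every allocation exhaustively, which is exactly the combinatorial cost that the decomposition technique described above is meant to reduce.

```python
from itertools import product

# Hypothetical toy domain: every name and number is an illustrative assumption.
AGENTS = ["a1", "a2", "a3"]
ROLES = ["scout", "transport"]
# Probability that a single agent assigned to a role completes it.
SUCCESS = {"scout": 0.7, "transport": 0.9}
# Reward earned when a role is completed by at least one agent.
REWARD = {"scout": 4.0, "transport": 10.0}


def expected_value(allocation):
    """Expected team reward of an allocation (one role per agent).

    A role pays off if at least one agent assigned to it succeeds;
    agents are assumed to succeed or fail independently.
    """
    total = 0.0
    for role in ROLES:
        n_assigned = allocation.count(role)
        p_done = 1.0 - (1.0 - SUCCESS[role]) ** n_assigned
        total += p_done * REWARD[role]
    return total


def best_allocation():
    """Evaluate every assignment of agents to roles and return the best."""
    candidates = product(ROLES, repeat=len(AGENTS))
    return max(candidates, key=expected_value)


if __name__ == "__main__":
    best = best_allocation()
    print(dict(zip(AGENTS, best)), round(expected_value(best), 3))
```

In this toy setting the search prefers one scout and two transport agents, because doubling up on the high-reward role is worth more than either covering the scout role twice or leaving it empty. With `len(ROLES) ** len(AGENTS)` candidates, the enumeration grows exponentially in team size.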

Approximate Dec-POMDP Solving Using Multi-Agent A*

Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

Scalable Anytime Planning for Multi-Agent MDPs

Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Order Matters: Agent-by-agent Policy Optimization

Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

Scalable Planning and Learning for Multiagent POMDPs: Extended Version

B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency

Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function

Subdimensional Expansion for Multi-Objective Multi-Agent Path Finding

Factored Online Planning in Many-Agent POMDPs

Hybrid BDI-POMDP Framework for Multiagent Teaming

Cooperative Multi-Agent Constrained POMDPs: Strong Duality and Primal-Dual Reinforcement Learning with Approximate Information States

Multilevel Monte-Carlo for Solving POMDPs Online

Hybrid Heuristic Online Planning for POMDPs

A Framework for Sequential Planning in Multi-Agent Settings

Coordinated Proximal Policy Optimization

Improving Online POMDP Planning Algorithms with Decaying Q Value

PEGASUS: A policy search method for large MDPs and POMDPs