Abstract: Many current large-scale multiagent team implementations can be characterized as following the belief-desire-intention (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. Within this approach, the article makes three specific contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our technique gains in tractability by significantly curtailing RMTDP policy search; in particular, BDI team plans provide incomplete RMTDP policies, and the RMTDP policy search fills the gaps in such incomplete policies by searching for the best role allocation. Our second contribution is a novel decomposition technique to further improve RMTDP policy search efficiency. Even when the search is limited to role allocations, there are still combinatorially many of them, and evaluating each in RMTDP to identify the best is extremely difficult. Our decomposition technique exploits the structure in the BDI team plans to significantly prune the search space of role allocations. Our third contribution is a significantly faster policy evaluation algorithm suited for our BDI-POMDP hybrid approach. Finally, we present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation.
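
To make the idea of searching over role allocations under uncertainty concrete, the following is a minimal toy sketch, not the RMTDP model or the article's algorithm. It enumerates every assignment of agents to roles and scores each one by its expected team reward. The agents, roles, success probabilities, reward, and the function names `expected_value` and `best_allocation` are all hypothetical and chosen purely for illustration; the assumed reward rule is that the team succeeds only if every role is covered by at least one agent that succeeds.

```python
from itertools import product

# Hypothetical toy data (not from the article).
AGENTS = ["a1", "a2", "a3"]
ROLES = ["scout", "transport"]

# Assumed probability that a given agent succeeds in a given role.
SUCCESS = {
    ("a1", "scout"): 0.9, ("a1", "transport"): 0.5,
    ("a2", "scout"): 0.6, ("a2", "transport"): 0.8,
    ("a3", "scout"): 0.4, ("a3", "transport"): 0.7,
}
REWARD = 10.0  # assumed reward when every role is covered


def expected_value(allocation):
    """Expected reward of an allocation (a dict mapping agent -> role).

    A role is covered if at least one agent assigned to it succeeds;
    agents are assumed to succeed or fail independently.
    """
    value = REWARD
    for role in ROLES:
        p_all_fail = 1.0
        for agent, assigned in allocation.items():
            if assigned == role:
                p_all_fail *= 1.0 - SUCCESS[(agent, role)]
        # If no agent holds the role, p_all_fail stays 1.0 and coverage is 0.
        value *= 1.0 - p_all_fail
    return value


def best_allocation():
    """Brute-force search over all len(ROLES) ** len(AGENTS) allocations."""
    candidates = (
        dict(zip(AGENTS, roles))
        for roles in product(ROLES, repeat=len(AGENTS))
    )
    return max(candidates, key=expected_value)


best = best_allocation()
value = expected_value(best)
print(best, round(value, 2))
```

The number of candidates grows exponentially with the number of agents, which is why the abstract stresses pruning the allocation space and speeding up policy evaluation rather than enumerating exhaustively as this sketch does.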

Hybrid BDI-POMDP Framework for Multiagent Teaming

Integrating Value-Directed Compression and Belief Space Analysis for POMDP Decomposition

Decomposing Large-Scale POMDP Via Belief State Analysis.

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Anytime Point-Based Approximations for Large POMDPs