Abstract: Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm that generates approximate policies for large Partially Observable Markov Decision Processes (POMDPs). Its online nature supports scalability by avoiding a complete policy representation. The lack of an explicit representation, however, hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theories (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: (i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions; (ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem); and (iii) the analysis of the generated logical formula and the related decision boundaries to identify unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge of the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. In our tests, the Area Under Curve improved by up to 47%.
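The three steps of the trace analysis can be illustrated with a small Python sketch on a Tiger-like trace. This is not the authors' implementation: the abstract describes solving a MAX-SMT problem, whereas the sketch below substitutes a brute-force search over a single threshold parameter, which has the same objective (maximize the number of trace steps satisfying the rule) only for this one-parameter case. The rule template, the function names `fit_threshold` and `unexpected_steps`, and the trace values are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical trace step: (belief that the tiger is behind the left door, action).
# Observations are omitted because this toy rule only relates beliefs to actions.
Step = Tuple[float, str]


def rule_holds(p_left: float, action: str, threshold: float) -> bool:
    """Step (i): parametric rule 'open_right is chosen iff p_left >= threshold'."""
    return (p_left >= threshold) == (action == "open_right")


def fit_threshold(trace: List[Step], candidates: List[float]) -> Tuple[float, int]:
    """Step (ii): pick the candidate threshold satisfied by the most trace steps.

    Brute-force stand-in for MAX-SMT; ties keep the smallest candidate.
    """
    best_threshold, best_count = candidates[0], -1
    for t in candidates:
        count = sum(rule_holds(p, a, t) for p, a in trace)
        if count > best_count:
            best_threshold, best_count = t, count
    return best_threshold, best_count


def unexpected_steps(trace: List[Step], threshold: float) -> List[int]:
    """Step (iii): indices of steps that violate the fitted rule."""
    return [i for i, (p, a) in enumerate(trace) if not rule_holds(p, a, threshold)]


# Invented illustrative trace, not data from the paper.
trace = [
    (0.50, "listen"), (0.85, "listen"), (0.97, "open_right"),
    (0.50, "listen"), (0.85, "listen"), (0.97, "open_right"),
    (0.50, "listen"), (0.85, "open_right"),
]
candidates = [i / 100 for i in range(50, 100)]  # thresholds 0.50 .. 0.99

threshold, satisfied = fit_threshold(trace, candidates)
anomalies = unexpected_steps(trace, threshold)
print(threshold, satisfied, anomalies)
```

On this toy trace, the fitted rule is satisfied by seven of the eight steps, and the last step (opening a door at belief 0.85, where the policy otherwise keeps listening) is flagged as unexpected. A real MAX-SMT encoding would handle several parameters and richer formulas at once, which a grid search like this does not scale to.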

Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function

A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification

A Search Space Utility Optimization Based Online POMDP Planning Algorithm

Multilevel Monte-Carlo for Solving POMDPs Online

Thompson Sampling Based Monte-Carlo Planning in POMDPs

Bayesian Optimized Monte Carlo Planning

Policy search for multi-robot coordination under uncertainty

Monte Carlo Information-Oriented Planning

Monte-Carlo Robot Path Planning

Planning for Decentralized Control of Multiple Robots Under Uncertainty

Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs

Monte-Carlo Search for an Equilibrium in Dec-POMDPs

Constrained Hierarchical Monte Carlo Belief-State Planning

Multiagent Gumbel MuZero: Efficient Planning in Combinatorial Action Spaces

Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach

Hybrid Heuristic Online Planning for POMDPs

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Improving Online POMDP Planning Algorithms with Decaying Q Value

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Efficient Multi-agent Reinforcement Learning by Planning