Abstract: Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They model state uncertainty as a belief, i.e., a probability distribution over states. Approximate solvers based on Monte Carlo sampling have been very successful at reducing the computational demand and enabling online planning. However, scaling to complex realistic domains with many actions and long planning horizons remains a major challenge, and a key to good performance is guiding the action-selection process with policy heuristics tailored to the specific application domain. We propose to learn high-quality heuristics from POMDP execution traces generated by any solver. We convert the belief-action pairs to a logical semantics and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems involving large action spaces and long planning horizons, namely rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, in less computation time. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., more rocks and a larger grid in rocksample, a larger map and more aggressive ghosts in pocman).
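To make the conversion of beliefs to a logical semantics more concrete, the following is a minimal sketch for a rocksample-like setting. It is not taken from the paper: the particle representation, the function name `belief_to_atoms`, and the atom names `guess` and `dist` are all illustrative assumptions. It only shows how a particle-based belief might be summarized as ASP-style ground atoms that an ILP system could consume alongside the action taken.

```python
def belief_to_atoms(particles, agent_pos, rock_positions):
    """Summarize a particle belief as ASP-style ground atoms (strings).

    Assumed, hypothetical representation:
    - particles: list of tuples of bools, one bool per rock (True = valuable)
    - agent_pos: (x, y) position of the agent
    - rock_positions: list of (x, y) positions, indexed by rock id
    """
    n = len(particles)
    atoms = []
    for rock_id, (rx, ry) in enumerate(rock_positions):
        # Share of particles in which this rock is valuable, as an integer percentage.
        valuable = sum(1 for p in particles if p[rock_id])
        pct = round(100 * valuable / n)
        # Manhattan distance between the agent and the rock.
        dist = abs(rx - agent_pos[0]) + abs(ry - agent_pos[1])
        atoms.append(f"guess({rock_id},{pct})")
        atoms.append(f"dist({rock_id},{dist})")
    return atoms


# Four particles over two rocks; the agent stands at the origin.
particles = [(True, False), (True, True), (False, False), (True, False)]
atoms = belief_to_atoms(particles, agent_pos=(0, 0), rock_positions=[(1, 0), (3, 2)])
print(atoms)
```

Pairing such atoms with the action chosen at each step of a trace would give the kind of belief-action examples from which a rule learner can induce a policy specification. The choice of features here (value probability and distance) is a guess at what is useful in this domain, not a statement of the features used by the authors.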

Learning Explainable and Better Performing Representations of POMDP Strategies

Explainable Finite-Memory Policies for Partially Observable Markov Decision Processes

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

Planning in POMDPs Using Multiplicity Automata

End-to-End Policy Gradient Method for POMDPs and Explainable Agents

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs

Strong Simple Policies for POMDPs

Learning in Observable POMDPs, without Computationally Intractable Oracles

Leveraging Knowledge Graph-Based Human-Like Memory Systems to Solve Partially Observable Markov Decision Processes

Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

Strategy Synthesis in POMDPs via Game-Based Abstractions

Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments

SOS: Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes

Leveraging Counterfactual Paths for Contrastive Explanations of POMDP Policies

Reinforcement Learning in Partially Observable Markov Decision Processes using Hybrid Probabilistic Logic Programs

Model-based motion planning in POMDPs with temporal logic specifications

Learning without state-estimation in partially observable Markovian decision processes

Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach