Abstract: The growing trend of integrating neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which the agent perceives a continuous-state environment using a neural perception mechanism and makes decisions symbolically. The perception mechanism classifies inputs such as images and sensor values into symbolic percepts, which are used in decision making. We study the problem of optimising discounted cumulative rewards for NS-POMDPs. Working directly with the continuous state space, we exploit the underlying structure of the model and the neural perception mechanism to propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability. The first is a classical (exact) value iteration algorithm extending the $\alpha$-functions of Porta {\em et al.} (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, by synthesising (approximately) optimal strategies.

Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs

Observation-Based Optimization for POMDPs with Continuous State, Observation, and Action Spaces

A Probability-Based Value Iteration on Optimal Policy Algorithm for POMDP

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification

Policy Optimization with Model-based Explorations

Online algorithms for POMDPs with continuous state, action, and observation spaces

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Multi-Path Policy Optimization

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Improving Online POMDP Planning Algorithms with Decaying Q Value

Technical Report: The Policy Graph Improvement Algorithm

A Search Space Utility Optimization Based Online POMDP Planning Algorithm

Exploration in policy optimization through multiple paths

Recursively-Constrained Partially Observable Markov Decision Processes

Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms

No Compromise in Solution Quality: Speeding Up Belief-dependent Continuous POMDPs via Adaptive Multilevel Simplification

Monte Carlo Information-Oriented Planning

Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning