Abstract: The increasing trend to integrate neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which the agent perceives a continuous-state environment using a neural perception mechanism and makes decisions symbolically. The perception mechanism classifies inputs such as images and sensor values into symbolic percepts, which are used in decision making. We study the problem of optimising discounted cumulative rewards for NS-POMDPs. Working directly with the continuous state space, we exploit the underlying structure of the model and the neural perception mechanism to propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability. The first is a classical (exact) value iteration algorithm extending the $\alpha$-functions of Porta et al. (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, by synthesising (approximately) optimal strategies.
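To make the idea of a value function built from polyhedra and value vectors more concrete, the following is a minimal sketch under strong simplifying assumptions, not the paper's implementation. It assumes a one-dimensional state space whose "polyhedra" are intervals, treats each $\alpha$-function as a vector holding one constant value per interval, and evaluates a particle-based belief as the maximum over $\alpha$-functions of the weighted sum of their values. The names `BREAKPOINTS`, `ALPHAS`, `region_index`, `alpha_value` and `belief_value`, and all numbers, are hypothetical.

```python
from bisect import bisect_right

# Hypothetical 1-D state space [0, 3) covered by three intervals
# (the 1-D analogue of polyhedra): [0, 1), [1, 2), [2, 3).
BREAKPOINTS = [0.0, 1.0, 2.0, 3.0]

# Hypothetical alpha-functions: one constant value per interval.
ALPHAS = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.5],
]


def region_index(state):
    """Return the index of the interval that contains `state`."""
    if not BREAKPOINTS[0] <= state < BREAKPOINTS[-1]:
        raise ValueError("state lies outside the covered state space")
    return bisect_right(BREAKPOINTS, state) - 1


def alpha_value(alpha, state):
    """Evaluate a piecewise-constant alpha-function at a single state."""
    return alpha[region_index(state)]


def belief_value(alphas, particles):
    """Value of a particle belief given as (state, weight) pairs.

    Each alpha-function is linear in the belief weights, so the maximum
    over alpha-functions is piecewise linear and convex in the belief.
    """
    return max(
        sum(weight * alpha_value(alpha, state) for state, weight in particles)
        for alpha in alphas
    )


if __name__ == "__main__":
    belief = [(0.5, 0.5), (1.5, 0.5)]  # two equally weighted particles
    print(belief_value(ALPHAS, belief))
```

Because the value is a maximum of functions that are linear in the particle weights, mixing two beliefs can never give a value above the same mixture of their individual values, which is the convexity property the abstract refers to.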

A Probabilistic Greedy Search Value Iteration Algorithm for POMDP

A Probabilistic Forward Search Value Iteration Algorithm for POMDP

nso-HSVI: A Not-So-Optimistic Heuristic Search Value Iteration Algorithm for POMDPs

Popvi: A Probability-Based Optimal Policy Value Iteration Algorithm

A Probability-Based Value Iteration on Optimal Policy Algorithm for POMDP

A Neighborhood-Based Value Iteration Algorithm for POMDP Problems

A Hybrid Heuristic Value Iteration Algorithm for POMDP

A Multi-Criteria Value Iteration Algorithm for POMDP Problems

A point-reduced POMDP value iteration algorithm with application to robot navigation

PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces

Anytime Point-Based Approximations for Large POMDPs

Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

Accelerating Point-Based POMDP Algorithms via Greedy Strategies

Point-Based POMDP Algorithms: Improved Analysis and Implementation

Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms

Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs

Asynchronous value iteration for Markov decision processes with continuous state spaces

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Optimization Via Simulation Using Gaussian Process-Based Search