POMDPs in Continuous Time and Discrete Spaces

Bastian Alt, Matthias Schultheis, Heinz Koeppl

DOI: https://doi.org/10.48550/arXiv.2010.01014

2020-10-26

Abstract: Many processes, such as discrete event systems in engineering or population dynamics in biology, evolve in discrete space and continuous time. We consider the problem of optimal decision making in such discrete state and action space systems under partial observability. This places our work at the intersection of optimal filtering and optimal control. At the current state of research, a mathematical description for simultaneous decision making and filtering in continuous time with finite state and action spaces is still missing. In this paper, we give a mathematical description of a continuous-time partially observable Markov decision process (POMDP). By leveraging optimal filtering theory, we derive a Hamilton-Jacobi-Bellman (HJB) type equation that characterizes the optimal solution. Using techniques from deep learning, we approximately solve the resulting partial integro-differential equation. We present (i) an approach solving the decision problem offline by learning an approximation of the value function and (ii) an online algorithm which provides a solution in belief space using deep reinforcement learning. We show the applicability on a set of toy examples, which pave the way for future methods providing solutions for high-dimensional problems.

Machine Learning, Systems and Control, Optimization and Control

What problem does this paper attempt to address?

This paper addresses optimal decision making in continuous time and discrete state space under partial observability. Specifically, the authors consider how to perform optimal filtering (estimating the hidden state of the system) and optimal control (making decisions based on the estimated state) simultaneously in such a system. Problems of this type arise in discrete-event systems in engineering and in population dynamics in biology. The main contribution is a mathematical description of a continuous-time partially observable Markov decision process (POMDP) with finite state and action spaces, together with a Hamilton-Jacobi-Bellman (HJB) type equation, derived using optimal filtering theory, that characterizes the optimal solution. The paper also uses deep learning techniques to approximately solve the resulting partial integro-differential equation. Two approaches are presented:

1. **Offline method**: Solve the decision problem by learning an approximation of the value function.
2. **Online method**: Use deep reinforcement learning to provide a solution in belief space.

Both approaches are demonstrated on a set of toy examples, which the authors present as a step toward future methods for high-dimensional problems.
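To make the notion of a belief over a finite state space concrete, here is a minimal sketch, not the authors' filter. It assumes a hypothetical two-state system with one generator (rate) matrix per action and only propagates the belief between observations through the master equation db/dt = b Q(a). The observation update that a full filter needs is omitted, since the observation model is not specified here. The names `propagate_belief`, `GENERATORS`, `"wait"`, and `"repair"` are illustrative assumptions.

```python
import numpy as np


def propagate_belief(belief, generator, duration, n_steps=1000):
    """Integrate db/dt = b @ Q with forward Euler steps (no observations)."""
    b = np.asarray(belief, dtype=float)
    Q = np.asarray(generator, dtype=float)
    dt = duration / n_steps
    for _ in range(n_steps):
        # Rows of Q sum to zero, so each step preserves the total probability.
        b = b + dt * (b @ Q)
    return b


# Hypothetical generator matrices, one per action (assumed for illustration).
# Off-diagonal entries are transition rates; each row sums to zero.
GENERATORS = {
    "wait": np.array([[-1.0, 1.0], [2.0, -2.0]]),
    "repair": np.array([[-0.1, 0.1], [5.0, -5.0]]),
}

b0 = np.array([0.5, 0.5])
b_wait = propagate_belief(b0, GENERATORS["wait"], duration=10.0)
b_repair = propagate_belief(b0, GENERATORS["repair"], duration=10.0)
```

Without observations the belief relaxes toward the stationary distribution of the chain under the chosen action, so the action shapes where probability mass ends up. A value function on belief space, as in the offline approach, would take vectors like `b_wait` as input.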

Approximate Control for Continuous-Time POMDPs

Observation-Based Optimization for POMDPs with Continuous State, Observation, and Action Spaces

Online algorithms for POMDPs with continuous state, action, and observation spaces

Control Theory Meets POMDPs: A Hybrid Systems Approach

Approximation Schemes for POMDPs with Continuous Spaces and Their Near Optimality

Analytical Solution to A Discrete-Time Model for Dynamic Learning and Decision-Making

Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs

Partially Observable Markov Decision Processes (POMDPs) and Robotics

Optimality Guarantees for Particle Belief Approximation of POMDPs

Which states matter? An application of an intelligent discretization method to solve a continuous POMDP in conservation biology

Monte Carlo Sampling Methods for Approximating Interactive POMDPs

Numerical method to solve impulse control problems for partially observed piecewise deterministic Markov processes

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Partially Observable Markov Decision Processes and Robotics

Partially Observable Markov Decision Processes in Robotics: A Survey

Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions

Dual Sequential Monte Carlo: Tunneling Filtering and Planning in Continuous POMDPs

Recursively-Constrained Partially Observable Markov Decision Processes

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning