Abstract: Piecewise-deterministic Markov processes (PDMPs) are often used to model abrupt changes in the global environment or capabilities of a controlled system. This is typically done by considering a set of "operating modes" (each with its own system dynamics and performance metrics) and assuming that the mode can switch stochastically while the system state evolves. Such models have a broad range of applications in engineering, economics, manufacturing, robotics, and biological sciences. Here, we introduce and analyze an "occasionally observed" version of mode-switching PDMPs. We show how such systems can be controlled optimally if the planner is not alerted to mode switches as they occur but may instead have access to infrequent mode observations. We first develop a general framework for handling this through dynamic programming on a higher-dimensional mode-belief space. While quite general, this method is rarely practical due to the curse of dimensionality. We then discuss assumptions that allow for solving the same problem much more efficiently, with the computational costs growing linearly (rather than exponentially) with the number of modes. We use this approach to derive Hamilton-Jacobi-Bellman PDEs and quasi-variational inequalities encoding the optimal behavior for a variety of planning horizons (fixed, infinite, indefinite, random) and mode-observation schemes (at fixed times or on-demand). We discuss the computational challenges associated with each version and illustrate the resulting methods on test problems from surveillance-evading path planning. We also include an example based on robotic navigation: a Mars rover that minimizes the expected time to target while accounting for the possibility of unobserved/incremental damages and dynamics-altering breakdowns.
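To make the notion of a "mode belief" concrete, the following minimal Python sketch shows how a planner's belief over modes could evolve between infrequent observations. It is an illustration under assumptions not stated in the abstract: mode switching is taken to be a continuous-time Markov chain with a constant, state-independent rate matrix Q, so the belief between observations follows b(t) = b(0) exp(Qt), and an observation collapses the belief onto the observed mode. The function names, the two-mode rate matrix, and the elapsed time are all hypothetical.

```python
import numpy as np

def expm_taylor(A, terms=20, squarings=10):
    """Approximate exp(A) by scaling-and-squaring with a truncated Taylor series."""
    A_scaled = A / (2 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A_scaled / k      # k-th Taylor term: A_scaled^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result        # undo the scaling: exp(A) = exp(A/2^s)^(2^s)
    return result

def propagate_belief(belief, Q, elapsed):
    """Belief after `elapsed` time without an observation: b(t) = b(0) exp(Q t)."""
    return belief @ expm_taylor(Q * elapsed)

def observe_mode(num_modes, observed_mode):
    """An exact mode observation collapses the belief to a unit vector."""
    belief = np.zeros(num_modes)
    belief[observed_mode] = 1.0
    return belief

# Hypothetical two-mode example (rates are assumptions, not from the paper):
# mode 0 -> mode 1 at rate 0.5, mode 1 -> mode 0 at rate 0.2.
Q = np.array([[-0.5, 0.5],
              [0.2, -0.2]])

belief = observe_mode(2, observed_mode=0)              # mode 0 was just observed
belief_later = propagate_belief(belief, Q, elapsed=2.0)
print(belief_later)                                    # belief over modes 2 time units later
```

In this sketch the belief is a vector with one entry per mode. A value function defined over all such vectors would live on the higher-dimensional mode-belief space mentioned in the abstract, which is where the curse of dimensionality enters.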

Multiconstrained Finite-Horizon Piecewise Deterministic Markov Decision Processes with Unbounded Transition Rates

Risk probability optimization of finite horizon piecewise deterministic Markov decision processes

On Linear Programming for Constrained and Unconstrained Average-Cost Markov Decision Processes with Countable Action Spaces and Strictly Unbounded Costs

Constrained Markov Decision Processes with Non-constant Discount Factor

Average Continuous Control of Piecewise Deterministic Markov Processes

Controlled Markov Processes With Safety State Constraints

Occasionally Observed Piecewise-deterministic Markov Processes

Asymptotically Optimal Policies for Weakly Coupled Markov Decision Processes

Continuous Time Markov Decision Processes with Nonuniformly Bounded Transition Rate: Expected Total Rewards

Constrained Risk-Averse Markov Decision Processes

Relaxed Equilibria for Time-Inconsistent Markov Decision Processes

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

A safe exploration approach to constrained Markov decision processes

Numerical method to solve impulse control problems for partially observed piecewise deterministic Markov processes

Recursively-Constrained Partially Observable Markov Decision Processes

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Risk-Sensitive Average Markov Decision Processes in General Spaces

Anytime-Constrained Reinforcement Learning

Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space

Approximation methods for piecewise deterministic Markov processes and their costs

Zero-Sum Games for piecewise deterministic Markov decision processes with risk-sensitive finite-horizon cost criterion