Abstract: Piecewise-deterministic Markov processes (PDMPs) are often used to model abrupt changes in the global environment or capabilities of a controlled system. This is typically done by considering a set of "operating modes" (each with its own system dynamics and performance metrics) and assuming that the mode can switch stochastically while the system state evolves. Such models have a broad range of applications in engineering, economics, manufacturing, robotics, and biological sciences. Here, we introduce and analyze an "occasionally observed" version of mode-switching PDMPs. We show how such systems can be controlled optimally if the planner is not alerted to mode-switches as they occur but may instead have access to infrequent mode observations. We first develop a general framework for handling this through dynamic programming on a higher-dimensional mode-belief space. While quite general, this method is rarely practical due to the curse of dimensionality. We then discuss assumptions that allow for solving the same problem much more efficiently, with the computational costs growing linearly (rather than exponentially) with the number of modes. We use this approach to derive Hamilton-Jacobi-Bellman PDEs and quasi-variational inequalities encoding the optimal behavior for a variety of planning horizons (fixed, infinite, indefinite, random) and mode-observation schemes (at fixed times or on-demand). We discuss the computational challenges associated with each version and illustrate the resulting methods on test problems from surveillance-evading path planning. We also include an example based on robotic navigation: a Mars rover that minimizes the expected time to target while accounting for the possibility of unobserved/incremental damages and dynamics-altering breakdowns.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses optimal control of piecewise-deterministic Markov processes (PDMPs) whose mode switches are only partially observed. Specifically, it focuses on "occasionally observed" PDMPs (OOPDMPs), in which the controller is not alerted to mode switches as they occur and learns the current mode only through infrequent observations.

### Main contributions

1. **Introduction of the OOPDMP framework**:
   - The paper introduces a framework for settings in which the controller is not informed of a mode switch when it happens.
   - In full generality, such problems are handled by dynamic programming on a higher-dimensional mode-belief space. This approach is rarely practical because of the curse of dimensionality.
2. **Efficient algorithms**:
   - The paper identifies assumptions under which the same problem can be solved much more efficiently, with computational cost growing linearly (rather than exponentially) with the number of modes.
   - Hamilton-Jacobi-Bellman (HJB) partial differential equations and quasi-variational inequalities are derived for a variety of planning horizons (fixed, infinite, indefinite, random) and mode-observation schemes (at fixed times or on demand).
3. **Application examples**:
   - The methods are illustrated on test problems from surveillance-evading path planning and on a Mars rover navigation example.
   - In the rover example, the rover minimizes the expected time to target while accounting for the possibility of unobserved or incremental damage and dynamics-altering breakdowns.
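To make the "mode-belief space" in the first contribution more concrete: between observations, the planner can only track a probability vector over the modes. The following is a minimal sketch, not the paper's method. It assumes constant switching rates `rates[i][j]` from mode `i` to mode `j`, and it assumes the planner gains no mode information from the trajectory between observations, so that the belief \( q \) follows the standard forward equation of a continuous-time Markov chain, \( q_j' = \sum_{i \neq j} q_i \lambda_{ij} - q_j \sum_{k \neq j} \lambda_{jk} \). The names `propagate_belief` and `observe_mode` are hypothetical.

```python
def propagate_belief(belief, rates, elapsed, tau=1e-3):
    """Forward-Euler integration of the mode belief over `elapsed` time units.

    belief[j] is the probability of being in mode j; rates[i][j] is an
    assumed constant switching rate from mode i to mode j (diagonal ignored).
    """
    n = len(belief)
    q = list(belief)
    steps = max(1, int(round(elapsed / tau)))
    dt = elapsed / steps
    for _ in range(steps):
        # Probability mass flowing into each mode j from the other modes.
        inflow = [sum(q[i] * rates[i][j] for i in range(n) if i != j)
                  for j in range(n)]
        # Probability mass leaving each mode j toward the other modes.
        outflow = [q[j] * sum(rates[j][k] for k in range(n) if k != j)
                   for j in range(n)]
        q = [q[j] + dt * (inflow[j] - outflow[j]) for j in range(n)]
    return q


def observe_mode(mode, n_modes):
    """An exact mode observation collapses the belief to a unit vector."""
    return [1.0 if j == mode else 0.0 for j in range(n_modes)]


if __name__ == "__main__":
    rates = [[0.0, 1.0], [0.0, 0.0]]   # hypothetical: mode 1 is absorbing
    q = observe_mode(0, 2)             # mode 0 has just been observed
    q = propagate_belief(q, rates, elapsed=1.0)
    print(q)                           # belief one time unit after the observation
```

With \( M \) modes the belief lives on an \( (M-1) \)-dimensional simplex, so a value function defined over both the state and the belief quickly becomes expensive to discretize. This is the curse of dimensionality that motivates the more efficient formulations.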
### Mathematical models

- **Probability of mode switching** (for \( j \neq i \), over a short time \( \tau \)):
  \[ P(\mu(s+\tau) = j \mid \mu(s) = i) = \lambda_{ij}(y(s), s) \tau + o(\tau) \]
- **Mode-dependent dynamics** (while the system is in mode \( i \)):
  \[ y'(s) = f_i(y(s), s, a) \]
- **Value functions and optimality equations**:
  - Finite-horizon problem: the value function, starting from state \( x \) in mode \( i \) at time \( t \), is
    \[ u_i(x, t) = \inf_{a(\cdot) \in A} E \left[ \int_t^T K_{\mu(s)}(y(s), s, a(y(s), s, \mu(s))) ds + \psi_{\mu(T)}(y(T)) \right] \]
  - Infinite-horizon problem: the expected discounted cost of a control \( a(\cdot) \) is
    \[ J_i(x, a(\cdot)) = E \left[ \int_0^\infty e^{-\beta s} K_{\mu(s)}(y(s), a(y(s), \mu(s))) ds \right] \]
  - Indefinite-horizon problem: the value functions satisfy the coupled equations
    \[ 0 = H_i(x, \nabla u_i(x)) + \sum_{j \neq i} \lambda_{ij}(x) (u_j(x) - u_i(x)), \quad x \in \Omega \setminus \Gamma \]
    \[ u_i(x) = \psi_i(x), \quad x \in \partial \Gamma \]

### Conclusion

By introducing the OOPDMP framework, the paper shows how to control PDMPs optimally when mode switches are only occasionally observed, and it proposes efficient numerical methods for doing so. The methods are illustrated on test problems from surveillance-evading path planning and on a Mars rover navigation example. The underlying mode-switching PDMP models have applications in engineering, economics, manufacturing, robotics, and biological sciences.
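The two modeling ingredients listed under "Mathematical models" (switching probabilities \( \lambda_{ij} \tau + o(\tau) \) and mode-dependent dynamics \( y' = f_i \)) can be sketched as a simple time-stepping simulation. This is an illustration only, not the paper's numerical method. It assumes constant switching rates, a scalar state, forward-Euler steps, and a step `tau` small enough that each row's total switching probability is at most 1. The names `step_mode`, `simulate_pdmp`, `dynamics`, `rates`, and `control` are hypothetical, and the control here is a feedback on the true mode, as in the fully observed formulas.

```python
import random


def step_mode(mode, rates, tau, rng):
    """Sample the mode after a short time tau, using P(i -> j) ~ rates[i][j] * tau."""
    r = rng.random()
    cumulative = 0.0
    for j, rate in enumerate(rates[mode]):
        if j == mode:
            continue  # diagonal entries are ignored
        cumulative += rate * tau
        if r < cumulative:
            return j
    return mode  # no switch during this step


def simulate_pdmp(dynamics, rates, control, y0, mode0, horizon, tau, rng):
    """Forward-Euler simulation of y'(s) = f_i(y(s), s, a) with random mode switches.

    Returns a list of (time, state, mode) samples.
    """
    y, mode, s = y0, mode0, 0.0
    path = [(s, y, mode)]
    for _ in range(int(round(horizon / tau))):
        a = control(y, s, mode)                # feedback control a(y, s, mu)
        y = y + tau * dynamics[mode](y, s, a)  # deterministic flow in the current mode
        mode = step_mode(mode, rates, tau, rng)
        s += tau
        path.append((s, y, mode))
    return path


if __name__ == "__main__":
    # Hypothetical two-mode example: mode 1 ("damaged") halves the speed.
    dynamics = [lambda y, s, a: a, lambda y, s, a: 0.5 * a]
    rates = [[0.0, 0.8], [0.0, 0.0]]  # assumed constant rates; mode 1 is absorbing
    path = simulate_pdmp(dynamics, rates, lambda y, s, m: 1.0,
                         y0=0.0, mode0=0, horizon=2.0, tau=0.01,
                         rng=random.Random(0))
    print(path[-1])
```

Averaging a running cost over many such sample paths gives a Monte Carlo estimate of the expected cost of a fixed control, which can serve as a sanity check on a PDE-based solution.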

Occasionally Observed Piecewise-deterministic Markov Processes

MDPs with Unawareness

Recursively-Constrained Partially Observable Markov Decision Processes

Controlled Markov Processes With Safety State Constraints

Numerical method to solve impulse control problems for partially observed piecewise deterministic Markov processes

Control Theory Meets POMDPs: A Hybrid Systems Approach

Multiconstrained Finite-Horizon Piecewise Deterministic Markov Decision Processes with Unbounded Transition Rates

Piecewise-Deterministic Optimal Path Planning

Average Continuous Control of Piecewise Deterministic Markov Processes

Optimizing pre-scheduled, intermittently-observed MDPs

Switching Control in Multi-Mode Markov Decision Processes

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

On Switched MPC of a Class of Switched Linear Systems with Modal Dwell Time

Observation for Markov Jump Piecewise-Affine Systems with Admissible Region-Switching Paths

Piecewise deterministic generative models

OCMDP: Observation-Constrained Markov Decision Process

Partially Observable Markov Decision Processes and Robotics

Partially Observable Markov Decision Processes (POMDPs) and Robotics

POMDPs in Continuous Time and Discrete Spaces

Distributionally Robust Partially Observable Markov Decision Process with Moment-based Ambiguity

Hybrid Planning for Dynamic Multimodal Stochastic Shortest Paths