Abstract: We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimisation of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems, as well as an extension with parameter uncertainty. By including the time elapsed from observations as part of the augmented Markov system, the value function satisfies a system of quasi-variational inequalities (QVIs). Such a class of QVIs can be seen as an extension to the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies uniqueness of solutions to our proposed problem. Penalty methods are then utilised to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications which illustrate our framework.

What problem does this paper attempt to address?

The paper studies optimization problems in Markov Decision Processes (MDPs) with observation costs. The state of the MDP is only available at selected observation times, and each observation incurs a cost. An optimal strategy must therefore choose both when to observe and which action to apply after each observation. The key contributions of the paper can be summarized as follows:

1. **Construction of the Observation Cost Model (OCM)**:
   - Adds an observation cost to the standard MDP setting, so that observing the system state is no longer free.
   - Assumes that the action remains unchanged between two consecutive observations.
   - Formulates the OCM as a Partially Observable Markov Decision Process (POMDP), in which the time elapsed since the last observation is part of the augmented Markov system.
2. **Mathematical Modeling of the Optimization Problem**:
   - Defines the finite-horizon and the discounted infinite-horizon problems.
   - For each problem, derives optimality equations through dynamic programming; these take the form of Quasi-Variational Inequalities (QVIs).
   - Establishes a comparison principle for this class of QVIs, which implies uniqueness of solutions.
3. **Numerical Methods and Experimental Validation**:
   - Proposes a penalty scheme that approximates the solutions of these QVIs to arbitrary accuracy.
   - Illustrates the framework through numerical experiments on three applications.

In summary, the paper provides a theoretical framework for MDPs with observation costs together with a numerical method for solving them. Potential application areas include maintenance, portfolio optimization, sensor detection, and reinforcement learning.
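The augmented-state idea in the summary above can be made concrete with a small sketch. This is not the paper's penalty scheme: it is plain value iteration on a simplified discrete-time, discounted model, and the function name `solve_observation_cost_mdp`, the forced observation after `N` steps, and the toy data are all assumptions made for illustration. The augmented state is `(a, n, x)`: the action `a` currently held, the number of steps `n` since the last observation, and the last observed state `x`.

```python
def solve_observation_cost_mdp(P, c, c_obs, gamma, N, tol=1e-10, max_iter=100_000):
    """Value iteration on the augmented state (a, n, x). Illustrative sketch only.

    P[a][x][y]: transition probabilities, c[a][x]: running cost per step,
    c_obs: cost per observation, gamma: discount factor in (0, 1),
    N >= 1: hypothetical cap on steps without observing (observation forced at n == N).
    Returns W, where W[x] is the optimal expected cost right after observing state x.
    """
    A, S = len(P), len(P[0])
    # mu[a][n][x][y]: probability of being in y, n steps after observing x,
    # while action a is held fixed (row x of the n-th power of P[a]).
    mu = [[[[float(x == y) for y in range(S)] for x in range(S)]] for _ in range(A)]
    for a in range(A):
        for n in range(N):
            prev = mu[a][n]
            mu[a].append([[sum(prev[x][z] * P[a][z][y] for z in range(S))
                           for y in range(S)] for x in range(S)])
    # run[a][n][x]: expected one-step running cost under the belief mu[a][n][x].
    run = [[[sum(mu[a][n][x][y] * c[a][y] for y in range(S)) for x in range(S)]
            for n in range(N + 1)] for a in range(A)]
    V = [[[0.0] * S for _ in range(N + 1)] for _ in range(A)]
    W = [0.0] * S
    for _ in range(max_iter):
        # Q: hold action a for one more step without observing.
        Q = [[[run[a][n][x] + gamma * V[a][n + 1][x] for x in range(S)]
              for n in range(N)] for a in range(A)]
        # W: value right after an observation, minimised over the new action.
        W = [min(Q[a][0][x] for a in range(A)) for x in range(S)]
        diff = 0.0
        for a in range(A):
            for n in range(N + 1):
                for x in range(S):
                    # Observe: pay c_obs, learn the state y, then re-optimise.
                    observe = c_obs + sum(mu[a][n][x][y] * W[y] for y in range(S))
                    new = observe if n == N else min(Q[a][n][x], observe)
                    diff = max(diff, abs(new - V[a][n][x]))
                    V[a][n][x] = new
        if diff < tol:
            break
    return W


# Toy data (made up for illustration): two states, two actions.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
c = [[1.0, 3.0], [2.0, 0.5]]
for c_obs in (0.0, 0.5):
    print(c_obs, solve_observation_cost_mdp(P, c, c_obs, gamma=0.9, N=5))
```

In this sketch the `min` between continuing and observing plays the role of the obstacle in the QVI: the value can never exceed the cost of paying `c_obs` and re-optimising. With `c_obs = 0` observing is free, so the result should coincide with ordinary value iteration for the fully observed MDP, which gives a simple sanity check.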

Markov decision processes with observation costs: framework and computation with a penalty scheme

Markov Decision Processes with Time-Varying Geometric Discounting

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Markov Decision Processes with Incomplete Information and Semi-Uniform Feller Transition Probabilities

Markov decision processes with risk-sensitive criteria: an overview

Online Markov decision processes with Kullback-Leibler control cost

Risk-Sensitive Average Markov Decision Processes in General Spaces

Combinatorial Selection with Costly Information

Constrained Markov Decision Processes with Non-constant Discount Factor

Numerical method to solve impulse control problems for partially observed piecewise deterministic Markov processes

Continuous Time Markov Decision Processes with Expected Discounted Total Rewards

Online Resource Allocation in Episodic Markov Decision Processes

Markov Decision Process Design: A Framework for Integrating Strategic and Operational Decisions

Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space

Mean Field Markov Decision Processes

Approximation methods for piecewise deterministic Markov processes and their costs

A safe exploration approach to constrained Markov decision processes

Risk-sensitive discounted Markov decision processes with unbounded reward functions and Borel spaces

Mixed Markov Decision Processes in a Semi-Markov Environment with Discounted Criterion