Extreme Occupation Measures in Markov Decision Processes with an Absorbing State
Alexey Piunovskiy, Yi Zhang
DOI: https://doi.org/10.1137/23m1572398
IF: 2.2
2024-01-14
SIAM Journal on Control and Optimization
Abstract: SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 65-90, February 2024. In this paper, we consider a Markov decision process (MDP) with a Borel state space [math], where [math] is an absorbing state (cemetery), and a Borel action space [math]. We consider the space of finite occupation measures restricted to [math] and its extreme points. Some strategies may have infinite occupation measures. Nevertheless, we prove that every finite extreme occupation measure is generated by a deterministic stationary strategy. For this MDP, we then consider a constrained problem with total undiscounted criteria and [math] constraints, where the cost functions are nonnegative. By assumption, the strategies inducing infinite occupation measures are not optimal. Our second main result is that, under mild conditions, the solution to this constrained MDP is given by a mixture of no more than [math] occupation measures generated by deterministic stationary strategies.
mathematics, applied; automation & control systems
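
To make the objects named in the abstract concrete, here is a minimal sketch on a toy finite model, which is far simpler than the Borel setting of the paper. Everything in it is an illustrative assumption and not taken from the paper: the two transient states, the two actions, the transition table `P`, the initial distribution `ALPHA`, the cost table `COST`, and all function names. For a deterministic stationary strategy f with transient transition matrix Q_f, the state occupation measure mu solves mu = alpha + mu Q_f, and the occupation measure puts mass mu(x) on the pair (x, f(x)). The last function forms a convex mixture of two such measures, the kind of object the second result refers to.

```python
from fractions import Fraction as F

# Hypothetical toy model: transient states 0 and 1, actions 0 and 1.
# P[(x, a)][y] is the probability of moving from x to transient state y
# under action a; the missing mass goes to the absorbing state.
P = {
    (0, 0): [F(0), F(1, 2)],
    (0, 1): [F(1, 2), F(0)],
    (1, 0): [F(0), F(0)],
    (1, 1): [F(0), F(1, 2)],
}
ALPHA = [F(1), F(0)]  # initial distribution on the transient states
COST = {(0, 0): F(3), (0, 1): F(1), (1, 0): F(2), (1, 1): F(0)}


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Raises StopIteration if A is singular, i.e. no finite solution.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def occupation_measure(f):
    """Occupation measure of the deterministic stationary strategy f.

    f[x] is the action chosen in state x. Solves (I - Q_f^T) mu = alpha.
    """
    n = len(ALPHA)
    A = [[F(int(x == y)) - P[(x, f[x])][y] for x in range(n)] for y in range(n)]
    mu = solve(A, ALPHA)
    return {(x, f[x]): mu[x] for x in range(n)}


def mix(w, eta1, eta2):
    """Convex combination w * eta1 + (1 - w) * eta2 of two measures."""
    keys = set(eta1) | set(eta2)
    return {k: w * eta1.get(k, F(0)) + (1 - w) * eta2.get(k, F(0)) for k in keys}


def total_cost(eta):
    """Total undiscounted cost: integral of the cost against eta."""
    return sum(eta[k] * COST[k] for k in eta)


eta_f = occupation_measure((0, 0))
eta_g = occupation_measure((1, 0))
eta_mix = mix(F(1, 2), eta_f, eta_g)
print(eta_f, eta_g, eta_mix, total_cost(eta_mix))
```

Because the total cost is linear in the occupation measure, the cost of the mixture is the same convex combination of the two individual costs. Exact fractions are used so that these identities hold without rounding error.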