Measurized Markov Decision Processes Part I: The Discounted Infinite Horizon Criterion

Daniel Adelman, Alba V. Olivares-Nadal
2024-11-06
Abstract: In this paper, we explore lifting Markov Decision Processes (MDPs) to the space of probability measures and consider the so-called measurized MDPs: deterministic processes whose states are probability measures on the original state space and whose actions are stochastic kernels on the original action space. Bertsekas and Shreve studied similar deterministic MDPs in the context of universally measurable policies. Here, we cast lifted MDPs within the semicontinuous-semicompact framework of Hernandez-Lerma and Lasserre. This makes the lifted framework more accessible, as it yields (i) Borel-measurable optimal value functions and policies, (ii) reasonably mild assumptions that are easier to verify than those in the universally measurable framework, and (iii) simpler proofs. In addition, we showcase the untapped potential of lifted MDPs by demonstrating how the measurized framework enables the incorporation of constraints and value function approximations that are not available in the standard MDP setting. Finally, we introduce a novel algebraic lifting procedure for any MDP, offering a systematic approach to deriving measurized formulations. We use this method to show how non-deterministic measure-valued MDPs can emerge from lifting MDPs affected by external random shocks. In this paper, we focus on the discounted infinite-horizon criterion, whereas in Part II we focus on the long-run average reward case.
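To make the idea of lifting concrete, here is a minimal sketch, assuming finite state and action spaces (the paper itself works with general Borel spaces). The state of the lifted process is a probability vector mu over the original states, the action is a stochastic kernel phi(a|s), and the lifted transition mu -> mu' is deterministic. The function names and the toy two-state MDP are hypothetical illustrations, not the authors' construction. The sketch only checks an elementary fact: for a fixed stationary randomized policy, the discounted reward along the lifted trajectory equals the expected discounted reward of the original MDP started from mu.

```python
# Minimal sketch: lifting a finite MDP to a deterministic measure-valued MDP.
# Assumptions (illustrative, not from the paper): finite states/actions,
# tabular P[s][a][t] and r[s][a], and a stationary kernel phi[s][a] = phi(a|s).
from typing import List

Measure = List[float]                 # mu[s]: probability of original state s
Kernel = List[List[float]]            # phi[s][a]: probability of action a in state s
Transition = List[List[List[float]]]  # P[s][a][t]: probability of s -> t under a
Reward = List[List[float]]            # r[s][a]: one-step reward


def lifted_step(mu: Measure, phi: Kernel, P: Transition) -> Measure:
    """Deterministic lifted transition: mu'(t) = sum_{s,a} mu(s) phi(a|s) P(t|s,a)."""
    n = len(mu)
    nxt = [0.0] * n
    for s in range(n):
        for a, p_a in enumerate(phi[s]):
            weight = mu[s] * p_a
            for t in range(n):
                nxt[t] += weight * P[s][a][t]
    return nxt


def lifted_reward(mu: Measure, phi: Kernel, r: Reward) -> float:
    """Lifted one-step reward: sum_{s,a} mu(s) phi(a|s) r(s,a)."""
    return sum(mu[s] * phi[s][a] * r[s][a]
               for s in range(len(mu)) for a in range(len(phi[s])))


def lifted_discounted_value(mu0: Measure, phi: Kernel, P: Transition,
                            r: Reward, beta: float, horizon: int = 2000) -> float:
    """Discounted reward along the lifted trajectory mu_0, mu_1, ... (truncated)."""
    total, mu, disc = 0.0, list(mu0), 1.0
    for _ in range(horizon):
        total += disc * lifted_reward(mu, phi, r)
        mu = lifted_step(mu, phi, P)
        disc *= beta
    return total


def policy_evaluation(phi: Kernel, P: Transition, r: Reward,
                      beta: float, iters: int = 2000) -> List[float]:
    """Iterate v <- r_phi + beta * P_phi v in the ORIGINAL (stochastic) MDP."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [sum(phi[s][a] * (r[s][a] + beta * sum(P[s][a][t] * v[t] for t in range(n)))
                 for a in range(len(phi[s])))
             for s in range(n)]
    return v


# Hypothetical two-state, two-action example.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[1.0, 0.0], [0.0, 2.0]]
phi = [[0.7, 0.3], [0.4, 0.6]]
mu0 = [0.25, 0.75]
beta = 0.9

lifted = lifted_discounted_value(mu0, phi, P, r, beta)
v = policy_evaluation(phi, P, r, beta)
original = sum(m * vs for m, vs in zip(mu0, v))
print(f"lifted: {lifted:.6f}  original: {original:.6f}")
```

Because the lifted dynamics are deterministic and linear in mu, the lifted value under a fixed kernel is simply the mu-weighted average of the original per-state values, which is what the final comparison checks.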
Optimization and Control