Abstract: In this paper, we explore lifting Markov Decision Processes (MDPs) to the space of probability measures and consider the so-called measurized MDPs: deterministic processes where states are probability measures on the original state space, and actions are stochastic kernels on the original action space. Bertsekas and Shreve studied similar deterministic MDPs in the context of universally measurable policies. Here, we cast lifted MDPs within the semicontinuous-semicompact framework of Hernandez-Lerma and Lasserre. This makes the lifted framework more accessible, as it entails (i) optimal Borel-measurable value functions and policies, (ii) reasonably mild assumptions that are easier to verify than those in the universally measurable framework, and (iii) simpler proofs. In addition, we showcase the untapped potential of lifted MDPs by demonstrating how the measurized framework enables the incorporation of constraints and value function approximations that are not available in the standard MDP setting. Finally, we introduce a novel algebraic lifting procedure for any MDP, offering a systematic approach for deriving measurized formulations. We use this method to show how non-deterministic measure-valued MDPs can emerge from lifting MDPs impacted by external random shocks. This paper focuses on the discounted infinite-horizon criterion, whereas Part II addresses the long-run average reward case.
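
To make the lifting concrete, the following is a minimal sketch of one step of a measurized MDP in the finite case. It is an illustration under stated assumptions, not the construction used in the paper, which works with general Borel spaces. The two-state, two-action MDP, its numbers, and the names `P`, `r`, `lifted_step`, and `lifted_reward` are all hypothetical. The sketch assumes the natural finite analogue of the lifted dynamics: the next state distribution is the current distribution pushed forward through the kernel and the original transition law, and the lifted reward is the expected one-step reward.

```python
# Hypothetical finite MDP with 2 states and 2 actions (illustrative numbers only).
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# r[s][a] = one-step reward for taking action a in state s.
r = [[1.0, 0.0], [0.0, 2.0]]


def lifted_step(nu, phi):
    """Deterministic lifted transition (assumed finite analogue).

    nu[s] is the probability of being in state s; phi[s][a] is the
    probability of choosing action a in state s (a stochastic kernel).
    Returns the next state distribution.
    """
    n = len(nu)
    return [
        sum(nu[s] * phi[s][a] * P[s][a][t]
            for s in range(n) for a in range(len(phi[s])))
        for t in range(n)
    ]


def lifted_reward(nu, phi):
    """Expected one-step reward under state distribution nu and kernel phi."""
    return sum(nu[s] * phi[s][a] * r[s][a]
               for s in range(len(nu)) for a in range(len(phi[s])))


nu0 = [1.0, 0.0]                 # start in state 0 with certainty
phi = [[0.5, 0.5], [0.0, 1.0]]   # randomize in state 0, play action 1 in state 1
nu1 = lifted_step(nu0, phi)      # next state distribution, fully determined by (nu0, phi)
reward0 = lifted_reward(nu0, phi)
```

Although the original process is random, the pair `(nu0, phi)` determines `nu1` exactly, which is the sense in which the lifted process is deterministic.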

Markov Decision Processes with Time-Varying Geometric Discounting

Relaxed Equilibria for Time-Inconsistent Markov Decision Processes

Optimal Time-Abstract Schedulers for CTMDPs and Markov Games

Markov Decision Problems with Unbounded Transition Rates under Discounted-Cost Performance Criteria

Continuous Time Markov Decision Processes with Expected Discounted Total Rewards

Markov Decision Processes under Risk Sensitivity: A Discount Vanishing Approach

Measurized Markov Decision Processes Part I: The Discounted Infinite Horizon Criterion

Discounting the Past

Mixed Markov Decision Processes in a Semi-Markov Environment with Discounted Criterion

A survey of recent results on continuous-time Markov decision processes

The Finiteness of the Reward Function and the Optimal Value Function in Markov Decision Processes

Constrained Markov Decision Processes with Non-constant Discount Factor

Markov decision processes with observation costs: framework and computation with a penalty scheme

Performance Optimization of Semi-Markov Decision Processes with Discounted-cost Criteria

Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

Markov Decision Processes under External Temporal Processes

Analytical Solution to A Discrete-Time Model for Dynamic Learning and Decision-Making

MDP Geometry, Normalization and Reward Balancing Solvers

Stochastic Processes with Expected Stopping Time

Stochastic Dynamic Programming with Non-linear Discounting