Abstract: Markov Decision Processes (MDPs) are a useful framework for casting optimal sequential decision-making problems. Given any MDP, the aim is to find the optimal action selection mechanism, i.e., the optimal policy. Typically, the optimal policy ($u^*$) is obtained by substituting the optimal value-function ($J^*$) in the Bellman equation. Alternatively, $u^*$ can be obtained by learning the optimal state-action value function $Q^*$, known as the $Q$ value-function. However, it is difficult to compute the exact values of $J^*$ or $Q^*$ for MDPs with a large number of states. Approximate Dynamic Programming (ADP) methods address this difficulty by computing lower-dimensional approximations of $J^*$/$Q^*$. Most ADP methods employ linear function approximation (LFA), i.e., the approximate solution lies in a subspace spanned by a family of pre-selected basis functions. The approximation is obtained via a linear least squares projection of higher-dimensional quantities, and the $L_2$ norm plays an important role in convergence and error analysis. In this paper, we discuss ADP methods for MDPs based on LFAs in $(\min,+)$ algebra. Here the approximate solution is a $(\min,+)$ linear combination of a set of basis functions whose span constitutes a subsemimodule. The approximation is obtained via a projection operator onto the subsemimodule, which differs from the linear least squares projection used in ADP methods based on conventional LFAs. MDPs are not $(\min,+)$ linear systems; nevertheless, we show that the monotonicity property of the projection operator helps us establish the convergence of our ADP schemes. We also discuss future directions in ADP methods for MDPs based on $(\min,+)$ LFAs.
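
To make the $(\min,+)$ construction concrete, the following Python sketch illustrates a $(\min,+)$ linear combination and a projection onto the resulting span for a toy four-state example. It is an illustration under stated assumptions, not the paper's implementation: the function names, the two basis vectors, and the target vector are all hypothetical, and the projection is assumed to be the smallest element of the span that dominates the target pointwise, which is one standard choice and may differ in detail from the operator used in the paper.

```python
# Sketch of (min,+) function approximation on a finite state space.
# In (min,+) algebra, "addition" is min and "multiplication" is ordinary +.
# All names and numbers below are hypothetical, chosen for illustration only.

def minplus_combine(basis, coeffs):
    """(min,+) linear combination: v(s) = min_i (basis[i][s] + coeffs[i])."""
    n_states = len(basis[0])
    return [
        min(phi[s] + r for phi, r in zip(basis, coeffs))
        for s in range(n_states)
    ]

def minplus_project(basis, target):
    """Assumed projection: smallest element of the span that is >= target.

    Requiring basis[i][s] + r_i >= target[s] for every state s gives
    r_i >= max_s (target[s] - basis[i][s]); the smallest such r_i is used.
    """
    coeffs = [max(t - p for t, p in zip(target, phi)) for phi in basis]
    return coeffs, minplus_combine(basis, coeffs)

# Two hypothetical basis functions over four states.
basis = [
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0, 0.0],
]
# Hypothetical target, standing in for a value function to be approximated.
target = [1.0, 0.5, 0.5, 2.0]

coeffs, projected = minplus_project(basis, target)
print("coefficients:", coeffs)
print("projection:  ", projected)
```

Under this assumed definition the projection dominates the target pointwise, is monotone (a pointwise larger target never yields a smaller projection), and is idempotent. Monotonicity is the property the abstract relies on for convergence.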

Approximate Constrained Discounted Dynamic Programming with Uniform Feasibility and Optimality

Convergence Rate of Primal-Dual Approach to Constrained Reinforcement Learning with Softmax Policy

Approximate dynamic programming for continuous state and control problems

Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems

A safe exploration approach to constrained Markov decision processes

Discrete‐Time Optimal Control of State‐Constrained Nonlinear Systems Using Approximate Dynamic Programming

Approximate Dynamic Programming with Feasibility Guarantees

Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization

Approximate Dynamic Programming for Constrained Piecewise Affine Systems with Stability and Safety Guarantees

Solving Mission-Wide Chance-Constrained Optimal Control Using Dynamic Programming

A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces

Inexact cuts for Deterministic and Stochastic Dual Dynamic Programming applied to convex nonlinear optimization problems

Fast Approximate Dynamic Programming for Input-Affine Dynamics

Varying Receding-Horizon Based Production Control for Hybrid Production Systems

Exact Dynamic Programming for Positive Systems with Linear Optimal Cost

Approximate dynamic programming with $(\min,+)$ linear function approximation for Markov decision processes

Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

On solving optimal policies for event-based dynamic programming

Gradient-Bounded Dynamic Programming with Submodular and Concave Extensible Value Functions

Revisiting approximate dynamic programming and its convergence

Error bound analysis of policy iteration based approximate dynamic programming for deterministic discrete-time nonlinear systems