A Potential-Based Method for Finite-Stage Markov Decision Process

Qing-Shan Jia
DOI: https://doi.org/10.1109/acc.2008.4587291
2008-01-01
Abstract: The finite-stage Markov decision process (MDP) provides a general framework for many practical problems in which only the performance over a finite duration is of interest. Dynamic programming (DP) offers a general way to find optimal policies but is usually infeasible in practice, because the policy space grows exponentially. Approximating the finite-stage MDP by an infinite-stage MDP reduces the search space but, because of the approximation error, usually does not yield the optimal stationary policy. We develop a method that finds the optimal stationary policies for the finite-stage MDP. The method is based on performance potentials, which can be estimated from sample paths, making it suitable for practical application.
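To make the distinction between stage-dependent and stationary policies concrete, the following is a minimal sketch on a hypothetical two-state, two-action MDP. The transition matrices, rewards, horizon, and all function names are assumptions for illustration and do not come from the paper. It runs backward induction (finite-horizon DP) and, separately, enumerates every stationary decision rule by brute force. The enumeration stands in for the paper's potential-based method, which is not reproduced here. With |S| states, |A| actions, and N stages there are |A|^|S| stationary deterministic rules but (|A|^|S|)^N stage-dependent ones.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, rewards are maximized.
# P[a][s][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
# R[a][s] = one-stage reward for taking action a in state s.
R = [
    [1.0, 0.0],  # action 0
    [0.0, 2.0],  # action 1
]
N_STATES, N_ACTIONS, HORIZON = 2, 2, 5


def q_value(s, a, v_next):
    """One-stage reward plus expected value-to-go under v_next."""
    return R[a][s] + sum(P[a][s][t] * v_next[t] for t in range(N_STATES))


def backward_induction(horizon):
    """Finite-horizon DP: returns optimal values and a stage-dependent policy."""
    v = [0.0] * N_STATES  # terminal value is assumed to be zero
    policy = []
    for _ in range(horizon):
        q = [[q_value(s, a, v) for a in range(N_ACTIONS)] for s in range(N_STATES)]
        rule = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
        v = [q[s][rule[s]] for s in range(N_STATES)]
        policy.append(rule)
    policy.reverse()  # policy[k] is the decision rule used at stage k
    return v, policy


def evaluate_stationary(rule, horizon):
    """Total expected reward over `horizon` stages when `rule` is used at every stage."""
    v = [0.0] * N_STATES
    for _ in range(horizon):
        v = [q_value(s, rule[s], v) for s in range(N_STATES)]
    return v


def best_stationary(horizon, start_state=0):
    """Brute-force search over all |A|^|S| stationary deterministic rules."""
    rules = product(range(N_ACTIONS), repeat=N_STATES)
    return max(rules, key=lambda rule: evaluate_stationary(rule, horizon)[start_state])


if __name__ == "__main__":
    v_opt, stage_policy = backward_induction(HORIZON)
    rule = best_stationary(HORIZON)
    print("stage-dependent policy:", stage_policy)
    print("optimal value from state 0:", v_opt[0])
    print("best stationary rule:", rule)
    print("its value from state 0:", evaluate_stationary(rule, HORIZON)[0])
```

The stage-dependent value is never below the best stationary value, since every stationary policy is also a stage-dependent one. Brute-force enumeration only works for toy sizes like this one.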