Policy iteration for parameterized Markov decision processes and its application

Li Xia, Qingshan Jia
DOI: https://doi.org/10.1109/ASCC.2013.6606023
2013-01-01
Abstract: In a parameterized Markov decision process (MDP), the decision maker has to choose the optimal parameters, that is, those that induce the maximal average system reward. However, the traditional policy iteration algorithm is usually inapplicable because the choice of parameters is not independent of the system state. In this paper, we use the direct comparison approach to study this problem. A general difference equation is derived to compare the system performance under different parameters. We derive a theoretical condition that guarantees the applicability of policy iteration to the parameterized MDP. This policy-iteration-type algorithm is much more efficient than gradient-based optimization algorithms for parameterized MDPs. Finally, we study the service rate control problem of closed Jackson networks as an example to demonstrate the main idea of this paper.
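To make the idea of a difference equation concrete, the following minimal sketch numerically checks the standard average-reward performance difference formula for ergodic Markov chains, eta' - eta = pi' [(P' - P) g + (r' - r)], where g is the performance potential solving (I - P + e pi) g = r. This is the textbook form from the direct comparison literature and is offered as an illustration only; the paper's general difference equation for parameterized MDPs may take a different form. The two-state chains, the rewards, and all function names below are hypothetical and chosen for illustration.

```python
# Illustrative check of the average-reward performance difference formula.
# All matrices, rewards and helper names are assumptions, not from the paper.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                for c in range(col, n + 1):
                    M[k][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def stationary(P):
    """Stationary distribution pi with pi P = pi and sum(pi) = 1."""
    n = len(P)
    # Rows of (P^T - I); the last equation is replaced by normalization.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    A[-1] = [1.0] * n
    return solve(A, [0.0] * (n - 1) + [1.0])

def potentials(P, r):
    """Performance potentials g solving (I - P + e pi) g = r."""
    n = len(P)
    pi = stationary(P)
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return solve(A, r)

def average_reward(P, r):
    """Long-run average reward eta = pi r."""
    return sum(p * x for p, x in zip(stationary(P), r))

# Two hypothetical parameter settings of a two-state ergodic chain.
P, r = [[0.5, 0.5], [0.2, 0.8]], [1.0, 3.0]
P2, r2 = [[0.1, 0.9], [0.2, 0.8]], [0.5, 3.0]

g = potentials(P, r)
pi2 = stationary(P2)
n = len(P)

# Left-hand side: difference of the directly computed average rewards.
direct = average_reward(P2, r2) - average_reward(P, r)

# Right-hand side: pi' [(P' - P) g + (r' - r)].
via_formula = sum(
    pi2[i] * (sum((P2[i][j] - P[i][j]) * g[j] for j in range(n))
              + (r2[i] - r[i]))
    for i in range(n)
)

print(direct, via_formula)
```

Because pi' is strictly positive for an ergodic chain, the sign of the performance difference is determined by the bracketed term in each state. In a standard MDP that term can be improved state by state, which is what policy iteration exploits; in a parameterized MDP one parameter may affect several states at once, which is the difficulty the abstract describes.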