A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies

tanghao,xihongsheng,yinbaoqun
2004-01-01
Abstract:Based on the theory of Markov performance potentials and neuro-dynamic programming (NDP) methodology, we study simulation optimization algorithm for a class of continuous time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm will estimate the gradient of average cost performance measure with respect to policy parameters by transforming a continuous time Markov process into a uniform Markov chain and simulating a single sample path of the chain. The goal is to look for a suboptimal randomized stationary policy. The algorithm derived here can meet the needs of performance optimization of many difficult systems with large-scale state space. Finally, a numerical example for a controlled Markov process is provided.
What problem does this paper attempt to address?