Simulation Optimization Algorithm for SMDPs with Parameterized Randomized Stationary Policies

DAI Gui-ping, TANG Hao, XI Hong-sheng
DOI: https://doi.org/10.3969/j.issn.1000-8152.2006.04.010
2006-01-01
Abstract: Based on the theory of performance potentials and the method of the equivalent Markov process, the performance optimization problem is discussed for a class of semi-Markov decision processes (SMDPs) with parameterized randomized stationary policies, and a simulation optimization algorithm is proposed. First, a uniformized Markov chain is defined through the equivalent Markov process. Second, the gradient of the average-cost performance with respect to the policy parameters is estimated by simulating a single sample path of the uniformized Markov chain, so that an optimal (or suboptimal) randomized stationary policy can be found by iterating on the parameters. The derived algorithm can meet the performance optimization requirements of many different systems with large-scale state spaces; an artificial neural network is also used to approximate the parameterized randomized stationary policies and avoid the curse of dimensionality. Finally, convergence of the algorithm with probability one on an infinite sample path is considered, and a numerical example is provided to illustrate the application of the algorithm.
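The uniformization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction, so this assumes the standard one: if the equivalent Markov process has infinitesimal generator A, the uniformized chain has transition matrix P = I + A/rate for any rate at least as large as the largest exit rate. The function name `uniformize` and the 3-state generator below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def uniformize(A, rate=None):
    """Return P = I + A / rate, the transition matrix of the uniformized chain.

    A is assumed to be a generator matrix: non-negative off-diagonal
    entries and rows summing to zero (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=float)
    max_exit = np.max(-np.diag(A))  # largest exit rate over all states
    if rate is None:
        rate = max_exit
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and at least the largest exit rate")
    return np.eye(A.shape[0]) + A / rate

# Hypothetical 3-state generator (illustrative values only).
A = np.array([[-2.0, 1.5, 0.5],
              [1.0, -1.0, 0.0],
              [0.5, 2.5, -3.0]])
P = uniformize(A)

# Approximate the stationary distribution of the discrete-time chain
# by taking a row of a high power of P.
pi = np.linalg.matrix_power(P, 200)[0]
```

Under this construction the discrete-time chain has the same stationary distribution as the continuous-time process, so `pi @ A` is numerically zero. That shared stationary distribution is why long-run averages can be estimated from a sample path of the uniformized chain.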