A Basic Formula for Performance Gradient Estimation of Semi-Markov Decision Processes

Yanjie Li, Fang Cao
DOI: https://doi.org/10.1016/j.ejor.2012.08.010
IF: 6.4
2013-01-01
European Journal of Operational Research
Abstract: This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Using this formula, we develop three sample-path-based gradient estimation algorithms, each of which uses a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature. © 2012 Elsevier B.V. All rights reserved.