Infinite-horizon gradient estimation for semi-Markov decision processes

Yanjie Li, Fang Cao
2011-01-01
Abstract: This paper presents a performance gradient formula for semi-Markov decision processes under the average-reward criterion. Based on this formula, we propose an infinite-horizon online (sample-path-based) gradient estimation algorithm. This algorithm naturally extends the online gradient estimation algorithm for discrete-time Markov systems to continuous-time semi-Markov models. In particular, the new algorithm requires less storage than the algorithm that has appeared in the literature.