Two Time-Scale Gradient Approximation Algorithm For Adaptive Markov Reward Processes

Bingkun Bao, Hongsheng Xi, BaoQun Yin, Qiang Ling
2010-01-01
Abstract: In this paper, we study the stochastic optimization problem of adaptive Markov reward processes parameterized by two sets of parameters: adjustable parameters and unknown constant parameters. Because existing algorithms do not work well for this problem, we propose a novel two time-scale gradient approximation algorithm. The new algorithm yields fast convergence, small sample path variation, and low computational cost. Under some mild assumptions, we prove the convergence of the proposed algorithm and compare it with existing algorithms through numerical examples, which confirm its superiority.