Two-Timescale Simulation-based Algorithm for Markov Decision Process Based on Performance Potentials

鲍秉坤,殷保群,奚宏生
2009-01-01
Abstract: A novel two-timescale simulation-based gradient algorithm, based on performance potentials, is proposed for discrete-time Markov decision processes by introducing the two-timescale concept into performance-potential-based stochastic approximation. The algorithm addresses two limitations of classical approaches: the every-update simulation-based gradient algorithm updates too frequently, while the regenerative-update gradient algorithm updates too infrequently. Three numerical examples illustrate the advantages of the two-timescale simulation-based gradient algorithm in computational complexity, convergence speed, and convergence precision.
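The abstract does not give the update rules, so the following is only a generic illustration of the two-timescale idea in stochastic approximation, not the paper's algorithm. It assumes a toy scalar problem in place of a Markov decision process: a noisy gradient of (theta - target)^2 / 2 is averaged on a fast timescale, and the parameter is moved along that average on a slow timescale. The function name, the step-size exponents, and the quadratic objective are all hypothetical choices made for this sketch.

```python
import random


def two_timescale_sketch(target=3.0, steps=20000, seed=0):
    """Toy two-timescale stochastic approximation (illustrative only).

    Assumption: the objective is (theta - target)^2 / 2 observed through
    additive Gaussian noise. This stands in for a simulation-based
    gradient estimate and is NOT the algorithm from the paper.
    """
    rng = random.Random(seed)
    theta = 0.0      # slow iterate: the parameter being optimized
    grad_est = 0.0   # fast iterate: running estimate of the gradient

    for n in range(1, steps + 1):
        fast_step = 1.0 / n ** 0.6   # larger step: tracks the gradient quickly
        slow_step = 1.0 / n          # smaller step: slow_step / fast_step -> 0

        # One noisy gradient sample at the current parameter value.
        noisy_grad = (theta - target) + rng.gauss(0.0, 1.0)

        # Fast timescale: average the noisy samples.
        grad_est += fast_step * (noisy_grad - grad_est)

        # Slow timescale: move the parameter along the averaged gradient.
        theta -= slow_step * grad_est

    return theta


if __name__ == "__main__":
    print(two_timescale_sketch())
```

The design point the sketch tries to convey is the separation of step sizes: the parameter is still adjusted at every step, but only through a smoothed estimate, which sits between updating on every raw sample and waiting for a full regeneration cycle before each update.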